This course will teach you how to implement speech recognition in Python by building five projects. And it's easier than you may think. This course is taught by these two amazing instructors. Hi, everyone. I'm Patrick. And I'm Mısra. Patrick is an experienced software engineer, and Mısra is an experienced data scientist. They're both developer advocates at AssemblyAI. AssemblyAI is a deep learning company that creates a speech-to-text API, and you'll learn how to use that API in this course. AssemblyAI provided a grant that made this course possible. They also have a YouTube channel where they post weekly Python and machine learning tutorials. So here are the projects you'll learn to build in
this course. In the first project, we are going to learn how to deal with audio data: we are going to see how to capture audio from a microphone and save it as a WAV file. In the second project, we are going to learn how to do speech recognition on top of the audio file that we have just recorded, using AssemblyAI's API. In the third project, we are going to change gears a little bit and do sentiment analysis on iPhone reviews that we find on YouTube. In the fourth project, we are going to summarize podcasts that we find online and build a web app to show the results to users. And in the last project, we are going to use speech recognition in combination with OpenAI's API to make an app that can answer users' questions. I hope you're excited.
Let's get started. Alright, so in this first part, I'll teach you some audio processing basics in Python. We'll briefly touch on different audio file formats. Then we'll talk about different audio signal parameters that you should know. Then I'll show you how to use the wave module to load and save a WAV file, and how to plot a wave signal. Then I'll also show you how to do a microphone recording in Python. And finally, I'll show you how to load other file formats, like mp3 files. So let's get started. So first of all, before we write some code, let's talk briefly
about different audio file formats. So here I've listed three of the most popular ones: mp3, FLAC, and WAV. mp3 is probably the most popular one that you may know. This is a lossy compression format, so it compresses the data, and during this process we can lose information. On the other hand, FLAC is a lossless compression format. It also compresses the data, but it allows us to perfectly reconstruct the original data. And WAV is an uncompressed format, so it stores the data in an uncompressed way. The audio quality here is the best, but the file size is also the largest. WAV is the standard for CD audio quality. We focus on this in the first part because it's actually very easy to work with in Python: we have a built-in wave module, so we don't have to install anything. By the way, WAV stands for Waveform Audio File Format. And now let's have a look at how we can work with a WAV audio file. But before we load our first WAV file, let's understand a few parameters. So we have the number of channels, which is
usually one or two. One is also known as mono, and two is stereo. This is the number of independent audio channels; for example, two, or stereo, means two independent channels, which gives you the impression that the audio is coming from two different directions. Then we have the sample width: this is the number of bytes for each sample. This will get clearer later when we have a look at an example. Then we have the frame rate, which is also known as the sample rate or sampling frequency, and this is a very important parameter: it's the number of samples per second. For example, you may have seen the number 44,100 hertz, or 44.1 kilohertz, a lot. This is the standard sampling rate for CD quality, so it means we get 44,100 sample values each second. Then we have the number of frames, which is the total number of frames we get. And then we have the values in each frame. When we load these, they will be in a binary format, but we can convert them to integer values later. So now let's have a look at how to load a file
with the built-in wave module. So here I prepared a simple WAV file; it's five seconds long. Let's actually listen to it: "Hi, my name is Patrick, and I'm a developer advocate at AssemblyAI." And here we also see a few parameters already. So now let's go back to the code and load this file. For this, we create an object and simply say wave.open, then we have to give it the name, so this is called patrick.wav, and to read it we open it in read-binary mode, "rb". And now we can extract all these different parameters. For example, let's print the number of channels, which we get by saying obj.getnchannels(). Then we also want to print the sample width, which we get by saying obj.getsampwidth(). Then let's print the frame rate, which we get with obj.getframerate(). Then we also want the number of frames, so we print obj.getnframes(). And lastly, let's also print all the parameters at once, which we can get by saying obj.getparams(). So if we run this, so I say python wave_example.py, then we see we have only one channel, so this is a mono format; we have a sample width of two, so two bytes for each sample; then we have a frame rate of 16,000 and a number of frames of 80,000. And here we also have all the parameters as a params object. So now, for example, we can calculate the duration of the audio. As I said, the frame rate is the number of samples per second, so if we take the total number of frames, so the number of frames, or number of samples, divided by the frame rate, then we get the time in seconds. So now if we print t_audio and run this, then we get 5.0, so five seconds, the same as we see here. So this works. And now let's get the actual frames.
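To recap those calls, here's a minimal, self-contained sketch; it first writes a tiny one-second WAV file (the file name and parameter values are made up for illustration) so the snippet runs anywhere, then reads the parameters back and computes the duration the same way:

```python
import wave

# Create a tiny mono WAV file so the example is self-contained:
# 1 channel, 2 bytes per sample, 16,000 samples per second, 1 second of silence.
with wave.open("tiny.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

# Read it back and inspect the parameters.
obj = wave.open("tiny.wav", "rb")
n_channels = obj.getnchannels()   # 1 -> mono
samp_width = obj.getsampwidth()   # 2 bytes per sample
frame_rate = obj.getframerate()   # 16,000 samples per second
n_frames = obj.getnframes()       # total number of frames
obj.close()

t_audio = n_frames / frame_rate   # duration in seconds
print(n_channels, samp_width, frame_rate, n_frames, t_audio)
```

The same duration formula (frames divided by frame rate) is what gives 5.0 for the five-second file in the video.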
So we say frames = obj.readframes(), and then we can give it the number of frames, or we can pass in -1, which will read all frames. And let's, for example, print the type of this to see what it is, then also print the type of frames[0], and then print the length of frames. So now let's run this. And then we see this is a bytes object, so here we see class bytes. And when we extract the first byte, we see this is an integer. And now the length of the frames object is 160,000, so this is not the same as the number of frames. If we have a look here, the number of frames is 80,000, but if we take the length here, it's twice as much. And if you listened carefully in the beginning, I mentioned the sample width: this means we have two bytes per sample. So if we actually calculate this divided by two, then again we get our 80,000 frames. And yeah, this is how easily we can read a WAV file, and then we can work with the frames. And now, to save the data again, we also open a file, so let's call this obj_new = wave.open, and we give it a new name, let's say patrick_new.wav, and now we open this in write-binary mode, "wb". And now we can basically call all those functions as setters instead of getters. So we say obj_new.setnchannels, and this is only one channel; then obj_new.setsampwidth, and this should be two; then obj_new.setframerate, and this is 16,000. These are all the parameters we should set. And then we can write the frames by saying obj_new.writeframes and passing the frames, so here we have the original frames. So now we basically duplicate the file. And one thing I forgot: when we are done with opening a file and reading all the information we want, we should also call obj.close(). And the same here, so we say obj_new.close(), and this will close the file objects. And yeah, now if we save this and run it, then here we see we have the duplicated file. And if we play it: "Hi, my name is Patrick, and I'm a developer advocate at AssemblyAI," we see this works and it has the same data in it. So yeah, this is how to work with a WAV file and with the wave module. So now let's see how
we can plot the WAV file. Now, plotting a wave signal is actually not too difficult. For this, we just need to install matplotlib and NumPy. Then we import all the modules we need: again we need wave, then we need matplotlib.pyplot as plt, and then we import numpy as np. Then, again, I want to read the WAV file, so I say wave.open, and this was patrick.wav in read-binary mode. Then I want to read all the parameters that I need. I want the sample frequency, and this is obj.getframerate(); then I need the number of samples, so this is obj.getnframes(); and then I also need the actual signal, so I call this signal_wave = obj.readframes(-1), so all the frames. And then I can say obj.close(). And then, for example, we can calculate the length of the signal in seconds. I call this t_audio, and if you remember this from the first script, it's the number of samples divided by the sample frequency. Now let's print t_audio, save this, and run it just as a test. So now we can run python plot_audio.py and we get 5.0, so this works so far. Now I want to create the plot. The signal is a bytes object, and we can create a NumPy array out of it very easily: I call this signal_array = np.frombuffer, and here we put in the signal_wave, and we can also specify a data type, so here I want this to be int16. And now we need an object for the x axis, so the time axis. We say times =, and here we use the NumPy linspace function; this gets zero as the start, and the end is the length of the signal, so this is t_audio, or five seconds. And then we can also give this the num parameter, and the number is the number of samples, so, if you remember, the length of the signal wave. So here we basically get a sample for each point in time. And now we want to plot this. We create a figure, so we say plt.figure, and we give this a figsize of 15 by 5; then we say plt.plot, and we want to plot the times against the signal_array; then we simply give it a title with plt.title, and let's call this "Audio Signal". Then I also want to set plt.ylabel, and the y label is "Signal Wave", and the plt.xlabel is "Time (s)". And then we say plt.xlim, and we limit this to be between zero and t_audio, so five seconds. And then we say plt.show, and this is all we need. And now if we run this, it should open our plot. And here we have it: here we have our audio signal plotted as a wave plot. And this is how easily we can do it with matplotlib and the wave module. Now let's
learn how we can record with our microphone and capture the microphone input in Python. For this, we use PyAudio, a popular Python library. It provides bindings for PortAudio, the cross-platform audio I/O library, so with it we can easily play and record audio, and it works on Linux, Windows, and Mac. For each platform there's a slightly different installation command that they recommend. For example, on Windows you should use this command. On Mac you also have to install PortAudio first, so if you use Homebrew you can simply say brew install portaudio and then pip install pyaudio. And on Linux you can use this command. So I already did this: here I'm on a Mac, so I used brew install portaudio and then pip install pyaudio. And now I can import this, so I say import pyaudio. And I also want to import the wave module
to save the recording later. Then I want to set up a few parameters. So I say frames_per_buffer, and here I say 3200; you can play around with this a little bit. Then I specify the format, so the format equals pyaudio.paInt16. This is basically the same as what we used before: there we used NumPy's int16, and here we have paInt16. Then I also specify the number of channels, so here I say one, so simply a mono format, and then also the frame rate, so the rate; here again I say 16,000, and again you can use a different rate and play around with this. Then we create our PyAudio object, so we say p = pyaudio.PyAudio().
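The recording flow we're building up here can be sketched as a pair of functions (a sketch, assuming PyAudio is installed; the function names are my own, and the import is kept inside record() so the chunk-count helper works even without PyAudio):

```python
def num_chunks(rate, frames_per_buffer, seconds):
    # How many buffer-sized reads are needed to cover `seconds` of audio.
    return int(rate / frames_per_buffer * seconds)

def record(seconds=5, rate=16000, frames_per_buffer=3200):
    # Assumes PyAudio (and PortAudio) is installed; returns raw 16-bit mono audio bytes.
    import pyaudio
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=rate,
                    input=True, frames_per_buffer=frames_per_buffer)
    frames = []
    for _ in range(num_chunks(rate, frames_per_buffer, seconds)):
        frames.append(stream.read(frames_per_buffer))
    stream.stop_stream()
    stream.close()
    p.terminate()
    return b"".join(frames)
```

With a rate of 16,000, a buffer of 3,200 frames, and five seconds, num_chunks gives 25 reads, which matches the loop we're about to write.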
Then we create a stream object, so we say stream = p.open, and now we put in all the parameters: we say format=FORMAT, then the channels, so channels=CHANNELS, the rate, rate=RATE. We also want to capture the input, so input=True. And lastly, we set frames_per_buffer=FRAMES_PER_BUFFER. Then we have our stream
object. So now we can start recording, and we want to record for a number of seconds; here I say five seconds. Then we store the frames, and we store them in a list. Now we can iterate and say for i in range, and we start at zero and go until rate divided by frames_per_buffer times the seconds, and then we convert this to an integer. With this, we basically record for five seconds. Then we read each chunk: we say data = stream.read, and then we read frames_per_buffer frames, and then we say frames.append(data). So basically frames_per_buffer means we read this many frames at once, so in one iteration. And now we have it, so now we can close everything again: we say stream.stop_stream(), then stream.close(), and then p.terminate(). So now we have everything correctly
shut down. And now we can, for example, save the frames object again in a WAV file. For this I say obj = wave.open, and let's call this output.wav, in write-binary mode. Then we set all the parameters: obj.setnchannels, this is the channels parameter; obj.setsampwidth, which we get from p.get_sample_size with our format; then obj.setframerate, this is the rate. And then we can write all the frames: we say obj.writeframes, and we need to write this as one binary string, which we can create with an empty bytes string, b"", and then .join, putting in our frames list. This will combine all the elements in our frames list into a single binary string. And then we say obj.close(), and this is everything we need to do. So now we can run python record_mic.py and test this: "Hi, I'm Patrick. This is a test, 1 2 3." And now it's done. So here we have our new file, so let's play it and see if this works: "Hi, I'm Patrick. This is a test, 1 2 3." And it worked,
awesome. And now, as a last step, I also want to show you how to load mp3 files and not only WAV files, so let's do this. To load mp3 files, we need an additional third-party library, and I recommend pydub. This is a very simple-to-use library that provides an easy, high-level interface to load and also to manipulate audio. In order to install it, we also need FFmpeg, so on the Mac I used Homebrew, so I had to say brew install ffmpeg, and after that you can simply say pip install pydub. I already have this set up. And now we can, for example, say from pydub import AudioSegment. Then we can say audio = AudioSegment, and here we can use from_mp3 if we have an mp3; in my case, right now, I only have a WAV, so I use from_wav and load patrick.wav. And then we can, for example, very easily manipulate this by saying audio = audio + 6, and this will increase the volume by 6 dB. Then we can also repeat the clip: we say audio = audio * 2. Then we can use a fade-in, for example audio = audio.fade_in(2000), so with 2000 milliseconds, a two-second fade-in; the same works with fade_out. So yeah, this is how we can manipulate audio.
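As a side note on what that +6 means: decibels are logarithmic, and an amplitude gain of g dB corresponds to a factor of 10^(g/20), so +6 dB roughly doubles the amplitude and -6 dB roughly halves it. A quick check:

```python
def db_to_amplitude_ratio(db):
    # Amplitude ratio corresponding to a gain expressed in decibels.
    return 10 ** (db / 20)

print(round(db_to_amplitude_ratio(6), 3))   # ~1.995, i.e. roughly double
print(round(db_to_amplitude_ratio(-6), 3))  # ~0.501, i.e. roughly half
```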
Then we can say audio.export, and I want to export this as, let's call it, mashup.mp3, and then I have to say format equals, as a string, "mp3". And now, for example, I could load this by saying audio2 = AudioSegment.from_mp3, and here I use mashup.mp3, and then print "done" so that we see it arrives at this part. And now let's say python and then the load-mp3 file. And yeah, this works: now here we have our mp3 file, and we could also load it like this. So yeah, that's how you can use the pydub module to load other file formats as well. And that's all I wanted to show you in this first part. I hope you learned a little bit about audio processing in Python. And now let's move on and learn how to do speech recognition in Python. Hey, and welcome. In this project, we
are going to learn how to do speech recognition in Python. It's going to be very simple: what we're going to do is take the audio file that we recorded in the previous project and turn it into a text file. Let me show you how the project works. So here is the audio file that we recorded in the previous project: "Hi, I'm Patrick, this is a test, 1 2 3." And if we run our script, we get the text transcription of this audio file, like this here: "Hi, I'm Patrick. This is a test 123." So let's learn how to implement this in Python. For this project, we are mainly going to need two things: AssemblyAI's API, to do the speech recognition, and the requests library from Python, to talk to AssemblyAI's API. So let's first go ahead and get an API token from AssemblyAI. It's very simple: you just need to go to assemblyai.com and create a free account. Once you have an account, you can sign in and just copy your API key by clicking here. And right away, I'm going to create a configure file and put my API key there. Once I've done that, I have a way of
authenticating who I am with AssemblyAI's API. And now we can start setting up how to upload, transcribe, and get the transcription from AssemblyAI's API. The next thing that I want to do is have a main file that is going to hold all my code. What I need to do in there is import the requests library so that I can talk to the AssemblyAI API. So this project is going to have four steps. The first one is to upload the file that we have locally to AssemblyAI. The second one is to start the transcription. The third one is to keep polling AssemblyAI's API to see when the transcription is done. And lastly, we're going to save this transcript. So uploading is actually quite simple. If we go to the documentation of AssemblyAI, we will see here "uploading local files for transcription". So I can just copy and paste this and change the code as we need it. So basically, yeah, okay, we are importing the requests library already. The file name we are going to get from the terminal, so I will set that later. Just a couple of things that we need to pay attention to here. Basically, there is a function to read the audio file from our file system. And then we need to set up the headers; these headers are used for authentication.
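Following the docs pattern described here, the upload step can be sketched like this (a sketch: the placeholder key and the exact variable names are illustrative, not the course's exact code):

```python
import requests

# Normally imported from a local configure file; the placeholder is illustrative.
API_KEY_ASSEMBLYAI = "your-api-key-here"

upload_endpoint = "https://api.assemblyai.com/v2/upload"
headers = {"authorization": API_KEY_ASSEMBLYAI}

def read_file(filename, chunk_size=5242880):
    # Stream the audio file in 5 MB chunks, as the upload endpoint expects.
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data

def upload(filename):
    # Post the chunked file; the response JSON contains the upload URL.
    response = requests.post(upload_endpoint,
                             headers=headers,
                             data=read_file(filename))
    return response.json()["upload_url"]
```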
So we can actually already set this, because this is just going to be your API token: we set it to be the API key from AssemblyAI, and we need to import it here, of course. Alright, that's done. We also have an upload endpoint for AssemblyAI, and this one is api.assemblyai.com/v2/upload. But you know, this might be something that we also need later, so I'm just going to put it into a separate variable and then use that variable here. So when we're uploading a file to AssemblyAI, we are doing a post request. For this post request, you need to send it to the upload endpoint, you need to include your API key in the headers, and of course you need the data, so the file that you read. We are reading the data through the read_file function in chunks, because AssemblyAI requires it to be in chunks, in chunk sizes of five megabytes; basically, this is the number of bytes in there. While we're at it, we can already get the file name from the terminal, too, right? For that, I just need to import sys, and in sys.argv the first (not the zeroth) argument is going to be the file name. And here, let's clean up a little bit. All right, now we should be able to just run a command on the terminal, include the name of the file that we want to upload, and it will be uploaded to AssemblyAI. And let's also print the response that we get from AssemblyAI, to see what kind of response we get. Again, this is the file that we are working with: "Hi, I'm Patrick, this is a test, 1 2 3." And what we need to do right now is run python main.py and the name of the file, in this case output.wav. All right, so we uploaded our file to Assembly
AI successfully. In the response, what we get is the upload URL, so where your audio file lives right now. And using this, we can start the transcription. For the transcription, let's again cheat by getting the code from the docs. Here is the code that we need, starting from here. So this is the transcription endpoint; you can see that it ends differently than the upload endpoint. That one ends with upload, this one ends with transcript, so I will call this the transcript endpoint. Headers: we already have a header, so we don't really need this anymore. The endpoint is transcript_endpoint. JSON is the data that we are sending, or the data that we want AssemblyAI to transcribe, so we are going to need to give it the audio URL. We already have the audio URL, right? Well, we got the response, but we did not extract it from the response, so let's do that: audio_url is response.json()['upload_url']. So we're going to give this audio_url here, because that was just an example. Okay. And this way, we will have started the transcription. Let's run this and see what the result is; I will run it again, same thing. Alright, so we got a much longer response.
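The transcription-start step just described can be sketched as follows (the helper name is mine; headers are the same authorization headers used for uploading):

```python
import requests

transcript_endpoint = "https://api.assemblyai.com/v2/transcript"

def make_transcript_request(audio_url):
    # The JSON body the transcript endpoint expects: the URL of the uploaded audio.
    return {"audio_url": audio_url}

def transcribe(audio_url, headers):
    # Start the transcription job; the response carries a job id to poll with,
    # not the transcript itself.
    response = requests.post(transcript_endpoint,
                             json=make_transcript_request(audio_url),
                             headers=headers)
    return response.json()["id"]
```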
In this response, we have a bunch of information about the transcription that we've just started. So you do not get the transcript itself immediately, because, depending on the length of your audio, it might take a minute or two, right? What we get instead is the ID of this transcription job. So by using this ID, from now on, we can ask AssemblyAI: hey, here is the ID of the transcription job that I submitted to you, is it ready or not? And if it's not ready, it will tell you it's not ready yet, it's still processing. If it's ready, it will tell you: hey, it's completed, and here is your transcript. So that's why the next thing that we want to build is the polling: we're going to write the code that will keep polling AssemblyAI to tell us whether the transcription is ready. But before we go further, let me first clean up this code a little bit, so that everything is nicely packed in functions and we can reuse them if we need to. So this one is the upload function, and what it needs to return is the audio URL. We do not need to print the response anymore; we've already seen what it looks like. And we need to put the headers separately, because we want both upload and transcribe, and basically everything else, to be able to reach this variable called headers. For transcription, again, I will create a function called transcribe, and what I need to return from the transcription function is the ID, so I will just say job_id, and that would be response.json()['id']. Again, we don't need this anymore. I'll just call this transcript_response to make it clear; this one will be upload_response. Let's call this transcript_request, so everything is nice and clean. This is this, and this goes here. And for upload_response, we use it here. And we need to return job_id. Alright, so now we have them nicely wrapped up in different functions, and everything else looks good. Let's run this again to see that it works. Now, of course, I'm not calling the functions.
So let me call the functions and then run it: upload and transcribe. But of course, I also need to pass the file name to the upload function, so let's do that too. "audio_url is not defined"; audio URL, of course. Then I also need to pass audio_url to transcribe. Good thing we tried. So this will be returned from the upload function, and then we will pass it to the transcribe function. And as a result, we will get the job ID, and then I can print job_id to see that it worked. Let's see. Yeah, yes, I do get a job ID, so okay, things are working. The next thing that we want to do is set up the polling function. The first thing we need for that is to create a polling endpoint. As you know, we had the transcript endpoint and the upload endpoint here; that's how we communicate with AssemblyAI's API. The polling endpoint is going to be specific to the transcription job that you just submitted. To create it, all you need to do is combine the transcript endpoint with a slash in between and add the job ID. But the name job_id is a bit vague, so I'm just going to call this transcript_id. By doing that, you now have a URL with which you can ask AssemblyAI whether your job is done already. And again, we're going to send a request to AssemblyAI; this time it's going to be a get request. Well, I'll just copy this so that it's easy. Instead of post, it's going to be a get request; we're going to use the polling endpoint instead of the transcript endpoint, and we just need the headers. Data we do not need, because we are not sending any information to AssemblyAI; we're just asking for information. If you're familiar with requests, this might be very simple for you, but all you need to know is that when you're sending data to an API, you use the post request type, and if you're only getting some information, as the name suggests, you use the get request type. So the resulting response is going to be called polling_response. Let's see; it's not job_id, I called it transcript_id, so now it works. Then we get the polling response, and I can also show you what the polling response looks like. Looks good. Okay, let's run this. Alright,
so we got response 200, and that means things are going well. But actually, what I need is the JSON response, so let's see that again. Yes, this is more like it. So again, we get the ID of the response, the language model that is being used, and a bunch of other information. But what we need here is the status, so let's see where that is. Oh yeah, there it is: we have status "processing". This means that the transcription is still being prepared, so we need to wait a little bit more, and we need to ask AssemblyAI again soon to see if the transcription is done. What we normally do is wait 30 seconds or maybe 60 seconds, depending on the length of your audio file, and when it's done, it will give us status "completed". So let's write the bit where we ask AssemblyAI repeatedly whether the transcription is done. For that, we can just create a very simple while loop: while True, we do the polling, and if polling_response.json()['status'] equals 'completed', we return the polling response. But if the polling response status is 'error', because it is possible that it might error out, then we will return the error. I'll just wrap this into a function; I can call it get_transcription_result_url. And while we're at it, we might as well wrap the polling into a function too. Do we need to pass anything to it? Yes, the transcript ID; we need to pass a transcript_id to it. And instead of printing the response, we will just return it. So instead of doing the request here, all we need to do is call this function with the transcript ID. We can pass the transcript ID here, or I might as well just call the transcribe function in here, and the resulting thing would be the transcript ID from the transcribe function. And then I'm going to pass this transcript_id to the polling function, which is going to return to me the polling response. I will call this polling response data. And inside this data... so this is not needed anymore. Yeah, so polling_response.json() is what is being passed; I call that the data, so I change this to data here, and also data here. Then I'll just pass the data. If it's an error, I can still pass the data, just to see the response and what kind of error we got, and here I'll just say None. All right, let's do a little cleanup. So we have a
nice upload function and a transcribe function. What we did before was call the upload function, get the audio URL, and then pass it to transcribe. But I'm running transcribe here, so I do not need this anymore. I still need to pass the audio URL to transcribe, so then I would need to pass it here. So instead of this, I just need to call this function with the audio URL. Yeah, let's put these here. Actually, to make it a bit more understandable, maybe instead of passing the string 'error', I can just pass whatever error happened in my transcription; then, you know, we'll be able to see what went wrong. Alright, so what we get as a result from get_transcription_result_url is the data and, if there is any, the error. So why not run this and see what the data is going to look like. All right, so we get something really, really big. Let's see; maybe I'll just clear this and run it again, so that we can see it more clearly. Alright, so we get the ID, again the language model that is being used, etc. Now we want the result. Yes, it is under "text": "Hi, I'm Patrick. This is a test 123" is what we get. And we also get the breakdown of words, when each word started and ended in milliseconds, the confidence of this classification, and much more information. What we want to do, though, even though we have all this information, is write this transcript that is generated by AssemblyAI into a text file. So in this next step, that's what we're going to do. Alright, let's come up with a file name for this file. We can actually just call it the same thing as the file name, plus txt. Okay, we were already using the variable name file_name, so let's find something else; we'll just call this text_filename, and it will be the file name plus ".txt". We could also, you know, remove the .wav or .mp3 extension, but let's not deal with that for now. So once I have this, I will just open it in write mode.
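The save step being built here can be sketched as a small function (names are illustrative; data is the polling-endpoint JSON when the job completed, and error is set otherwise):

```python
def save_transcript(data, error, filename):
    # Write the transcript text to <filename>.txt, or report the error.
    if data:
        text_filename = filename + ".txt"
        with open(text_filename, "w") as f:
            f.write(data["text"])
        print("Transcription saved")
        return text_filename
    print("Error:", error)
    return None
```

For example, save_transcript({"text": "Hi, I'm Patrick."}, None, "output.wav") would produce a file named output.wav.txt containing that text.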
And inside I will write data. Texts because that's where we have the text
information on the transcript. If you remember here, this was a response we got
and text includes the transcription. And I can just prompt the user saying that
transcription is saved, transcriptions saved, are happy. Of course, there is a possibility
that our transcription errors out. So you want to cover that too. If you remember, we returned
data and error, what we can do is you can say if data is returned, this happens. But if it
errored out, I will just print error. No, it didn't work out and the error itself so that
we see you know what went wrong. Okay, let's do a little cleanup. Again, I want to wrap this all
up in a function, we can call the Save transcript. Data and error will be returned from get
transcripts URL, it means the audio URL, so I will just need to pass over your URL here. And with
that, we're actually more or less ready. So let's run this and see if we get what we need the
transcript saved in a file. For that, after calling the upload function, I can move this line here, and then call the save_transcript function. Let's quickly follow the flow: save_transcript calls get_transcription_result_url; get_transcription_result_url calls transcribe, which starts the transcription process; and then get_transcription_result_url also does the polling, so it keeps polling AssemblyAI. When it's done, it returns something, and we deal with it in the save_transcript function: we either save a transcript or, if there is an error, we display the error. So let's run this and see if we get any errors. Transcription saved. Alright, let's see.
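Put together, the save step we just described might look like the sketch below. In the real script, get_transcription_result_url is the polling helper we built; it is stubbed out here so the sketch runs on its own.

```python
# Sketch of the save_transcript step described above. In the real script,
# get_transcription_result_url polls AssemblyAI; it is stubbed here so
# the snippet runs standalone.
def get_transcription_result_url(audio_url):
    # stub: pretend AssemblyAI returned a finished transcript
    return {"text": "Hi, I'm Patrick. This is a test 123."}, None

def save_transcript(audio_url, file_name):
    data, error = get_transcription_result_url(audio_url)
    if data:
        text_filename = file_name + ".txt"
        with open(text_filename, "w") as f:
            f.write(data["text"])  # "text" holds the full transcript
        print("Transcription saved")
        return True
    print("Error!!!", error)
    return False

save_transcript("https://example.com/audio", "output")
```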
The output .txt file: if I open it up, it looks quite small; maybe I can zoom in. Yes: "Hi, I'm Patrick. This is a test 123" is the result that we're getting. So that's awesome, we actually achieved what we wanted to do.

In the next couple of minutes, I want to clean up the code once again, because you're going to build a couple more projects, and we want to have a Python file with some reusable code, so we don't have to reinvent the wheel all the time. So let me go here first: when we're doing the polling, if we just have a while True loop, it's going to keep asking AssemblyAI for results, and that might be unnecessary. So what we can do is include some waiting time in between: if the transcript is not completed yet, it can wait, let's say, 30 seconds before asking again. We can also inform the user: "waiting 30 seconds". What I need for this is the time module, so I will call time.sleep(30) and import time at the top. This way, it should wait 30 seconds between asking AssemblyAI whether the transcript is ready or not.
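The polling loop with the wait built in might look like this sketch. The poll helper is stubbed here so the snippet runs on its own; in the real script it sends the GET request to the transcript endpoint, and the function is simplified to take a transcript id directly.

```python
import time

def poll(transcript_id):
    # stub: in the real script this GETs the transcript endpoint
    # and returns the JSON response from AssemblyAI
    return {"status": "completed", "text": "Hi, I'm Patrick."}

def get_transcription_result_url(transcript_id):
    # keep asking AssemblyAI until the transcript is done,
    # waiting 30 seconds between requests
    while True:
        data = poll(transcript_id)
        if data["status"] == "completed":
            return data, None
        elif data["status"] == "error":
            return data, data["error"]
        print("waiting 30 seconds")
        time.sleep(30)

data, error = get_transcription_result_url("some-transcript-id")
```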
And okay, let's create that extra file; I'll call it api_communication. I will move all of the functions that communicate with the API there: the upload function, transcribe, poll, all of these actually. Let's see, did we miss anything? I'll just remove these from the main file. The file name can stay in the main script, of course; the headers and the upload and transcript endpoints need to live in api_communication, because they are needed by the functions there. In api_communication we have to import the requests library, so we don't need it in the main file anymore. We still need to import the AssemblyAI API key there; sys stays in the main script, and time goes to api_communication. And in the main script we import from api_communication: we'll just say import all (from api_communication import *), and that way we can use these functions in our main Python script.
I will run this again to make sure that it is still working. I will delete the text file that was created and keep the audio output. Nice, we also get the prompt that the program is waiting 30 seconds before asking again. Oh, we passed the file name, but of course it might not exist over there anymore. So let's go and fix that: the file name lives in the main script, and we only pass it to the upload function, which now lives in api_communication. But in save_transcript we are actually using it without passing it. So what we can do is simply also pass the file name to save_transcript, and that should fix the problem. "Transcription saved". Alright, let's open the output text file: "Hi, I'm Patrick. This is a test 123."

This is a very short audio file, and we've been using it over and over again, so I also want to show you that this code works with another audio file. This is the audio of one of the latest short videos that I made for our YouTube channel, where I was just talking about what natural language processing is. Maybe if I add underscores to the name, it will be easier to pass on the command line. I will just copy its name and use it when calling the script. This will probably take a little bit longer, because the audio file we have been using is only a couple of seconds long and this one is one minute, so we will see what the results look like. Right, here we go: the transcription is saved, and we find it here. This is exactly what I was talking about. Let's listen to it while the transcription is open. [audio plays] "...we have seen gigantic leaps over the last couple of years in terms of how computers can understand and use natural language." Alright, you get the idea. Our code works, this is amazing. I hope you've been able to follow along. If you want the code, don't forget that you can get it in the GitHub repository we prepared for you, using the link in the description.

Welcome back to
the third project. In this one, I teach you how to apply sentiment analysis to YouTube videos. You will learn how to use the youtube-dl package to automatically download YouTube videos, or to only extract the information you need, and then I also teach you how to apply sentiment analysis. In this example, I use iPhone 13 review videos, and the result that we get looks like this: for each sentence in the video, we get the text, and we also get the sentiment, which can be positive, negative, or neutral. For example, if we read this text here, "the new iPhone display is brighter than before, the battery life is longer", and so on, the sentiment is positive. And here the text is "still, there are some flaws", and now the sentiment is negative. So this works pretty well, and it can be applied to so many use cases. So let's get started and see how to do this.

Here I've created a new project folder, and we already have our API secrets and the api.py file with the helper functions to work with the AssemblyAI API. Now let's create two more files: the main.py file that will combine everything, and the youtube_extractor file, another helper file to extract the infos from a YouTube video. For this, we are going to use the youtube-dl package. This is a very popular command line program to download videos from YouTube and other sites, and while we can use it as a command line program, we can also use it from Python. So we say pip install youtube-dl, and then we can import it: import youtube_dl. Then we set up an instance: ydl = youtube_dl.YoutubeDL().
Now I'm going to show you how you can download a video file and also how you can extract the infos from a video. Let's create a helper function that I call get_video_infos, and this takes a URL. Now we use the ydl object as a context manager: we say with ydl, then result = ydl.extract_info, and this gets the URL. By default it has download=True, so it would also download the file. But in my case, I say download=False, because while we could download the file and then upload it to AssemblyAI, we can actually skip this step and just extract the URL of the hosted file, and then pass this to the transcribe endpoint in AssemblyAI.
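A sketch of get_video_infos, including the playlist check we add in a moment. The playlist handling is split into its own small function so the snippet can be exercised without hitting YouTube (youtube_dl is only imported inside get_video_infos):

```python
def first_video(result):
    # youtube_dl returns a dict with an "entries" key for playlists;
    # in that case keep only the first video's info
    if "entries" in result:
        return result["entries"][0]
    return result

def get_video_infos(url):
    import youtube_dl  # pip install youtube-dl
    with youtube_dl.YoutubeDL() as ydl:
        # download=False: extract the metadata only, don't download the video
        result = ydl.extract_info(url, download=False)
    return first_video(result)
```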
So we set download=False here, and then we do one more check: if the "entries" key is in the result, this means we have a playlist URL, and then we want to return only the first video of this playlist, so we return result["entries"][0]. Otherwise, we simply return the result, which is the whole video info object. Then let's create another helper function that I call get_audio_url, and this gets the video infos. First of all, let's simply print all the video infos to see what they look like. So now let's say if __name__ == "__main__", and then let's first extract the video info: video_infos = get_video_infos, which needs a URL. Then we say audio_url = get_audio_url, and then we want to print the audio URL. Right now this is None, because we don't return anything yet. So let's get an example URL. For this, I went to YouTube and searched for "iPhone 13 review", and I chose this video, an iPhone 13 review with pros and cons. We can click on it, and then we have to watch an ad, but we can actually copy the URL right away and put it in here as a string. Now if we run this with python youtube_extractor.py, it should print the whole info. Ah, here I have to pass the video infos; let's try this again. And yeah, here it printed the whole info. This is actually a very long object, a very long dictionary, and I can tell you that it has a key that is called "formats". So let's print only the formats. If we run this, this is still a very large dictionary. But actually this is a list, so now we can iterate over it.
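The format-filtering helper we build over the next few steps boils down to something like this; the key names "formats", "ext", and "url" are what youtube_dl puts in its info dict, and the sample dict is made up:

```python
def get_audio_url(video_infos):
    # walk the "formats" list and return the URL of the first
    # audio-only format, which has the extension "m4a"
    for f in video_infos["formats"]:
        if f["ext"] == "m4a":
            return f["url"]

# a tiny made-up info dict in the same shape as the real one
fake_infos = {"formats": [
    {"ext": "mp4", "url": "https://example.com/video.mp4"},
    {"ext": "m4a", "url": "https://example.com/audio.m4a"},
]}
print(get_audio_url(fake_infos))
```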
So we say for f in video_infos["formats"], and then we can print f["ext"], the extension; and it also has a URL, so we also want to print f["url"]. Now if we run this, let's see what happens. Actually, let's comment out the URL, because it's super long, and print only the extension. Now we see we have a lot of different extensions, because YouTube actually stores the video in a lot of different formats, with a lot of different resolutions and so on. What we want is this one, m4a, which is an audio format ending. So we now check: if the extension equals "m4a", then we return f["url"]. This is the audio URL. If we save this and then print the result at the very end, we should get the URL to the hosted file. You can see it is at a URL that is not related to youtube.com. Let's, for example, click on it: then we have the audio in our browser, so we could listen to the audio file. So yeah, that's the first part, how to work with the youtube-dl package to extract the infos. And now let's combine this in
main.py: we combine the youtube_extractor infos with AssemblyAI and extract the transcript of the video, along with the sentiment classification results. Sentiment classification is usually a pretty difficult task, but AssemblyAI makes it super simple to apply. If we go to the website, assemblyai.com, and have a look at the features, we see they provide core transcription, which is basically the speech recognition we've seen in the last part. But they also offer audio intelligence features, and they are pretty cool. There are a lot of features you can use, for example detecting important phrases and words, topic detection, auto chapters (auto summaries), and much more. If we scroll down, here we find sentiment analysis. If we click on it, we see a short description: with sentiment analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files. Sentiment analysis returns a result of positive, negative, or neutral for each sentence in the transcript. So this is exactly what we need here, and it's actually super simple to use. The only thing we have to change is that when we call the transcript endpoint, we also have to send sentiment_analysis set to true as JSON data. This is all we need to do.
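The request-body change might look like this sketch. Building the JSON body is split into its own function so it can be checked without a network call; the endpoint URL follows AssemblyAI's v2 API, and the API key is a placeholder you would replace:

```python
def build_transcript_request(audio_url, sentiment_analysis=False):
    # JSON body for the transcript endpoint; turning on the
    # sentiment_analysis flag is the only change for this feature
    return {"audio_url": audio_url,
            "sentiment_analysis": sentiment_analysis}

def transcribe(audio_url, sentiment_analysis=False):
    import requests  # pip install requests
    transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
    headers = {"authorization": "YOUR-ASSEMBLYAI-API-KEY"}  # placeholder
    response = requests.post(
        transcript_endpoint,
        json=build_transcript_request(audio_url, sentiment_analysis),
        headers=headers,
    )
    return response.json()["id"]  # the transcript id used for polling
```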
So let's go to our code and implement this. Let's import everything we need: we want json, and we say from youtube_extractor import get_audio_url, get_video_infos. And from our API helper file, we import save_transcript. Then I create one helper function that I call save_video_sentiments, and this gets the URL. Here we get the video infos by calling get_video_infos with the URL, then we get the audio URL by calling get_audio_url with the video infos. Then I simply call the save_transcript function, and this gets the audio URL and also a title. For the title, I want to use the title of the video, which we can get from the video infos under the key "title". Then I want to slightly modify it: I say title = title.strip() to remove all leading and trailing whitespace, then I replace all spaces with an underscore, and then I also say title = "data/" + title, because I want to store the results in a separate folder, so here we create a folder and call it data. Now we have to modify save_transcript slightly: if we have a look back, we see it needs the additional argument sentiment_analysis. So in save_transcript I will put this as an additional argument with a default of False, and in main.py we pass sentiment_analysis=True. Now we have to pass this through: save_transcript passes it to get_transcription_result_url, which passes it to transcribe, and there, in the JSON data that we send, we put sentiment_analysis set to True or False. And that is all we need. Now, of course, I also want to save the results.
So here we check: if the sentiment_analysis parameter is True, I create a separate file. Again, I say filename = title plus, let's call it, "_sentiments.json". Then I say with open(filename, "w") as f, and I import json at the top. Then we simply say json.dump. First we have to extract the infos, of course, so we say sentiments = data, and then the key: if we have a look at the documentation, we see the JSON response now has the additional key "sentiment_analysis_results". So we use this, and then we dump the sentiments into the file. I also want to say indent=4, to make it a little bit more readable. Now in main.py we call this function: if __name__ == "__main__", then I want to call save_video_sentiments, and the URL is this one, so let's copy and paste it in here. Now let's run the main.py file and hope that everything works. The video info is downloaded and the transcription started, so this looks good; let's wait.

Alright, this was successful, and the transcript was saved. If we now have a look at the data folder, we get the transcript of the video, and we also see our JSON file with all the sentiments. For each sentiment result, we get the text of the sentence, for example this one: "With the exception of a smaller notch, the iPhone 13 doesn't seem very new at first glance, but when you start using this flagship, you start to appreciate a bunch of welcome upgrades." Then we get the start and end time, then we get the sentiment, which is positive, and we also get the confidence, which is pretty high. In the next example, "the new iPhone display is brighter than before, the battery life is longer, and Apple has improved...", the sentiment is also positive. Then we have "still, there are some flaws", and now the sentiment is negative. So this works pretty well, and this is how you can apply sentiment analysis with AssemblyAI.

Now I want to show you a little bit more code for how we could analyze this. We can comment out the download call, so we don't need to download this again. Then we read our JSON file, and here we store the positives, negatives, and neutrals: we iterate over the data, extract the text and the sentiment, check whether it was positive, negative, or neutral, and append the text to the corresponding list. Then we can calculate the length of each list, and print the number of positives, negatives, and neutrals. We can also, for example, calculate the positive ratio: here we ignore the neutrals and simply divide the number of positives by the number of positives plus the number of negatives. If we save this and run it again, we get the number of positives, 38, and only four negatives, so the overall positive ratio is about 90%. With this, you can get a pretty quick overview of a review, for example. I think the sentiment classification feature can be applied to so many different use cases. It's so cool.
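The little analysis script boils down to something like this. The sample list is made up, in the same shape as AssemblyAI's sentiment_analysis_results entries (each has at least "text" and "sentiment"):

```python
def sentiment_ratio(results):
    # tally the sentiment labels and return the positive ratio,
    # ignoring the neutral sentences
    positives = [r["text"] for r in results if r["sentiment"] == "POSITIVE"]
    negatives = [r["text"] for r in results if r["sentiment"] == "NEGATIVE"]
    neutrals = [r["text"] for r in results if r["sentiment"] == "NEUTRAL"]
    print(f"Num positives: {len(positives)}")
    print(f"Num negatives: {len(negatives)}")
    print(f"Num neutrals: {len(neutrals)}")
    return len(positives) / (len(positives) + len(negatives))

# made-up sample in the same shape as the saved JSON
sample = [
    {"text": "The display is brighter than before.", "sentiment": "POSITIVE"},
    {"text": "Still, there are some flaws.", "sentiment": "NEGATIVE"},
    {"text": "It ships in September.", "sentiment": "NEUTRAL"},
    {"text": "Battery life is longer.", "sentiment": "POSITIVE"},
]
print(f"Positive ratio: {sentiment_ratio(sample):.2f}")
```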
I hope you really enjoyed this project. What would be really cool now is if we could display this information in a nice-looking web app, and that is actually one thing that you will learn in the next tutorial, together with Mısra. So let's move on to the next project. All right,
Now it's time to build a podcast summarization app. And we're also going to build a web
interface for this application. In this project, we are again going to use AssemblyAI's API, which offers the chapterization and summarization features, and we are going to get the podcasts from the Listen Notes API. So let's get into it. Here's what our app is going to look like once we are done with it: we will get an episode ID from the Listen Notes API (I will show you how to do that), and when we click this button, it will first give us the title of the podcast, an image, and the name of the episode. Then we will be able to see the different chapters of this episode and when they start, and if we click these expanders, we will be able to read a summary of each chapter. It's also quite exciting to start building a front end for our application, so let's start building it.

In this project, like in the previous ones, we are going to have a main script, and a supporting script, api_communication, where we have all of our supporting functions that we want to use over and over again. We built this before; this is the exact same one from the third project, the project we did before, and we will only need to update it and change some things to start doing podcast summarization. The first thing I want to update here is that we will not actually need the upload endpoint anymore, so I'm just going to go ahead and delete it, because the podcasts are going to be retrieved from the Listen Notes API. They will be somewhere on the internet; we will not download them to our own computer. We can immediately tell AssemblyAI: here is the address of the audio file that I want you to transcribe, and it will be able to do that. So there will be no download or upload needed; that's why I also don't need the upload function. The chunk size is also not relevant anymore. All right, that's good for now. The next thing that we want to do is set
up the Listen Notes API communication. We are going to use AssemblyAI to create the summaries of the podcasts, and we will get these podcasts from Listen Notes. If you've never heard of it before, Listen Notes is basically a database of podcasts (nearly all podcasts, I think), so you can search for any podcast. For example, one of my favorites is 99% Invisible, and you will be able to get all of its information plus the episodes, so you can search for episodes here if you'd like to. What we're going to do with Listen Notes is send it a specific episode ID that we will find on the platform itself. Let's say I want to get the latest episode of 99% Invisible: if I go to the episode page and go down to "Use API to fetch this episode", I will see an ID. This is the specific ID of this episode, and using this ID, I will be able to get this episode and send it to AssemblyAI. This is exactly the ID that we need in our application. To get that, of course, we first need the Listen Notes endpoints. Listen Notes has a bunch of different endpoints, but the one that we need is the episode endpoint, to get the episode information. So I will just name this listennotes_episode_endpoint, and it is this one. And of course, we also need a header again, to authenticate ourselves, and in the header we're going to need to put an API key. So all you have to do is go to Listen Notes, create an account, and get an API key for yourself.
And we are going to paste it here. And here, as you know, we are importing the API key for AssemblyAI; now I'm also going to import the API key for Listen Notes, and we are going to send it with our requests to Listen Notes. I will call this one listennotes_headers and the other assemblyai_headers. For Listen Notes, the header key is named "X-ListenAPI-Key". Alright, the first thing that I want to do is build a new function that takes the episode ID and gives us the URL of the podcast episode's audio file. I will call this one get_episode_audio_url, and it is going to get an episode ID; we're going to send a GET request to Listen Notes. Let's build the URL first: it is going to consist of the Listen Notes episode endpoint, plus a slash, plus the episode ID. Then we send a GET request to this URL; I will call what we get back response for now. And the last thing that we need, of course, is the headers for authentication, and those are the listennotes_headers. Whenever we do this, we should be able to get the episode information for the episode ID, and it is going to be sent to us in JSON format, so we will be able to inspect it. So maybe let's try this first and see that it works.
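A sketch of get_episode_audio_url as described so far. The endpoint and header name follow the Listen Notes v2 API docs, the key is a placeholder, and the URL building is split out so the snippet can be checked without an API key:

```python
LISTENNOTES_EPISODE_ENDPOINT = "https://listen-api.listennotes.com/api/v2/episodes"

def episode_url(episode_id):
    # the episode endpoint plus a slash plus the episode id
    return LISTENNOTES_EPISODE_ENDPOINT + "/" + episode_id

def get_episode_audio_url(episode_id):
    import requests  # pip install requests
    listennotes_headers = {"X-ListenAPI-Key": "YOUR-LISTENNOTES-API-KEY"}
    response = requests.get(episode_url(episode_id), headers=listennotes_headers)
    return response.json()  # episode info as a dict, including "audio"
```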
To do that, I am just going to again say from api_communication import *; I'll just make this a simple Python script for now. And I'm going to call get_episode_audio_url, using the episode ID that I found here, this one, to keep things simple. As a result, we will print the response that we get from Listen Notes. So let's run this and see what happens. All right, this is really long, so maybe I will use pretty print to make it more readable: instead of print, just use pprint. Okay, let's do it again. All right, that is slightly better. Let's see what kind of information we are working with. Nice, we get the audio URL here; this is the URL of the audio. Let's see where it takes us. Yeah, this is just the audio of this podcast, you can hear it. [audio plays] "...that the Roman advance was halted." Nice. Alright, so this is exactly what we need. But if you want, you can also get some extra information about the podcast, if you want to display it in some way. There is the description of the episode, whether there is explicit content or not, the image of this episode, and some extra information about the podcast, like Facebook and Google handles, etc.
So you get a lot of information, and if you want to make your web interface even more interesting and interactive, you can of course include more of it in your application. If we just returned data["audio"] from here, we would get only the audio URL, but now that we have all this information, we might as well extract some more of it. Some of the things that we can get are the thumbnail of this episode, the name of the podcast, and the title of this episode, which, like we said, we will display in the app. So let's do that: this will be the audio URL; we will also get the episode thumbnail under "thumbnail"; we can get the podcast title, which is under "podcast", the podcast-specific information, and then its "title"; and lastly the episode title, which is just "title". And we can pass all of this information back: audio URL, episode thumbnail, episode title, and podcast title.
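Pulling the four fields out of the response might look like this; the key names follow the Listen Notes episode JSON we just printed, and the sample dict is condensed and made up:

```python
def extract_episode_fields(data):
    # the four things we display in the app
    audio_url = data["audio"]
    thumbnail = data["thumbnail"]
    podcast_title = data["podcast"]["title"]  # nested under "podcast"
    episode_title = data["title"]
    return audio_url, thumbnail, podcast_title, episode_title

# condensed, made-up response in the same shape
fake = {"audio": "https://example.com/ep.mp3",
        "thumbnail": "https://example.com/thumb.jpg",
        "title": "Roman Roads",
        "podcast": {"title": "99% Invisible"}}
print(extract_episode_fields(fake))
```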
We don't really need to change much in the rest of the functions, for example transcribe, poll, and get_transcription_result_url, which we already built beforehand. The only thing that we need to change is that now we're not going to do sentiment analysis; we want to use the auto chapters feature of AssemblyAI. So I'm just going to rename the parameter to auto_chapters. In most places this is just the name of a variable, so it is not that important (you can keep it the same, though for readability it's probably better to rename it), but here, in the JSON data, we do need to change the key to "auto_chapters", because we're sending this request to AssemblyAI and it needs to know that we want auto chapters. What else? We also just updated the name of the headers: it's not just headers now, it's assemblyai_headers. Same here in the polling; we do not need to change anything there, we are only asking whether a transcription is done or not. Again, in get_transcription_result_url we want to change sentiment_analysis to auto_chapters. One other thing that I want to change is very small: normally we were waiting for 30 seconds, but now I want to wait for 60 seconds, because podcast episodes tend to be a little bit longer, so we want to wait a little bit longer between asking AssemblyAI if the transcription is ready or not.

That was another change, but the main work is going to happen in the save_transcript function. The main change we need to make in save_transcript is that before, we were uploading our audio to AssemblyAI and then getting the result back; this time, we are only going to have an episode ID, then get the URL from Listen Notes, and then pass that to AssemblyAI to start the transcription. So what I want to do here is, instead of the URL and title, just give save_transcript the episode ID, and then run get_episode_audio_url from inside save_transcript. As a result, what we get is the audio URL, episode thumbnail, episode title, and podcast title. Again, we are not doing sentiment analysis, we are doing auto chapters, and we need to pass the audio URL to get_transcription_result_url, which gets the audio URL as its url argument along with auto_chapters; it is not defined yet here, so we pass auto_chapters=True.

The next thing that we want to do is deal with the response that we get from AssemblyAI. Let's first see what the response from AssemblyAI looks like when we're doing auto chapters, and then deal with it. But first let's fix some of the problems here: I will not save it into a file for now, so I can comment these lines out. This will be auto_chapters. The main thing that I want to do is see what the result looks like, so I will pretty print the data, which is already parsed from JSON. So I'm just going to comment these out for now, just so that we have an idea of what the response looks like. To run this, I will just pass the episode ID to save_transcript. Oh, we're still printing the response from Listen Notes; I will stop printing that and start it again. Alright, so we got the results.
Let's see what it looks like. It's a lot of information; let's scroll to the top. What we wanted was, basically, the chapters, so let's see what the chapter information includes. As you can see, this is one chapter, and this is another chapter. For each chapter, we have the starting point, the ending point, the gist of the chapter (really quickly, what is this chapter about), a headline for the chapter, and a summary: in a couple of sentences, what is happening in this chapter, what is the presenter talking about. What we want to do is show this information in our application, on our web interface. So what we want right now is to extract this information from the response we get from AssemblyAI, save it somewhere, and then visualize it in our Streamlit application.

So I will undo the commenting here, and also here. I will name this file with the episode ID: it will be the episode ID plus ".txt". And as we always do, I'm just going to save the transcript; we don't have to touch this much. But I will start another file: let's call this one chapters_filename, and it will be the episode ID plus, let's say, "_chapters.txt". So chapters will be another file; I'm going to keep all the chapter information somewhere else. In here, I'm going to write some of the information I got from AssemblyAI, specifically the chapter information, and I'm also going to include some of the information I got from the Listen Notes API. One mistake here: I do not want it to be a text file, I want it to be a JSON file, so that it will be easier to parse and easier to read later. The first thing that I want is the chapters, and I'm going to get that from the data variable. It's called "chapters", let's check. Yes, the section is called "chapters". So let's start building it: we'll say episode_data, and first let's include the chapters; I will call the key "chapters" again. Then, inside this episode_data, what do I want? I want the episode thumbnail, the episode title, and the podcast title, so that I have all of this information saved in one place on my file system, and I can just read it whenever I want and display it to the user. Finally, we dump episode_data to the file, and I'll let the user know that the transcript is saved. This part we don't need anymore. And again, if there is an error, we will just say that there is an error. And we will return True.

Now that we've got this far: what we do up to now is get the URL from Listen Notes based on the episode ID, send this URL to AssemblyAI, get the auto chapters information, and then save it to a file. So let's check that this works. And while it's running, we will start on the Streamlit application. So I will just run this again; in the main section, of course, we need to call save_transcript. Okay, it's running. And let's also
start building our Streamlit application now. If you've never heard of Streamlit before, it is a really easy way to start building web interfaces for your applications, specifically in Python. It's very simple to use; it has a very simple API, it's a very simple library. What you have to do is import streamlit as st if you want to use it simply, and, let's say, if you want to put a title in your application, all you need to do is st.title, and it will show that as a title. I will run this separately to show you how it works. To run Streamlit applications, you just need to say streamlit run main.py. Streamlit is installed on your computer like any other Python library, so you just need to use pip: pip install streamlit, and then you will be good to go. Unless you make a mistake and call streamlit with a capital S, which is not correct; it needs to be a lowercase s. So let's do that again. Alright, so this is actually an application; the only thing we're showing right now is a title.

We know what we want it to look like, so I will start building the elements of this application. The first thing that strikes us is that we have a sidebar, we have a title that says "Podcast summaries", and then we start showing the information we got from the APIs that we've been using. So let's put in a sidebar, and maybe let's fix the title first. We want to say "Podcast summaries", or it can even say "Welcome to my application that creates podcast summaries". Let's see, maybe that will be too long, but we'll see. And creating the sidebar is quite simple: you call st.sidebar.text_input, and then you can say "Please input an episode id". I can also have a button at the end of the sidebar that says "Get podcast summary", maybe with an exclamation point too. So let's run it again. Okay, this is looking more like it: it says "Welcome to my application that creates podcast summaries", I can put an episode ID here, and then I can say "Get podcast summary". You see that it is running; it is running because I forgot to comment out this call, so it's actually running the whole pipeline. I'll just stop it for now, because we don't have any way of displaying whatever we get back from the APIs.
So I'll stop this now. And now that we have the application looking more or less like what we want, let's wait for the chapter results to be written to our file, and then we will see what they look like. Then we can start parsing them and showing them to the user in our Streamlit application. Okay, so the transcription is saved and our auto chapter creation is done. Let's take a look at what it looks like. We have the chapters section, we have the episode thumbnail, episode title and podcast title. Very good. In the chapters we have the chapter entries, and inside each chapter we have the summary, headline, gist, start and end. So it looks good.
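So, as a rough sketch, reading the saved file back gives us a dictionary shaped something like this (the values here are made up for illustration; the field names are the ones we just listed):

```python
import json

# Hypothetical example of the saved chapters file; the values are invented,
# but the field names match what this project stores and reads back.
data = {
    "podcast_title": "The Example Podcast",
    "episode_title": "Some Episode",
    "episode_thumbnail": "https://example.com/thumbnail.jpg",
    "chapters": [
        {
            "summary": "A short summary of the first chapter.",
            "headline": "A one-line headline",
            "gist": "The gist",
            "start": 0,        # chapter timestamps are in milliseconds
            "end": 154000,
        },
    ],
}

# This is the kind of access we'll use in the app:
chapters = data["chapters"]
print(chapters[0]["gist"])
```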
Let's start showing this. The first thing that I want to show, like we showed in the beginning, is the name of the episode, or maybe the name of the podcast plus the name of the episode, and then the episode thumbnail. And how I'm going to show that is, again, using Streamlit. That is going to be the header for me, and I will include the podcast title, maybe with a dash in between, and the episode title. But as you can see, we do not have those yet. So first we need to open the file that includes these things, and the file that includes them is the one called episode ID underscore chapters dot json. So the file name would be the episode ID plus "_chapters.json". And where do I get the episode ID? I get the episode ID from the text input. So the user is going to input the episode ID, and then I am going to save it here in this variable, and that way I will have the file name. So then I just need to open this file, and let's call the result data, for example. I need to import json, of course, and load it into the variable data. So in this variable data, what do we have? We have the chapters, so first let's get the chapters: data["chapters"]. And then what we want to get is the podcast title, and then the episode title. Let's change the names: episode title. And we also want the thumbnail. And what did we call the thumbnail? We can see here: episode_thumbnail. Alright, so: thumbnail. So now we're showing the podcast title and episode title in a Streamlit header, and then we can show the thumbnail image with the st.image function.

From this point on, the next thing that we want to show is the chapters, of course. One thing we can do, for example, is use a for loop. We could say for chap in chapters, and then just call st.write to show the chap. That's one way of doing it, but you're going to get a lot of text one after another, and it's not really nice. What we want, like in the original one I showed you at the beginning, is expanders, and it's quite easy to create expanders in Streamlit: again, you just say st.expander, and then you write what information you want to be in your expander. As the title of the expander, I will write here what I want in the title, and whatever I want inside the expander, I'm going to write inside it. So I do not need to use st.write again, because this is going to be inside the expander. And inside the expander, what I want is the summary. So I think it was called summary; let's just check again here in our JSON file. In chapters, we have summary. It's called summary, yes. So I want the summary to be in there, and as the title of the expander, I want the gist of each chapter. So for each chapter there will be an expander, the title of the expander will be the gist of that chapter, and inside the expander we're going to have the summary of that chapter. So let's run this and see how it looks. But first, let's make sure that everything works. So I have the title, and then I ask for an episode ID from the user, and there is a button that starts this process. For that to happen, I'll just assign this button to a variable. So this button variable holds the information of whether the button has been pressed or not. And I only want this part, this visualization and display part, to happen if the button has been pressed, so I'm going to wrap it all in a condition. Otherwise, it's not going to happen. Yes, but right now, if someone presses the button, nothing really happens. So we also need to add an action to this button, and the way we're going to do that is with on_click: if this button is clicked, what we want to happen is for the save_transcript function to be run. So I'm going to pass it here in the on_click argument. And we also have arguments to pass, right? And here is how you pass arguments to the function that you call from your button: this is a tuple, so you write the argument that you're passing to the function as the first element, and the second one is empty. Now, when the button is clicked, this function should run, and we should be able to see all the information on our application.
So let's run it again and see what happens. Yeah, we need to run the Streamlit application this time: streamlit run main.py. I'll close the old one, so we know the difference and which one is which; this is just the example from the beginning. Alright. So we want to get a podcast, and we want to display it. I will get this one again. Let's get the podcast summary. And here it is. We have the title "Welcome to my application that creates podcast summaries". Okay, maybe that's a bit too long, I will shorten it. The name of the podcast, the name of the episode, the number of the episode, also the one about the missing middle. And here are my chapters. So apparently there are one, two, three, four, five, six, seven chapters that AssemblyAI's API was able to find. In each chapter, we have the gist of the chapter as the title of the expander, and the chapter summary inside. One last thing that I want to add is the start point of each chapter here, because I want to show how long each chapter is, maybe. So let's do that. For that, I want to see in this JSON file how it looks. So the start looks like this. These numbers might look a bit random to you, but basically they are milliseconds. So I want to turn them into minutes and seconds, and, if applicable, hours, minutes and seconds. And there is already a function that can do that. Here it is, so we don't need to work on it for a long time. Basically, you get the milliseconds, and from the milliseconds you can get how many seconds there are, how many minutes there are, and how many hours there are. So basically you're counting the hours, and everything on top of the hours that does not add up to an hour is counted as minutes, and everything that does not add up to a minute is counted as seconds. And here is what we return: we'll show the start time as hours, minutes and seconds, and if there are no hours, we don't have to say zero-something-something, so we just show minutes and then seconds. And how I'm going to show it is within the expander title, and I can, you know, show it with a dash in between. I'll call get_clean_time, and in there, what I want is the chapter's start. Let's see what it was called. It's just "start". Okay. Alright, let's run it one more time and then see what our application looks like. Awesome. Okay, this is our application: on the sidebar, we can input an episode ID that we get from Listen Notes, we can click "Get podcast summary", and it will show a nice title, the title of the podcast, the title of the episode, and the thumbnail of this episode. And for each chapter, we show the gist of the chapter, kind of like a headline, and when this chapter started. And when you click the expander, when you expand it, you get the summary of this chapter. So this is what we set out to do, and we achieved it. I hope you were able to follow along. Again, don't forget to go grab the code from the GitHub repository.

Welcome to the final project. In this one, you will learn a bunch of new, exciting technologies. First of all, you will learn how to do real-time speech recognition in Python. Then you will learn how to use the OpenAI API and build a virtual assistant, or chatbot. And finally, you will learn a little bit about WebSockets and how to use asyncio in Python. So I think this is going to be really fun. First of all, let me show you the final project. Now, when I run the code, I can start talking to my bot and ask questions: What's your name? How old are you? What's the best ice cream? And you'll see this works. So I think this is super exciting. So now let's get started.

Alright, so here I have a new project folder.
And again, we have our API secrets file, and now a new main.py file. And the first thing we're going to do is set up real-time speech recognition. For this, we have a detailed blog post on the AssemblyAI blog that will walk you through this step by step. So first of all, we need PyAudio to do the microphone recording; this is the very same thing that we learned in part one. Then we use WebSockets, and then we use the AssemblyAI real-time speech recognition feature that works over WebSockets. Then we create a function to send the data from our microphone recording, and also a function to receive the data, and then we can do whatever we want with this. But instead of just copying and pasting this, let's actually code this together. One note here: in order to use the real-time feature, you need to upgrade your account. So yeah, but anyway, let's get started.

So let's import all the things we need. We want pyaudio again, then we need websockets, so we say import websockets. This is a third-party library that I showed you in the beginning that makes it easy to work with WebSockets, and it is built on top of asyncio, so now we're going to write async code. Then we also import asyncio. We also import base64, because we need to encode the data to a base64 string before we send it. And then we import json to receive the JSON result. And then we say from api_secrets import our API key from AssemblyAI. Now the first thing we set up is our microphone recording. For this, we use the exact same code that we learned in part one, so I simply copy and paste this part from there. So we set up our parameters, then our PyAudio instance, and then we create our stream. And now we need to define the URL for the WebSocket, and we can find this on the blog post. So here, I can copy and paste the URL. The URL starts with wss, then assemblyai.com, and then real-time. And the last part is also important: here we say question mark sample_rate equals 16000. This is the same rate that we use here, so make sure to align this with what you have.

And now we create one function to send and receive the data. This is an async function, so we say async def, and we call it send_receive. It is responsible for both sending and receiving the data. And now we connect to the WebSocket, and we do this with an async context manager. So again, we say async, and then with, and then websockets.connect, and now we specify the parameters: the URL, then we set a ping_timeout, and we can set this to 20, for example. Then we want a ping_interval, and this should be 5. And then we also need to send our authorization token. The parameter for this is extra_headers, and this is a dictionary with the key authorization, and the value is our token. And then we say async with ... as, and we can call this whatever we want, so I say _ws for WebSocket. Then, first, we wait to let this connect. So here we say await asyncio.sleep(0.1). Be careful here: we cannot use time.sleep, because we are inside an async function, so we have to use the async sleep function. And then we try to connect and wait for the result. So we say session_begins equals, and then again await _ws, and then this is called .recv, for receive, I guess. And then we can print the data and see how this looks.
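To summarize the connection parameters as a sketch: the exact endpoint path below is my assumption based on the blog post, so double-check it against the current docs, and the keyword arguments are the ones we just chose for websockets.connect:

```python
# The real-time endpoint (check the AssemblyAI docs for the current path);
# sample_rate in the query string must match the microphone stream.
SAMPLE_RATE = 16000
URL = f"wss://api.assemblyai.com/v2/realtime/ws?sample_rate={SAMPLE_RATE}"

# Keyword arguments we pass to websockets.connect(URL, ...):
CONNECT_KWARGS = {
    "ping_timeout": 20,
    "ping_interval": 5,
    # "YOUR_ASSEMBLYAI_API_KEY" is a placeholder for the key in api_secrets
    "extra_headers": {"authorization": "YOUR_ASSEMBLYAI_API_KEY"},
}

# The connection itself is then opened like this:
# async with websockets.connect(URL, **CONNECT_KWARGS) as _ws:
#     await asyncio.sleep(0.1)           # give the connection a moment
#     session_begins = await _ws.recv()  # first message: session info
```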
Let's also print "sending messages". And now we need two inner functions, again async functions. So we say async def send, and for now, we simply say pass. And then we say async def receive, and here we also pass. And actually, these will both have an infinite while True loop, so they will run infinitely and listen for incoming data. So here we say while True, and for now, let's just print "sending". And here we also say while True, and here we simply pass, because I don't want to spoil our output. And now, after this, we need to combine them in an asyncio way. In order to do this, we call the gather function, so it's asyncio.gather, and here we gather send and receive. And this will return two things, the send result and the receive result. So actually, we don't need these, but just in case, we have them here. And now, after defining this function, of course, we also have to run the code, and we have to run this in an infinite loop. In order to do this, we call asyncio and then .run, and then our send_receive function. So now this should connect, and then it should print "sending" all the time. So let's run this and hope that this works. So yeah, it's already connected, and sending works. So you see, that's why I didn't put a print in the receive as well: we already get a lot of output, and I can't even scroll to the top anymore. But basically, it should have printed the session data once, and this is working so far. So we can continue implementing these two functions now. So now let's implement the send function first.
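The overall shape we have at this point can be sketched with plain asyncio. Here the WebSocket is replaced by dummy bounded loops so the sketch actually terminates, but the send/receive/gather structure is the same:

```python
import asyncio

# The real send()/receive() loop forever over the WebSocket; these dummy
# versions run a bounded loop so the sketch runs anywhere and terminates.

async def send(n=3):
    for i in range(n):
        print("sending")
        await asyncio.sleep(0)   # yield control, like the real await _ws.send(...)
    return "send done"

async def receive(n=3):
    results = []
    for i in range(n):
        await asyncio.sleep(0)   # like the real await _ws.recv()
        results.append(f"message {i}")
    return results

async def send_receive():
    # gather runs both coroutines concurrently and collects both results
    send_result, receive_result = await asyncio.gather(send(), receive())
    return send_result, receive_result

send_result, receive_result = asyncio.run(send_receive())
print(receive_result)
```

Because both coroutines await inside their loops, the event loop can interleave them, which is exactly what lets us stream audio up and transcripts down at the same time.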
And we wrap this in a try-except block. And now we read the microphone input, so we say stream.read, and then we specify the frames per buffer. And I also want to say exception_on_overflow equals False. Sometimes, when the WebSocket connection is too slow, there might be an overflow, and then we get an exception, but I don't want this; it should still work. And then we need to convert this, or encode it, in base64. So we say base64.b64encode with our data, and then we decode it again in UTF-8; this is what AssemblyAI expects. Then we need to convert it to a JSON object, so we say json.dumps, and this is a dictionary with the key audio_data. So again, this is what AssemblyAI needs, and here we put in the data. And then we send this, and we also have to await this, so await _ws.send with the JSON data. And then we have to catch a few errors, so let's copy this from our blog post. These ones, let's copy and paste them in here. So we except a websockets.exceptions.ConnectionClosedError, and we print the error, and we make sure it has this error code, and then we also break. And then we catch every other error. It's not best practice to do it like this, but it's fine for this simple tutorial, and then we assert here. And after each while-True iteration, we also sleep again.

And yeah, so now we can copy this whole code and paste it into the receive function, because the code is very similar here. We have the same try-except, but now here, of course, we have to wait for the transcription result from AssemblyAI. So we say result_str equals, and then again we await, and then _ws.recv. Then we can convert this to a dictionary by saying result equals json.loads with the result string. And now this has a few keys. So this was a JSON object, and now in Python, it's a dictionary, so we can check a few keys. We can get the prompt, or actually, this is the transcription of what we said. So we say prompt equals result, and then it has the key text. And it also has a key that is called message_type. So now we check if we have a prompt, and if result and then the key message_type is FinalTranscript. What AssemblyAI is doing is this: while we are talking, it will already start sending the transcript, and once we finish our sentence, it will do another pass and make a few small corrections if necessary, and then we get the final transcript. So we want only the final transcripts. And for now, let's print "Me:" and then the prompt. And then we want to use our chatbot, so let's print "Bot:", and then, for now, let's simply print some random text. We'll set this up properly in the next step, but first, I want to test this. So let's say "this is my answer". And this is all that we need for the receive function.
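The two data-handling steps inside these loops use only the standard library, so we can sketch them in isolation (the audio_data, text, and message_type field names are the ones from the blog post):

```python
import base64
import json

def make_audio_payload(chunk: bytes) -> str:
    # What send() transmits: raw microphone bytes, base64-encoded,
    # wrapped in a JSON object under the key "audio_data".
    return json.dumps({"audio_data": base64.b64encode(chunk).decode("utf-8")})

def final_prompt(result_str: str):
    # What receive() does with each message: keep only the corrected
    # "FinalTranscript" messages and return their text, otherwise None.
    result = json.loads(result_str)
    if result.get("text") and result.get("message_type") == "FinalTranscript":
        return result["text"]
    return None

print(final_prompt('{"text": "hello there", "message_type": "FinalTranscript"}'))
```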
So let's clear this, and run this, and test this. We get an error: "await wasn't used with future", in asyncio.gather. Oh, this is a classic mistake: of course, here I have to say await asyncio.gather. So let's run this again. And now it's working. So yeah: What's your name? And you see, the transcript is working. So now I stop this, but if I scroll up: What's your name? And each time we get "this is my answer". So this is working. And now, of course, here we want to do a clever thing with our prompt and use our virtual assistant.

For this, we now set up OpenAI. They have an API that provides access to GPT-3, and this can perform a wide variety of natural language tasks. In order to use this, you have to sign up, but you can do this for free, and you get free credits, so this will be more than enough to play around with. And it's actually super simple to set this up. So let's create a new file, and let's call this openai_helper.py. And then we also have to install this, so we have to say pip install openai. And after signing up, you get the API token, so we have to copy this into api_secrets, and then we can use it. And now we can import openai. We also need to import our secret, so from api_secrets we import our OpenAI API key. Then we have to set this, so we say openai.api_key equals our API key. And now we want to do question answering. The OpenAI API is actually super simple to use. We can click on examples, and then we see a bunch of different examples. So OpenAI can do a lot of things, for example Q&A, grammar correction, text to command, classification, a lot of different stuff. So let's click on Q&A, and if we scroll down, then here we find the code examples. We already set our API key, and now we need to grab this code. So let's copy this, and let's create a helper function. So def, and let's call this ask_computer, and this gets the prompt as input. And now I paste this in here. So we say response equals openai.Completion.create. Then here we specify an engine, and now we specify the prompt. In our case, the prompt is going to be the prompt that we put in, so prompt equals prompt from the parameter. And now there are a lot of other different parameters that you can check out in the documentation. In my case, I only want to keep max_tokens: this specifies how long the result can be, and yeah, let's say 100 is fine for this. And now this is all that we need. And now, of course, we need to return the response. This is actually a JSON object again, or now a dictionary, and we only want to extract the first possible response. It can also send more if you specify this here, but in our case, we only get one. So we say response, and this is in the key choices, and then the index zero, so the first choice, and then the key text. So this will be the actual response from GPT-3.

And now, in the main file, the only thing we have to do is say from openai_helper we import ask_computer, and then down here in the receive function, we say response equals ask_computer with the prompt, and then here, this will be our response. And now this should be everything that we need. So let's again clear this and run the main.py, and let's hope this works. What's your name? How old are you? Where are you from? Alright, so let's stop this again. And yeah, you see, this works. And this is how you can build a virtual assistant that works with real-time speech recognition together with OpenAI. And yeah, I really hope you enjoyed this project. If you've watched this far, thank you so much for following along. And also, I hope to see you in the future on the AssemblyAI channel, because on there we also create a lot of content around Python, speech recognition, and also machine learning. So please check it out. And then I hope to see you soon. Bye!