This course will teach you how to implement speech recognition in Python by building five projects. And it's easier than you may think. This course is taught by these two amazing instructors. Hi, everyone. I'm Patrick. And I'm Mısra. Patrick is an experienced software engineer, and Mısra is an experienced data scientist. They're both developer advocates at AssemblyAI. AssemblyAI is a deep learning company that creates a speech-to-text API, and you'll learn how to use that API in this course. AssemblyAI provided a grant that made this course possible. They also have a YouTube channel where they post weekly Python and machine learning tutorials. So here are the projects you'll learn to build in
this course. In the first project, we are going to learn how to deal with audio data: we are going to see how to capture audio from a microphone and save it as a WAV file. In the second project, we are going to learn how to do speech recognition on top of the audio file that we have just recorded, using AssemblyAI's API. In the third project, we are going to change gears a little bit and do sentiment analysis on iPhone reviews that we find on YouTube. In the fourth project, we are going to summarize podcasts that we find online and build a web app to show the results to users. And in the last project, we are going to use speech recognition in combination with OpenAI's API to make an app that can answer users' questions. I hope you're excited.
Let's get started. Alright, so in this first part, I'll teach you some audio processing basics in Python. We'll briefly touch on different audio file formats. Then we'll talk about different audio signal parameters that you should know. Then I'll show you how to use the wave module to load and save a WAV file, and how to plot a wave signal. Then I'll also show you how to do a microphone recording in Python. And finally, I'll show you how to load other file formats, like mp3 files. So let's get started. So first of all, before we write some code, let's talk briefly
about different audio file formats. So here I've listed three of the most popular ones: mp3, FLAC, and WAV. mp3 is probably the most popular one that you may know. This is a lossy compression format, so it compresses the data, and during this process we can lose information. On the other hand, FLAC is a lossless compression format. It also compresses the data, but it allows us to perfectly reconstruct the original data. And WAV is an uncompressed format, so it stores the data in an uncompressed way. The audio quality here is the best, but the file size is also the largest. WAV is the standard for CD audio quality. We focus on this in the first part because it's actually very easy to work with in Python: we have a built-in wave module, so we don't have to install anything. By the way, WAV stands for Waveform Audio File Format. And now let's have a look at how we can work with a WAV audio file. But before we load our first WAV file, let's understand a few parameters. So we have the number of channels, which is
usually one or two. One is also known as mono, and two is stereo. This is the number of independent audio channels; for example, two, or stereo, means two independent channels, which gives you the impression that the audio is coming from two different directions. Then we have the sample width: this is the number of bytes for each sample. This will get clearer later when we have a look at an example. Then we have the frame rate, which is also known as the sample rate or sampling frequency, and this is a very important parameter: it's the number of samples per second. For example, you may have seen the number 44,100 hertz, or 44.1 kilohertz, a lot. This is the standard sampling rate for CD quality, so it means we get 44,100 sample values each second. Then we have the number of frames, which is the total number of frames we get. And then we have the values in each frame. When we load these, they will be in a binary format, but we can convert them to integer values later. So now let's have a look at how to load a file
with the built-in wave module. So here I prepared a simple WAV file; it's five seconds long. Let's actually listen to it: "Hi, my name is Patrick, and I'm a developer advocate at AssemblyAI." And here we also see a few parameters already. So now let's go back to the code and load this file. For this, we create an object and simply say wave.open, then we have to give it the name, so this is called patrick.wav, and to read it we open it in read-binary mode, "rb". And now we can extract all these different parameters. For example, let's print the number of channels, which we get by saying obj.getnchannels(). Then we also want to print the sample width, which we get by saying obj.getsampwidth(). Then let's print the frame rate, which we get with obj.getframerate(). Then we also want the number of frames, so we print obj.getnframes(). And lastly, let's also print all the parameters at once, which we can get by saying obj.getparams(). So if we run this, so I say python wave_example.py, then we see we have only one channel, so this is a mono format; we have a sample width of two, so two bytes for each sample; then we have a frame rate of 16,000 and a number of frames of 80,000. And here we also have all the parameters as a params object. So now, for example, we can calculate the duration of the audio. As I said, the frame rate is the number of samples per second, so if we take the total number of frames, so the number of frames, or number of samples, divided by the frame rate, then we get the time in seconds. So now if we print t_audio and run this, then we get 5.0, so five seconds, the same as we see here. So this works. And now let's get the actual frames.
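To recap those calls, here's a minimal, self-contained sketch; it first writes a tiny one-second WAV file (the file name and parameter values are made up for illustration) so the snippet runs anywhere, then reads the parameters back and computes the duration the same way:

```python
import wave

# Create a tiny mono WAV file so the example is self-contained:
# 1 channel, 2 bytes per sample, 16,000 samples per second, 1 second of silence.
with wave.open("tiny.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

# Read it back and inspect the parameters.
obj = wave.open("tiny.wav", "rb")
n_channels = obj.getnchannels()   # 1 -> mono
samp_width = obj.getsampwidth()   # 2 bytes per sample
frame_rate = obj.getframerate()   # 16,000 samples per second
n_frames = obj.getnframes()       # total number of frames
obj.close()

t_audio = n_frames / frame_rate   # duration in seconds
print(n_channels, samp_width, frame_rate, n_frames, t_audio)
```

The same duration formula (frames divided by frame rate) is what gives 5.0 for the five-second file in the video.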
So we say frames = obj.readframes(), and then we can give it the number of frames, or we can pass in -1, which will read all frames. And let's, for example, print the type of this to see what it is, then also print the type of frames[0], and then print the length of frames. So now let's run this. And then we see this is a bytes object, so here we see class bytes. And when we extract the first byte, we see this is an integer. And now the length of the frames object is 160,000, so this is not the same as the number of frames. If we have a look here, the number of frames is 80,000, but if we take the length here, it's twice as much. And if you listened carefully in the beginning, I mentioned the sample width: this means we have two bytes per sample. So if we actually calculate this divided by two, then again we get our 80,000 frames. And yeah, this is how easily we can read a WAV file, and then we can work with the frames. And now, to save the data again, we also open a file, so let's call this obj_new = wave.open, and we give it a new name, let's say patrick_new.wav, and now we open this in write-binary mode, "wb". And now we can basically call all those functions as setters instead of getters. So we say obj_new.setnchannels, and this is only one channel; then obj_new.setsampwidth, and this should be two; then obj_new.setframerate, and this is 16,000. These are all the parameters we should set. And then we can write the frames by saying obj_new.writeframes and passing the frames, so here we have the original frames. So now we basically duplicate the file. And one thing I forgot: when we are done with opening a file and reading all the information we want, we should also call obj.close(). And the same here, so we say obj_new.close(), and this will close the file objects. And yeah, now if we save this and run it, then here we see we have the duplicated file. And if we play it: "Hi, my name is Patrick, and I'm a developer advocate at AssemblyAI," we see this works and it has the same data in it. So yeah, this is how to work with a WAV file and with the wave module. So now let's see how
we can plot the WAV file. Now, plotting a wave signal is actually not too difficult. For this, we just need to install matplotlib and NumPy. Then we import all the modules we need: again we need wave, then we need matplotlib.pyplot as plt, and then we import numpy as np. Then, again, I want to read the WAV file, so I say wave.open, and this was patrick.wav in read-binary mode. Then I want to read all the parameters that I need. I want the sample frequency, and this is obj.getframerate(); then I need the number of samples, so this is obj.getnframes(); and then I also need the actual signal, so I call this signal_wave = obj.readframes(-1), so all the frames. And then I can say obj.close(). And then, for example, we can calculate the length of the signal in seconds. I call this t_audio, and if you remember this from the first script, it's the number of samples divided by the sample frequency. Now let's print t_audio, save this, and run it just as a test. So now we can run python plot_audio.py and we get 5.0, so this works so far. Now I want to create the plot. The signal is a bytes object, and we can create a NumPy array out of it very easily: I call this signal_array = np.frombuffer, and here we put in the signal_wave, and we can also specify a data type, so here I want this to be int16. And now we need an object for the x axis, so the time axis. We say times =, and here we use the NumPy linspace function; this gets zero as the start, and the end is the length of the signal, so this is t_audio, or five seconds. And then we can also give this the num parameter, and the number is the number of samples, so, if you remember, the length of the signal wave. So here we basically get a sample for each point in time. And now we want to plot this. We create a figure, so we say plt.figure, and we give this a figsize of 15 by 5; then we say plt.plot, and we want to plot the times against the signal_array; then we simply give it a title with plt.title, and let's call this "Audio Signal". Then I also want to set plt.ylabel, and the y label is "Signal Wave", and the plt.xlabel is "Time (s)". And then we say plt.xlim, and we limit this to be between zero and t_audio, so five seconds. And then we say plt.show, and this is all we need. And now if we run this, it should open our plot. And here we have it: here we have our audio signal plotted as a wave plot. And this is how easily we can do it with matplotlib and the wave module. Now let's
learn how we can record with our microphone and capture the microphone input in Python. For this, we use PyAudio, a popular Python library. It provides bindings for PortAudio, the cross-platform audio I/O library, so with it we can easily play and record audio, and it works on Linux, Windows, and Mac. For each platform there's a slightly different installation command that they recommend. For example, on Windows you should use this command. On Mac you also have to install PortAudio first, so if you use Homebrew you can simply say brew install portaudio and then pip install pyaudio. And on Linux you can use this command. So I already did this: here I'm on a Mac, so I used brew install portaudio and then pip install pyaudio. And now I can import this, so I say import pyaudio. And I also want to import the wave module
to save the recording later. Then I want to set up a few parameters. So I say frames_per_buffer, and here I say 3200; you can play around with this a little bit. Then I specify the format, so the format equals pyaudio.paInt16. This is basically the same as what we used before: there we used NumPy's int16, and here we have paInt16. Then I also specify the number of channels, so here I say one, so simply a mono format, and then also the frame rate, so the rate; here again I say 16,000, and again you can use a different rate and play around with this. Then we create our PyAudio object, so we say p = pyaudio.PyAudio().
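The recording flow we're building up here can be sketched as a pair of functions (a sketch, assuming PyAudio is installed; the function names are my own, and the import is kept inside record() so the chunk-count helper works even without PyAudio):

```python
def num_chunks(rate, frames_per_buffer, seconds):
    # How many buffer-sized reads are needed to cover `seconds` of audio.
    return int(rate / frames_per_buffer * seconds)

def record(seconds=5, rate=16000, frames_per_buffer=3200):
    # Assumes PyAudio (and PortAudio) is installed; returns raw 16-bit mono audio bytes.
    import pyaudio
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=rate,
                    input=True, frames_per_buffer=frames_per_buffer)
    frames = []
    for _ in range(num_chunks(rate, frames_per_buffer, seconds)):
        frames.append(stream.read(frames_per_buffer))
    stream.stop_stream()
    stream.close()
    p.terminate()
    return b"".join(frames)
```

With a rate of 16,000, a buffer of 3,200 frames, and five seconds, num_chunks gives 25 reads, which matches the loop we're about to write.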
Then we create a stream object, so we say stream = p.open, and now we put in all the parameters: we say format=FORMAT, then the channels, so channels=CHANNELS, the rate, rate=RATE. We also want to capture the input, so input=True. And lastly, we set frames_per_buffer=FRAMES_PER_BUFFER. Then we have our stream
object. So now we can start recording, and we want to record for a number of seconds; here I say five seconds. Then we store the frames, and we store them in a list. Now we can iterate and say for i in range, and we start at zero and go until rate divided by frames_per_buffer times the seconds, and then we convert this to an integer. With this, we basically record for five seconds. Then we read each chunk: we say data = stream.read, and then we read frames_per_buffer frames, and then we say frames.append(data). So basically frames_per_buffer means we read this many frames at once, so in one iteration. And now we have it, so now we can close everything again: we say stream.stop_stream(), then stream.close(), and then p.terminate(). So now we have everything correctly
shut down. And now we can, for example, save the frames object again in a WAV file. For this I say obj = wave.open, and let's call this output.wav, in write-binary mode. Then we set all the parameters: obj.setnchannels, this is the channels parameter; obj.setsampwidth, which we get from p.get_sample_size with our format; then obj.setframerate, this is the rate. And then we can write all the frames: we say obj.writeframes, and we need to write this as one binary string, which we can create with an empty bytes string, b"", and then .join, putting in our frames list. This will combine all the elements in our frames list into a single binary string. And then we say obj.close(), and this is everything we need to do. So now we can run python record_mic.py and test this: "Hi, I'm Patrick. This is a test, 1 2 3." And now it's done. So here we have our new file, so let's play it and see if this works: "Hi, I'm Patrick. This is a test, 1 2 3." And it worked,
awesome. And now, as a last step, I also want to show you how to load mp3 files and not only WAV files, so let's do this. To load mp3 files, we need an additional third-party library, and I recommend pydub. This is a very simple-to-use library that provides an easy, high-level interface to load and also to manipulate audio. In order to install it, we also need FFmpeg, so on the Mac I used Homebrew, so I had to say brew install ffmpeg, and after that you can simply say pip install pydub. I already have this set up. And now we can, for example, say from pydub import AudioSegment. Then we can say audio = AudioSegment, and here we can use from_mp3 if we have an mp3; in my case, right now, I only have a WAV, so I use from_wav and load patrick.wav. And then we can, for example, very easily manipulate this by saying audio = audio + 6, and this will increase the volume by 6 dB. Then we can also repeat the clip: we say audio = audio * 2. Then we can use a fade-in, for example audio = audio.fade_in(2000), so with 2000 milliseconds, a two-second fade-in; the same works with fade_out. So yeah, this is how we can manipulate audio.
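As a side note on what that +6 means: decibels are logarithmic, and an amplitude gain of g dB corresponds to a factor of 10^(g/20), so +6 dB roughly doubles the amplitude and -6 dB roughly halves it. A quick check:

```python
def db_to_amplitude_ratio(db):
    # Amplitude ratio corresponding to a gain expressed in decibels.
    return 10 ** (db / 20)

print(round(db_to_amplitude_ratio(6), 3))   # ~1.995, i.e. roughly double
print(round(db_to_amplitude_ratio(-6), 3))  # ~0.501, i.e. roughly half
```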
Then we can say audio.export, and I want to export this as, let's call it, mashup.mp3, and then I have to say format equals, as a string, "mp3". And now, for example, I could load this by saying audio2 = AudioSegment.from_mp3, and here I use mashup.mp3, and then print "done" so that we see it arrives at this part. And now let's say python and then the load-mp3 file. And yeah, this works: now here we have our mp3 file, and we could also load it like this. So yeah, that's how you can use the pydub module to load other file formats as well. And that's all I wanted to show you in this first part. I hope you learned a little bit about audio processing in Python. And now let's move on and learn how to do speech recognition in Python. Hey, and welcome. In this project, we
are going to learn how to do speech recognition in Python. It's going to be very simple: what we're going to do is take the audio file that we recorded in the previous project and turn it into a text file. Let me show you how the project works. So here is the audio file that we recorded in the previous project: "Hi, I'm Patrick, this is a test, 1 2 3." And if we run our script, we get the text transcription of this audio file, like this here: "Hi, I'm Patrick. This is a test 123." So let's learn how to implement this in Python. For this project, we are mainly going to need two things: AssemblyAI's API, to do the speech recognition, and the requests library from Python, to talk to AssemblyAI's API. So let's first go ahead and get an API token from AssemblyAI. It's very simple: you just need to go to assemblyai.com and create a free account. Once you have an account, you can sign in and just copy your API key by clicking here. And right away, I'm going to create a configure file and put my API key there. Once I've done that, I have a way of
authenticating who I am with AssemblyAI's API. And now we can start setting up how to upload, transcribe, and get the transcription from AssemblyAI's API. The next thing that I want to do is have a main file that is going to hold all my code. What I need to do in there is import the requests library so that I can talk to the AssemblyAI API. So this project is going to have four steps. The first one is to upload the file that we have locally to AssemblyAI. The second one is to start the transcription. The third one is to keep polling AssemblyAI's API to see when the transcription is done. And lastly, we're going to save this transcript. So uploading is actually quite simple. If we go to the documentation of AssemblyAI, we will see here "uploading local files for transcription". So I can just copy and paste this and change the code as we need it. So basically, yeah, okay, we are importing the requests library already. The file name we are going to get from the terminal, so I will set that later. Just a couple of things that we need to pay attention to here. Basically, there is a function to read the audio file from our file system. And then we need to set up the headers; these headers are used for authentication.
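Following the docs pattern described here, the upload step can be sketched like this (a sketch: the placeholder key and the exact variable names are illustrative, not the course's exact code):

```python
import requests

# Normally imported from a local configure file; the placeholder is illustrative.
API_KEY_ASSEMBLYAI = "your-api-key-here"

upload_endpoint = "https://api.assemblyai.com/v2/upload"
headers = {"authorization": API_KEY_ASSEMBLYAI}

def read_file(filename, chunk_size=5242880):
    # Stream the audio file in 5 MB chunks, as the upload endpoint expects.
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data

def upload(filename):
    # Post the chunked file; the response JSON contains the upload URL.
    response = requests.post(upload_endpoint,
                             headers=headers,
                             data=read_file(filename))
    return response.json()["upload_url"]
```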
So we can actually already set this, because this is just going to be your API token: we set it to be the API key from AssemblyAI, and we need to import it here, of course. Alright, that's done. We also have an upload endpoint for AssemblyAI, and this one is api.assemblyai.com/v2/upload. But you know, this might be something that we also need later, so I'm just going to put it into a separate variable and then use that variable here. So when we're uploading a file to AssemblyAI, we are doing a post request. For this post request, you need to send it to the upload endpoint, you need to include your API key in the headers, and of course you need the data, so the file that you read. We are reading the data through the read_file function in chunks, because AssemblyAI requires it to be in chunks, in chunk sizes of five megabytes; basically, this is the number of bytes in there. While we're at it, we can already get the file name from the terminal, too, right? For that, I just need to import sys, and in sys.argv the first (not the zeroth) argument is going to be the file name. And here, let's clean up a little bit. All right, now we should be able to just run a command on the terminal, include the name of the file that we want to upload, and it will be uploaded to AssemblyAI. And let's also print the response that we get from AssemblyAI, to see what kind of response we get. Again, this is the file that we are working with: "Hi, I'm Patrick, this is a test, 1 2 3." And what we need to do right now is run python main.py and the name of the file, in this case output.wav. All right, so we uploaded our file to Assembly
AI successfully. In the response, what we get is the upload URL, so where your audio file lives right now. And using this, we can start the transcription. For the transcription, let's again cheat by getting the code from the docs. Here is the code that we need, starting from here. So this is the transcription endpoint; you can see that it ends differently than the upload endpoint. That one ends with upload, this one ends with transcript, so I will call this the transcript endpoint. Headers: we already have a header, so we don't really need this anymore. The endpoint is transcript_endpoint. JSON is the data that we are sending, or the data that we want AssemblyAI to transcribe, so we are going to need to give it the audio URL. We already have the audio URL, right? Well, we got the response, but we did not extract it from the response, so let's do that: audio_url is response.json()['upload_url']. So we're going to give this audio_url here, because that was just an example. Okay. And this way, we will have started the transcription. Let's run this and see what the result is; I will run it again, same thing. Alright, so we got a much longer response.
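The transcription-start step just described can be sketched as follows (the helper name is mine; headers are the same authorization headers used for uploading):

```python
import requests

transcript_endpoint = "https://api.assemblyai.com/v2/transcript"

def make_transcript_request(audio_url):
    # The JSON body the transcript endpoint expects: the URL of the uploaded audio.
    return {"audio_url": audio_url}

def transcribe(audio_url, headers):
    # Start the transcription job; the response carries a job id to poll with,
    # not the transcript itself.
    response = requests.post(transcript_endpoint,
                             json=make_transcript_request(audio_url),
                             headers=headers)
    return response.json()["id"]
```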
In this response, we have a bunch of information about the transcription that we've just started. So you do not get the transcript itself immediately, because, depending on the length of your audio, it might take a minute or two, right? What we get instead is the ID of this transcription job. So by using this ID, from now on, we can ask AssemblyAI: hey, here is the ID of the transcription job that I submitted to you, is it ready or not? And if it's not ready, it will tell you it's not ready yet, it's still processing. If it's ready, it will tell you: hey, it's completed, and here is your transcript. So that's why the next thing that we want to build is the polling: we're going to write the code that will keep polling AssemblyAI to tell us whether the transcription is ready. But before we go further, let me first clean up this code a little bit, so that everything is nicely packed in functions and we can reuse them if we need to. So this one is the upload function, and what it needs to return is the audio URL. We do not need to print the response anymore; we've already seen what it looks like. And we need to put the headers separately, because we want both upload and transcribe, and basically everything else, to be able to reach this variable called headers. For transcription, again, I will create a function called transcribe, and what I need to return from the transcription function is the ID, so I will just say job_id, and that would be response.json()['id']. Again, we don't need this anymore. I'll just call this transcript_response to make it clear; this one will be upload_response. Let's call this transcript_request, so everything is nice and clean. This is this, and this goes here. And for upload_response, we use it here. And we need to return job_id. Alright, so now we have them nicely wrapped up in different functions, and everything else looks good. Let's run this again to see that it works. Now, of course, I'm not calling the functions.
So let me call the functions and then run it: upload and transcribe. But of course, I also need to pass the file name to the upload function, so let's do that too. "audio_url is not defined"; audio URL, of course. Then I also need to pass audio_url to transcribe. Good thing we tried. So this will be returned from the upload function, and then we will pass it to the transcribe function. And as a result, we will get the job ID, and then I can print job_id to see that it worked. Let's see. Yeah, yes, I do get a job ID, so okay, things are working. The next thing that we want to do is set up the polling function. The first thing we need for that is to create a polling endpoint. As you know, we had the transcript endpoint and the upload endpoint here; that's how we communicate with AssemblyAI's API. The polling endpoint is going to be specific to the transcription job that you just submitted. To create it, all you need to do is combine the transcript endpoint with a slash in between and add the job ID. But the name job_id is a bit vague, so I'm just going to call this transcript_id. By doing that, you now have a URL with which you can ask AssemblyAI whether your job is done already. And again, we're going to send a request to AssemblyAI; this time it's going to be a get request. Well, I'll just copy this so that it's easy. Instead of post, it's going to be a get request; we're going to use the polling endpoint instead of the transcript endpoint, and we just need the headers. Data we do not need, because we are not sending any information to AssemblyAI; we're just asking for information. If you're familiar with requests, this might be very simple for you, but all you need to know is that when you're sending data to an API, you use the post request type, and if you're only getting some information, as the name suggests, you use the get request type. So the resulting response is going to be called polling_response. Let's see; it's not job_id, I called it transcript_id, so now it works. Then we get the polling response, and I can also show you what the polling response looks like. Looks good. Okay, let's run this. Alright,
so we got response 200, and that means things are going well. But actually, what I need is the JSON response, so let's see that again. Yes, this is more like it. So again, we get the ID of the response, the language model that is being used, and a bunch of other information. But what we need here is the status, so let's see where that is. Oh yeah, there it is: we have status "processing". This means that the transcription is still being prepared, so we need to wait a little bit more, and we need to ask AssemblyAI again soon to see if the transcription is done. What we normally do is wait 30 seconds or maybe 60 seconds, depending on the length of your audio file, and when it's done, it will give us status "completed". So let's write the bit where we ask AssemblyAI repeatedly whether the transcription is done. For that, we can just create a very simple while loop: while True, we do the polling, and if polling_response.json()['status'] equals 'completed', we return the polling response. But if the polling response status is 'error', because it is possible that it might error out, then we will return the error. I'll just wrap this into a function; I can call it get_transcription_result_url. And while we're at it, we might as well wrap the polling into a function too. Do we need to pass anything to it? Yes, the transcript ID; we need to pass a transcript_id to it. And instead of printing the response, we will just return it. So instead of doing the request here, all we need to do is call this function with the transcript ID. We can pass the transcript ID here, or I might as well just call the transcribe function in here, and the resulting thing would be the transcript ID from the transcribe function. And then I'm going to pass this transcript_id to the polling function, which is going to return to me the polling response. I will call this polling response data. And inside this data... so this is not needed anymore. Yeah, so polling_response.json() is what is being passed; I call that the data, so I change this to data here, and also data here. Then I'll just pass the data. If it's an error, I can still pass the data, just to see the response and what kind of error we got, and here I'll just say None. All right, let's do a little cleanup. So we have a
nice upload function and a transcribe function. What we did before was call the upload function, get the audio URL, and then pass it to transcribe. But I'm running transcribe here, so I do not need this anymore. I still need to pass the audio URL to transcribe, so then I would need to pass it here. So instead of this, I just need to call this function with the audio URL. Yeah, let's put these here. Actually, to make it a bit more understandable, maybe instead of passing the string 'error', I can just pass whatever error happened in my transcription; then, you know, we'll be able to see what went wrong. Alright, so what we get as a result from get_transcription_result_url is the data and, if there is any, the error. So why not run this and see what the data is going to look like. All right, so we get something really, really big. Let's see; maybe I'll just clear this and run it again, so that we can see it more clearly. Alright, so we get the ID, again the language model that is being used, etc. Now we want the result. Yes, it is under "text": "Hi, I'm Patrick. This is a test 123" is what we get. And we also get the breakdown of words, when each word started and ended in milliseconds, the confidence of this classification, and much more information. What we want to do, though, even though we have all this information, is write this transcript that is generated by AssemblyAI into a text file. So in this next step, that's what we're going to do. Alright, let's come up with a file name for this file. We can actually just call it the same thing as the file name, plus txt. Okay, we were already using the variable name file_name, so let's find something else; we'll just call this text_filename, and it will be the file name plus ".txt". We could also, you know, remove the .wav or .mp3 extension, but let's not deal with that for now. So once I have this, I will just open it in write mode.
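The save step being built here can be sketched as a small function (names are illustrative; data is the polling-endpoint JSON when the job completed, and error is set otherwise):

```python
def save_transcript(data, error, filename):
    # Write the transcript text to <filename>.txt, or report the error.
    if data:
        text_filename = filename + ".txt"
        with open(text_filename, "w") as f:
            f.write(data["text"])
        print("Transcription saved")
        return text_filename
    print("Error:", error)
    return None
```

For example, save_transcript({"text": "Hi, I'm Patrick."}, None, "output.wav") would produce a file named output.wav.txt containing that text.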
And inside I will write data. Texts because that's where we have the text
information on the transcript. If you remember here, this was a response we got
and text includes the transcription. And I can just prompt the user saying that
transcription is saved, transcriptions saved, are happy. Of course, there is a possibility
that our transcription errors out. So you want to cover that too. If you remember, we returned
data and error, what we can do is you can say if data is returned, this happens. But if it
errored out, I will just print error. No, it didn't work out and the error itself so that
we see you know what went wrong. Okay, let's do a little cleanup. Again, I want to wrap this all
up in a function, we can call the Save transcript. Data and error will be returned from get
transcripts URL, it means the audio URL, so I will just need to pass over your URL here. And with
that, we're actually more or less ready. So let's run this and see if we get what we need the
transcript saved in a file. For that, after calling the upload function, I can move this line here, and then call the save_transcript function. Let's quickly follow the flow: save_transcript calls get_transcription_result_url; get_transcription_result_url calls transcribe, which starts the transcription process; and then get_transcription_result_url also does the polling, so it keeps polling AssemblyAI. When it's done, it returns something, and we deal with it in the save_transcript function: we either save a transcript or, if there is an error, we display the error. So let's run this and see if we get any errors. Transcription saved. Alright, let's see.
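Put together, the save step we just described might look like the sketch below. In the real script, get_transcription_result_url is the polling helper we built; it is stubbed out here so the sketch runs on its own.

```python
# Sketch of the save_transcript step described above. In the real script,
# get_transcription_result_url polls AssemblyAI; it is stubbed here so
# the snippet runs standalone.
def get_transcription_result_url(audio_url):
    # stub: pretend AssemblyAI returned a finished transcript
    return {"text": "Hi, I'm Patrick. This is a test 123."}, None

def save_transcript(audio_url, file_name):
    data, error = get_transcription_result_url(audio_url)
    if data:
        text_filename = file_name + ".txt"
        with open(text_filename, "w") as f:
            f.write(data["text"])  # "text" holds the full transcript
        print("Transcription saved")
        return True
    print("Error!!!", error)
    return False

save_transcript("https://example.com/audio", "output")
```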
The output .txt file: if I open it up, it looks quite small; maybe I can zoom in. Yes: "Hi, I'm Patrick. This is a test 123" is the result that we're getting. So that's awesome, we actually achieved what we wanted to do.

In the next couple of minutes, I want to clean up the code once again, because you're going to build a couple more projects, and we want to have a Python file with some reusable code, so we don't have to reinvent the wheel all the time. So let me go here first: when we're doing the polling, if we just have a while True loop, it's going to keep asking AssemblyAI for results, and that might be unnecessary. So what we can do is include some waiting time in between: if the transcript is not completed yet, it can wait, let's say, 30 seconds before asking again. We can also inform the user: "waiting 30 seconds". What I need for this is the time module, so I will call time.sleep(30) and import time at the top. This way, it should wait 30 seconds between asking AssemblyAI whether the transcript is ready or not.
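The polling loop with the wait built in might look like this sketch. The poll helper is stubbed here so the snippet runs on its own; in the real script it sends the GET request to the transcript endpoint, and the function is simplified to take a transcript id directly.

```python
import time

def poll(transcript_id):
    # stub: in the real script this GETs the transcript endpoint
    # and returns the JSON response from AssemblyAI
    return {"status": "completed", "text": "Hi, I'm Patrick."}

def get_transcription_result_url(transcript_id):
    # keep asking AssemblyAI until the transcript is done,
    # waiting 30 seconds between requests
    while True:
        data = poll(transcript_id)
        if data["status"] == "completed":
            return data, None
        elif data["status"] == "error":
            return data, data["error"]
        print("waiting 30 seconds")
        time.sleep(30)

data, error = get_transcription_result_url("some-transcript-id")
```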
And okay, let's create that extra file; I'll call it api_communication. I will move all of the functions that communicate with the API there: the upload function, transcribe, poll, all of these actually. Let's see, did we miss anything? I'll just remove these from the main file. The file name can stay in the main script, of course; the headers and the upload and transcript endpoints need to live in api_communication, because they are needed by the functions there. In api_communication we have to import the requests library, so we don't need it in the main file anymore. We still need to import the AssemblyAI API key there; sys stays in the main script, and time goes to api_communication. And in the main script we import from api_communication: we'll just say import all (from api_communication import *), and that way we can use these functions in our main Python script.
I will run this again to make sure that it is still working. I will delete the text file that was created and keep the audio output. Nice, we also get the prompt that the program is waiting 30 seconds before asking again. Oh, we passed the file name, but of course it might not exist over there anymore. So let's go and fix that: the file name lives in the main script, and we only pass it to the upload function, which now lives in api_communication. But in save_transcript we are actually using it without passing it. So what we can do is simply also pass the file name to save_transcript, and that should fix the problem. "Transcription saved". Alright, let's open the output text file: "Hi, I'm Patrick. This is a test 123."

This is a very short audio file, and we've been using it over and over again, so I also want to show you that this code works with another audio file. This is the audio of one of the latest short videos that I made for our YouTube channel, where I was just talking about what natural language processing is. Maybe if I add underscores to the name, it will be easier to pass on the command line. I will just copy its name and use it when calling the script. This will probably take a little bit longer, because the audio file we have been using is only a couple of seconds long and this one is one minute, so we will see what the results look like. Right, here we go: the transcription is saved, and we find it here. This is exactly what I was talking about. Let's listen to it while the transcription is open. [audio plays] "...we have seen gigantic leaps over the last couple of years in terms of how computers can understand and use natural language." Alright, you get the idea. Our code works, this is amazing. I hope you've been able to follow along. If you want the code, don't forget that you can get it in the GitHub repository we prepared for you, using the link in the description.

Welcome back to
the third project. In this one, I teach you how to apply sentiment analysis to YouTube videos. You will learn how to use the youtube-dl package to automatically download YouTube videos, or to only extract the information you need, and then I also teach you how to apply sentiment analysis. In this example, I use iPhone 13 review videos, and the result that we get looks like this: for each sentence in the video, we get the text, and we also get the sentiment, which can be positive, negative, or neutral. For example, if we read this text here, "the new iPhone display is brighter than before, the battery life is longer", and so on, the sentiment is positive. And here the text is "still, there are some flaws", and now the sentiment is negative. So this works pretty well, and it can be applied to so many use cases. So let's get started and see how to do this.

Here I've created a new project folder, and we already have our API secrets and the api.py file with the helper functions to work with the AssemblyAI API. Now let's create two more files: the main.py file that will combine everything, and the youtube_extractor file, another helper file to extract the infos from a YouTube video. For this, we are going to use the youtube-dl package. This is a very popular command line program to download videos from YouTube and other sites, and while we can use it as a command line program, we can also use it from Python. So we say pip install youtube-dl, and then we can import it: import youtube_dl. Then we set up an instance: ydl = youtube_dl.YoutubeDL().
Now I'm going to show you how you can download a video file and also how you can extract the infos from a video. Let's create a helper function that I call get_video_infos, and this takes a URL. Now we use the ydl object as a context manager: we say with ydl, then result = ydl.extract_info, and this gets the URL. By default it has download=True, so it would also download the file. But in my case, I say download=False, because while we could download the file and then upload it to AssemblyAI, we can actually skip this step and just extract the URL of the hosted file, and then pass this to the transcribe endpoint in AssemblyAI.
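A sketch of get_video_infos, including the playlist check we add in a moment. The playlist handling is split into its own small function so the snippet can be exercised without hitting YouTube (youtube_dl is only imported inside get_video_infos):

```python
def first_video(result):
    # youtube_dl returns a dict with an "entries" key for playlists;
    # in that case keep only the first video's info
    if "entries" in result:
        return result["entries"][0]
    return result

def get_video_infos(url):
    import youtube_dl  # pip install youtube-dl
    with youtube_dl.YoutubeDL() as ydl:
        # download=False: extract the metadata only, don't download the video
        result = ydl.extract_info(url, download=False)
    return first_video(result)
```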
So we set download=False here, and then we do one more check: if the "entries" key is in the result, this means we have a playlist URL, and then we want to return only the first video of this playlist, so we return result["entries"][0]. Otherwise, we simply return the result, which is the whole video info object. Then let's create another helper function that I call get_audio_url, and this gets the video infos. First of all, let's simply print all the video infos to see what they look like. So now let's say if __name__ == "__main__", and then let's first extract the video info: video_infos = get_video_infos, which needs a URL. Then we say audio_url = get_audio_url, and then we want to print the audio URL. Right now this is None, because we don't return anything yet. So let's get an example URL. For this, I went to YouTube and searched for "iPhone 13 review", and I chose this video, an iPhone 13 review with pros and cons. We can click on it, and then we have to watch an ad, but we can actually copy the URL right away and put it in here as a string. Now if we run this with python youtube_extractor.py, it should print the whole info. Ah, here I have to pass the video infos; let's try this again. And yeah, here it printed the whole info. This is actually a very long object, a very long dictionary, and I can tell you that it has a key that is called "formats". So let's print only the formats. If we run this, this is still a very large dictionary. But actually this is a list, so now we can iterate over it.
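The format-filtering helper we build over the next few steps boils down to something like this; the key names "formats", "ext", and "url" are what youtube_dl puts in its info dict, and the sample dict is made up:

```python
def get_audio_url(video_infos):
    # walk the "formats" list and return the URL of the first
    # audio-only format, which has the extension "m4a"
    for f in video_infos["formats"]:
        if f["ext"] == "m4a":
            return f["url"]

# a tiny made-up info dict in the same shape as the real one
fake_infos = {"formats": [
    {"ext": "mp4", "url": "https://example.com/video.mp4"},
    {"ext": "m4a", "url": "https://example.com/audio.m4a"},
]}
print(get_audio_url(fake_infos))
```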
So we say for f in video_infos["formats"], and then we can print f["ext"], the extension; and it also has a URL, so we also want to print f["url"]. Now if we run this, let's see what happens. Actually, let's comment out the URL, because it's super long, and print only the extension. Now we see we have a lot of different extensions, because YouTube actually stores the video in a lot of different formats, with a lot of different resolutions and so on. What we want is this one, m4a, which is an audio format ending. So we now check: if the extension equals "m4a", then we return f["url"]. This is the audio URL. If we save this and then print the result at the very end, we should get the URL to the hosted file. You can see it is at a URL that is not related to youtube.com. Let's, for example, click on it: then we have the audio in our browser, so we could listen to the audio file. So yeah, that's the first part, how to work with the youtube-dl package to extract the infos. And now let's combine this in
main.py: we combine the youtube_extractor infos with AssemblyAI and extract the transcript of the video, along with the sentiment classification results. Sentiment classification is usually a pretty difficult task, but AssemblyAI makes it super simple to apply. If we go to the website, assemblyai.com, and have a look at the features, we see they provide core transcription, which is basically the speech recognition we've seen in the last part. But they also offer audio intelligence features, and they are pretty cool. There are a lot of features you can use, for example detecting important phrases and words, topic detection, auto chapters (auto summaries), and much more. If we scroll down, here we find sentiment analysis. If we click on it, we see a short description: with sentiment analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files. Sentiment analysis returns a result of positive, negative, or neutral for each sentence in the transcript. So this is exactly what we need here, and it's actually super simple to use. The only thing we have to change is that when we call the transcript endpoint, we also have to send sentiment_analysis set to true as JSON data. This is all we need to do.
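The request-body change might look like this sketch. Building the JSON body is split into its own function so it can be checked without a network call; the endpoint URL follows AssemblyAI's v2 API, and the API key is a placeholder you would replace:

```python
def build_transcript_request(audio_url, sentiment_analysis=False):
    # JSON body for the transcript endpoint; turning on the
    # sentiment_analysis flag is the only change for this feature
    return {"audio_url": audio_url,
            "sentiment_analysis": sentiment_analysis}

def transcribe(audio_url, sentiment_analysis=False):
    import requests  # pip install requests
    transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
    headers = {"authorization": "YOUR-ASSEMBLYAI-API-KEY"}  # placeholder
    response = requests.post(
        transcript_endpoint,
        json=build_transcript_request(audio_url, sentiment_analysis),
        headers=headers,
    )
    return response.json()["id"]  # the transcript id used for polling
```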
So let's go to our code and implement this. Let's import everything we need: we want json, and we say from youtube_extractor import get_audio_url, get_video_infos. And from our API helper file, we import save_transcript. Then I create one helper function that I call save_video_sentiments, and this gets the URL. Here we get the video infos by calling get_video_infos with the URL, then we get the audio URL by calling get_audio_url with the video infos. Then I simply call the save_transcript function, and this gets the audio URL and also a title. For the title, I want to use the title of the video, which we can get from the video infos under the key "title". Then I want to slightly modify it: I say title = title.strip() to remove all leading and trailing whitespace, then I replace all spaces with an underscore, and then I also say title = "data/" + title, because I want to store the results in a separate folder, so here we create a folder and call it data. Now we have to modify save_transcript slightly: if we have a look back, we see it needs the additional argument sentiment_analysis. So in save_transcript I will put this as an additional argument with a default of False, and in main.py we pass sentiment_analysis=True. Now we have to pass this through: save_transcript passes it to get_transcription_result_url, which passes it to transcribe, and there, in the JSON data that we send, we put sentiment_analysis set to True or False. And that is all we need. Now, of course, I also want to save the results.
So here we check: if the sentiment_analysis parameter is True, I create a separate file. Again, I say filename = title plus, let's call it, "_sentiments.json". Then I say with open(filename, "w") as f, and I import json at the top. Then we simply say json.dump. First we have to extract the infos, of course, so we say sentiments = data, and then the key: if we have a look at the documentation, we see the JSON response now has the additional key "sentiment_analysis_results". So we use this, and then we dump the sentiments into the file. I also want to say indent=4, to make it a little bit more readable. Now in main.py we call this function: if __name__ == "__main__", then I want to call save_video_sentiments, and the URL is this one, so let's copy and paste it in here. Now let's run the main.py file and hope that everything works. The video info is downloaded and the transcription started, so this looks good; let's wait.

Alright, this was successful, and the transcript was saved. If we now have a look at the data folder, we get the transcript of the video, and we also see our JSON file with all the sentiments. For each sentiment result, we get the text of the sentence, for example this one: "With the exception of a smaller notch, the iPhone 13 doesn't seem very new at first glance, but when you start using this flagship, you start to appreciate a bunch of welcome upgrades." Then we get the start and end time, then we get the sentiment, which is positive, and we also get the confidence, which is pretty high. In the next example, "the new iPhone display is brighter than before, the battery life is longer, and Apple has improved...", the sentiment is also positive. Then we have "still, there are some flaws", and now the sentiment is negative. So this works pretty well, and this is how you can apply sentiment analysis with AssemblyAI.

Now I want to show you a little bit more code for how we could analyze this. We can comment out the download call, so we don't need to download this again. Then we read our JSON file, and here we store the positives, negatives, and neutrals: we iterate over the data, extract the text and the sentiment, check whether it was positive, negative, or neutral, and append the text to the corresponding list. Then we can calculate the length of each list, and print the number of positives, negatives, and neutrals. We can also, for example, calculate the positive ratio: here we ignore the neutrals and simply divide the number of positives by the number of positives plus the number of negatives. If we save this and run it again, we get the number of positives, 38, and only four negatives, so the overall positive ratio is about 90%. With this, you can get a pretty quick overview of a review, for example. I think the sentiment classification feature can be applied to so many different use cases. It's so cool.
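The little analysis script boils down to something like this. The sample list is made up, in the same shape as AssemblyAI's sentiment_analysis_results entries (each has at least "text" and "sentiment"):

```python
def sentiment_ratio(results):
    # tally the sentiment labels and return the positive ratio,
    # ignoring the neutral sentences
    positives = [r["text"] for r in results if r["sentiment"] == "POSITIVE"]
    negatives = [r["text"] for r in results if r["sentiment"] == "NEGATIVE"]
    neutrals = [r["text"] for r in results if r["sentiment"] == "NEUTRAL"]
    print(f"Num positives: {len(positives)}")
    print(f"Num negatives: {len(negatives)}")
    print(f"Num neutrals: {len(neutrals)}")
    return len(positives) / (len(positives) + len(negatives))

# made-up sample in the same shape as the saved JSON
sample = [
    {"text": "The display is brighter than before.", "sentiment": "POSITIVE"},
    {"text": "Still, there are some flaws.", "sentiment": "NEGATIVE"},
    {"text": "It ships in September.", "sentiment": "NEUTRAL"},
    {"text": "Battery life is longer.", "sentiment": "POSITIVE"},
]
print(f"Positive ratio: {sentiment_ratio(sample):.2f}")
```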
I hope you really enjoyed this project. What would be really cool now is if we could display this information in a nice-looking web app, and that is actually one thing that you will learn in the next tutorial, together with Mısra. So let's move on to the next project. All right,
Now it's time to build a podcast summarization app. And we're also going to build a web
interface for this application. In this project, we are again going to use AssemblyAI's API, which offers the chapterization and summarization features, and we are going to get the podcasts from the Listen Notes API. So let's get into it. Here's what our app is going to look like once we are done with it: we will get an episode ID from the Listen Notes API (I will show you how to do that), and when we click this button, it will first give us the title of the podcast, an image, and the name of the episode. Then we will be able to see the different chapters of this episode and when they start, and if we click these expanders, we will be able to read a summary of each chapter. It's also quite exciting to start building a front end for our application, so let's start building it.

In this project, like in the previous ones, we are going to have a main script, and a supporting script, api_communication, where we have all of our supporting functions that we want to use over and over again. We built this before; this is the exact same one from the third project, the project we did before, and we will only need to update it and change some things to start doing podcast summarization. The first thing I want to update here is that we will not actually need the upload endpoint anymore, so I'm just going to go ahead and delete it, because the podcasts are going to be retrieved from the Listen Notes API. They will be somewhere on the internet; we will not download them to our own computer. We can immediately tell AssemblyAI: here is the address of the audio file that I want you to transcribe, and it will be able to do that. So there will be no download or upload needed; that's why I also don't need the upload function. The chunk size is also not relevant anymore. All right, that's good for now. The next thing that we want to do is set
up the Listen Notes API communication. We are going to use AssemblyAI to create the summaries of the podcasts, and we will get these podcasts from Listen Notes. If you've never heard of it before, Listen Notes is basically a database of podcasts (nearly all podcasts, I think), so you can search for any podcast. For example, one of my favorites is 99% Invisible, and you will be able to get all of its information plus the episodes, so you can search for episodes here if you'd like to. What we're going to do with Listen Notes is send it a specific episode ID that we will find on the platform itself. Let's say I want to get the latest episode of 99% Invisible: if I go to the episode page and go down to "Use API to fetch this episode", I will see an ID. This is the specific ID of this episode, and using this ID, I will be able to get this episode and send it to AssemblyAI. This is exactly the ID that we need in our application. To get that, of course, we first need the Listen Notes endpoints. Listen Notes has a bunch of different endpoints, but the one that we need is the episode endpoint, to get the episode information. So I will just name this listennotes_episode_endpoint, and it is this one. And of course, we also need a header again, to authenticate ourselves, and in the header we're going to need to put an API key. So all you have to do is go to Listen Notes, create an account, and get an API key for yourself.
And we are going to paste it here. And here, as you know, we are importing the API key for AssemblyAI; now I'm also going to import the API key for Listen Notes, and we are going to send it with our requests to Listen Notes. I will call this one listennotes_headers and the other assemblyai_headers. For Listen Notes, the header key is named "X-ListenAPI-Key". Alright, the first thing that I want to do is build a new function that takes the episode ID and gives us the URL of the podcast episode's audio file. I will call this one get_episode_audio_url, and it is going to get an episode ID; we're going to send a GET request to Listen Notes. Let's build the URL first: it is going to consist of the Listen Notes episode endpoint, plus a slash, plus the episode ID. Then we send a GET request to this URL; I will call what we get back response for now. And the last thing that we need, of course, is the headers for authentication, and those are the listennotes_headers. Whenever we do this, we should be able to get the episode information for the episode ID, and it is going to be sent to us in JSON format, so we will be able to inspect it. So maybe let's try this first and see that it works.
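A sketch of get_episode_audio_url as described so far. The endpoint and header name follow the Listen Notes v2 API docs, the key is a placeholder, and the URL building is split out so the snippet can be checked without an API key:

```python
LISTENNOTES_EPISODE_ENDPOINT = "https://listen-api.listennotes.com/api/v2/episodes"

def episode_url(episode_id):
    # the episode endpoint plus a slash plus the episode id
    return LISTENNOTES_EPISODE_ENDPOINT + "/" + episode_id

def get_episode_audio_url(episode_id):
    import requests  # pip install requests
    listennotes_headers = {"X-ListenAPI-Key": "YOUR-LISTENNOTES-API-KEY"}
    response = requests.get(episode_url(episode_id), headers=listennotes_headers)
    return response.json()  # episode info as a dict, including "audio"
```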
To do that, I am just going to again say from api_communication import *; I'll just make this a simple Python script for now. And I'm going to call get_episode_audio_url, using the episode ID that I found here, this one, to keep things simple. As a result, we will print the response that we get from Listen Notes. So let's run this and see what happens. All right, this is really long, so maybe I will use pretty print to make it more readable: instead of print, just use pprint. Okay, let's do it again. All right, that is slightly better. Let's see what kind of information we are working with. Nice, we get the audio URL here; this is the URL of the audio. Let's see where it takes us. Yeah, this is just the audio of this podcast, you can hear it. [audio plays] "...that the Roman advance was halted." Nice. Alright, so this is exactly what we need. But if you want, you can also get some extra information about the podcast, if you want to display it in some way. There is the description of the episode, whether there is explicit content or not, the image of this episode, and some extra information about the podcast, like Facebook and Google handles, etc.
So you get a lot of information, and if you want to make your web interface even more interesting and interactive, you can of course include more of it in your application. If we just returned data["audio"] from here, we would get only the audio URL, but now that we have all this information, we might as well extract some more of it. Some of the things that we can get are the thumbnail of this episode, the name of the podcast, and the title of this episode, which, like we said, we will display in the app. So let's do that: this will be the audio URL; we will also get the episode thumbnail under "thumbnail"; we can get the podcast title, which is under "podcast", the podcast-specific information, and then its "title"; and lastly the episode title, which is just "title". And we can pass all of this information back: audio URL, episode thumbnail, episode title, and podcast title.
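Pulling the four fields out of the response might look like this; the key names follow the Listen Notes episode JSON we just printed, and the sample dict is condensed and made up:

```python
def extract_episode_fields(data):
    # the four things we display in the app
    audio_url = data["audio"]
    thumbnail = data["thumbnail"]
    podcast_title = data["podcast"]["title"]  # nested under "podcast"
    episode_title = data["title"]
    return audio_url, thumbnail, podcast_title, episode_title

# condensed, made-up response in the same shape
fake = {"audio": "https://example.com/ep.mp3",
        "thumbnail": "https://example.com/thumb.jpg",
        "title": "Roman Roads",
        "podcast": {"title": "99% Invisible"}}
print(extract_episode_fields(fake))
```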
We don't really need to change much in the rest of the functions, for example transcribe, poll, and get_transcription_result_url, which we already built beforehand. The only thing that we need to change is that now we're not going to do sentiment analysis; we want to use the auto chapters feature of AssemblyAI. So I'm just going to rename the parameter to auto_chapters. In most places this is just the name of a variable, so it is not that important (you can keep it the same, though for readability it's probably better to rename it), but here, in the JSON data, we do need to change the key to "auto_chapters", because we're sending this request to AssemblyAI and it needs to know that we want auto chapters. What else? We also just updated the name of the headers: it's not just headers now, it's assemblyai_headers. Same here in the polling; we do not need to change anything there, we are only asking whether a transcription is done or not. Again, in get_transcription_result_url we want to change sentiment_analysis to auto_chapters. One other thing that I want to change is very small: normally we were waiting for 30 seconds, but now I want to wait for 60 seconds, because podcast episodes tend to be a little bit longer, so we want to wait a little bit longer between asking AssemblyAI if the transcription is ready or not.

That was another change, but the main work is going to happen in the save_transcript function. The main change we need to make in save_transcript is that before, we were uploading our audio to AssemblyAI and then getting the result back; this time, we are only going to have an episode ID, then get the URL from Listen Notes, and then pass that to AssemblyAI to start the transcription. So what I want to do here is, instead of the URL and title, just give save_transcript the episode ID, and then run get_episode_audio_url from inside save_transcript. As a result, what we get is the audio URL, episode thumbnail, episode title, and podcast title. Again, we are not doing sentiment analysis, we are doing auto chapters, and we need to pass the audio URL to get_transcription_result_url, which gets the audio URL as its url argument along with auto_chapters; it is not defined yet here, so we pass auto_chapters=True.

The next thing that we want to do is deal with the response that we get from AssemblyAI. Let's first see what the response from AssemblyAI looks like when we're doing auto chapters, and then deal with it. But first let's fix some of the problems here: I will not save it into a file for now, so I can comment these lines out. This will be auto_chapters. The main thing that I want to do is see what the result looks like, so I will pretty print the data, which is already parsed from JSON. So I'm just going to comment these out for now, just so that we have an idea of what the response looks like. To run this, I will just pass the episode ID to save_transcript. Oh, we're still printing the response from Listen Notes; I will stop printing that and start it again. Alright, so we got the results.
Let's see what it looks like. It's a lot of information; let's scroll to the top. What we wanted was, basically, the chapters, so let's see what the chapter information includes. As you can see, this is one chapter, and this is another chapter. For each chapter, we have the starting point, the ending point, the gist of the chapter (really quickly, what is this chapter about), a headline for the chapter, and a summary: in a couple of sentences, what is happening in this chapter, what is the presenter talking about. What we want to do is show this information in our application, on our web interface. So what we want right now is to extract this information from the response we get from AssemblyAI, save it somewhere, and then visualize it in our Streamlit application.

So I will undo the commenting here, and also here. I will name this file with the episode ID: it will be the episode ID plus ".txt". And as we always do, I'm just going to save the transcript; we don't have to touch this much. But I will start another file: let's call this one chapters_filename, and it will be the episode ID plus, let's say, "_chapters.txt". So chapters will be another file; I'm going to keep all the chapter information somewhere else. In here, I'm going to write some of the information I got from AssemblyAI, specifically the chapter information, and I'm also going to include some of the information I got from the Listen Notes API. One mistake here: I do not want it to be a text file, I want it to be a JSON file, so that it will be easier to parse and easier to read later. The first thing that I want is the chapters, and I'm going to get that from the data variable. It's called "chapters", let's check. Yes, the section is called "chapters". So let's start building it: we'll say episode_data, and first let's include the chapters; I will call the key "chapters" again. Then, inside this episode_data, what do I want? I want the episode thumbnail, the episode title, and the podcast title, so that I have all of this information saved in one place on my file system, and I can just read it whenever I want and display it to the user. Finally, we dump episode_data to the file, and I'll let the user know that the transcript is saved. This part we don't need anymore. And again, if there is an error, we will just say that there is an error. And we will return True.

Now that we've got this far: what we do up to now is get the URL from Listen Notes based on the episode ID, send this URL to AssemblyAI, get the auto chapters information, and then save it to a file. So let's check that this works. And while it's running, we will start on the Streamlit application. So I will just run this again; in the main section, of course, we need to call save_transcript. Okay, it's running. And let's also
start building our Streamlit application now. If you've never heard of Streamlit before, it is a really easy way to start building web interfaces for your applications, specifically in Python. It's very simple to use; it has a very simple API, it's a very simple library. What you have to do is import streamlit as st if you want to use it simply, and, let's say, if you want to put a title in your application, all you need to do is st.title, and it will show that as a title. I will run this separately to show you how it works. To run Streamlit applications, you just need to say streamlit run main.py. Streamlit is installed on your computer like any other Python library, so you just need to use pip: pip install streamlit, and then you will be good to go. Unless you make a mistake and call streamlit with a capital S, which is not correct; it needs to be a lowercase s. So let's do that again. Alright, so this is actually an application; the only thing we're showing right now is a title.

We know what we want it to look like, so I will start building the elements of this application. The first thing that strikes us is that we have a sidebar, we have a title that says "Podcast summaries", and then we start showing the information we got from the APIs that we've been using. So let's put in a sidebar, and maybe let's fix the title first. We want to say "Podcast summaries", or it can even say "Welcome to my application that creates podcast summaries". Let's see, maybe that will be too long, but we'll see. And creating the sidebar is quite simple: you call st.sidebar.text_input, and then you can say "Please input an episode id". I can also have a button at the end of the sidebar that says "Get podcast summary", maybe with an exclamation point too. So let's run it again. Okay, this is looking more like it: it says "Welcome to my application that creates podcast summaries", I can put an episode ID here, and then I can say "Get podcast summary". You see that it is running; it is running because I forgot to comment out this call, so it's actually running the whole pipeline. I'll just stop it for now, because we don't have any way of displaying whatever we get back from the APIs.
So I'll stop this now. And now that we have the application looking more or less like what we want, let's wait for the chapter results to be written to our file, and then we will see what they look like. Then we can start parsing them and showing them to the user in our Streamlit application. Okay, so the transcription is saved and our auto chapter creation is done. Let's take a look at what it looks like. We have the chapters section, we have the episode thumbnail, episode title and podcast title. Very good. In the chapters we have the chapter entries, and inside each chapter we have the summary, headline, gist, start and end. So it looks good.
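So, as a rough sketch, reading the saved file back gives us a dictionary shaped something like this (the values here are made up for illustration; the field names are the ones we just listed):

```python
import json

# Hypothetical example of the saved chapters file; the values are invented,
# but the field names match what this project stores and reads back.
data = {
    "podcast_title": "The Example Podcast",
    "episode_title": "Some Episode",
    "episode_thumbnail": "https://example.com/thumbnail.jpg",
    "chapters": [
        {
            "summary": "A short summary of the first chapter.",
            "headline": "A one-line headline",
            "gist": "The gist",
            "start": 0,        # chapter timestamps are in milliseconds
            "end": 154000,
        },
    ],
}

# This is the kind of access we'll use in the app:
chapters = data["chapters"]
print(chapters[0]["gist"])
```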
Let's start showing this. The first thing that I want to show, like we showed in the beginning, is the name of the episode, or maybe the name of the podcast plus the name of the episode, and then the episode thumbnail. And how I'm going to show that is, again, using Streamlit. That is going to be the header for me, and I will include the podcast title, maybe with a dash in between, and the episode title. But as you can see, we do not have those yet. So first we need to open the file that includes these things, and the file that includes them is the one called episode ID underscore chapters dot json. So the file name would be the episode ID plus "_chapters.json". And where do I get the episode ID? I get the episode ID from the text input. So the user is going to input the episode ID, and then I am going to save it here in this variable, and that way I will have the file name. So then I just need to open this file, and let's call the result data, for example. I need to import json, of course, and load it into the variable data. So in this variable data, what do we have? We have the chapters, so first let's get the chapters: data["chapters"]. And then what we want to get is the podcast title, and then the episode title. Let's change the names: episode title. And we also want the thumbnail. And what did we call the thumbnail? We can see here: episode_thumbnail. Alright, so: thumbnail. So now we're showing the podcast title and episode title in a Streamlit header, and then we can show the thumbnail image with the st.image function.

From this point on, the next thing that we want to show is the chapters, of course. One thing we can do, for example, is use a for loop. We could say for chap in chapters, and then just call st.write to show the chap. That's one way of doing it, but you're going to get a lot of text one after another, and it's not really nice. What we want, like in the original one I showed you at the beginning, is expanders, and it's quite easy to create expanders in Streamlit: again, you just say st.expander, and then you write what information you want to be in your expander. As the title of the expander, I will write here what I want in the title, and whatever I want inside the expander, I'm going to write inside it. So I do not need to use st.write again, because this is going to be inside the expander. And inside the expander, what I want is the summary. So I think it was called summary; let's just check again here in our JSON file. In chapters, we have summary. It's called summary, yes. So I want the summary to be in there, and as the title of the expander, I want the gist of each chapter. So for each chapter there will be an expander, the title of the expander will be the gist of that chapter, and inside the expander we're going to have the summary of that chapter. So let's run this and see how it looks. But first, let's make sure that everything works. So I have the title, and then I ask for an episode ID from the user, and there is a button that starts this process. For that to happen, I'll just assign this button to a variable. So this button variable holds the information of whether the button has been pressed or not. And I only want this part, this visualization and display part, to happen if the button has been pressed, so I'm going to wrap it all in a condition. Otherwise, it's not going to happen. Yes, but right now, if someone presses the button, nothing really happens. So we also need to add an action to this button, and the way we're going to do that is with on_click: if this button is clicked, what we want to happen is for the save_transcript function to be run. So I'm going to pass it here in the on_click argument. And we also have arguments to pass, right? And here is how you pass arguments to the function that you call from your button: this is a tuple, so you write the argument that you're passing to the function as the first element, and the second one is empty. Now, when the button is clicked, this function should run, and we should be able to see all the information on our application.
So let's run it again and see what happens. Yeah, we need to run the Streamlit application this time: streamlit run main.py. I'll close the old one, so we know the difference and which one is which; this is just the example from the beginning. Alright. So we want to get a podcast, and we want to display it. I will get this one again. Let's get the podcast summary. And here it is. We have the title "Welcome to my application that creates podcast summaries". Okay, maybe that's a bit too long, I will shorten it. The name of the podcast, the name of the episode, the number of the episode, also the one about the missing middle. And here are my chapters. So apparently there are one, two, three, four, five, six, seven chapters that AssemblyAI's API was able to find. In each chapter, we have the gist of the chapter as the title of the expander, and the chapter summary inside. One last thing that I want to add is the start point of each chapter here, because I want to show how long each chapter is, maybe. So let's do that. For that, I want to see in this JSON file how it looks. So the start looks like this. These numbers might look a bit random to you, but basically they are milliseconds. So I want to turn them into minutes and seconds, and, if applicable, hours, minutes and seconds. And there is already a function that can do that. Here it is, so we don't need to work on it for a long time. Basically, you get the milliseconds, and from the milliseconds you can get how many seconds there are, how many minutes there are, and how many hours there are. So basically you're counting the hours, and everything on top of the hours that does not add up to an hour is counted as minutes, and everything that does not add up to a minute is counted as seconds. And here is what we return: we'll show the start time as hours, minutes and seconds, and if there are no hours, we don't have to say zero-something-something, so we just show minutes and then seconds. And how I'm going to show it is within the expander title, and I can, you know, show it with a dash in between. I'll call get_clean_time, and in there, what I want is the chapter's start. Let's see what it was called. It's just "start". Okay. Alright, let's run it one more time and then see what our application looks like. Awesome. Okay, this is our application: on the sidebar, we can input an episode ID that we get from Listen Notes, we can click "Get podcast summary", and it will show a nice title, the title of the podcast, the title of the episode, and the thumbnail of this episode. And for each chapter, we show the gist of the chapter, kind of like a headline, and when this chapter started. And when you click the expander, when you expand it, you get the summary of this chapter. So this is what we set out to do, and we achieved it. I hope you were able to follow along. Again, don't forget to go grab the code from the GitHub repository.

Welcome to the final project. In this one, you will learn a bunch of new, exciting technologies. First of all, you will learn how to do real-time speech recognition in Python. Then you will learn how to use the OpenAI API and build a virtual assistant, or chatbot. And finally, you will learn a little bit about WebSockets and how to use asyncio in Python. So I think this is going to be really fun. First of all, let me show you the final project. Now, when I run the code, I can start talking to my bot and ask questions: What's your name? How old are you? What's the best ice cream? And you'll see this works. So I think this is super exciting. So now let's get started.

Alright, so here I have a new project folder.
And again, we have our API secrets file, and now a new main.py file. And the first thing we're going to do is set up real-time speech recognition. For this, we have a detailed blog post on the AssemblyAI blog that will walk you through this step by step. So first of all, we need PyAudio to do the microphone recording; this is the very same thing that we learned in part one. Then we use WebSockets, and then we use the AssemblyAI real-time speech recognition feature that works over WebSockets. Then we create a function to send the data from our microphone recording, and also a function to receive the data, and then we can do whatever we want with this. But instead of just copying and pasting this, let's actually code this together. One note here: in order to use the real-time feature, you need to upgrade your account. So yeah, but anyway, let's get started.

So let's import all the things we need. We want pyaudio again, then we need websockets, so we say import websockets. This is a third-party library that I showed you in the beginning that makes it easy to work with WebSockets, and it is built on top of asyncio, so now we're going to write async code. Then we also import asyncio. We also import base64, because we need to encode the data to a base64 string before we send it. And then we import json to receive the JSON result. And then we say from api_secrets import our API key from AssemblyAI. Now the first thing we set up is our microphone recording. For this, we use the exact same code that we learned in part one, so I simply copy and paste this part from there. So we set up our parameters, then our PyAudio instance, and then we create our stream. And now we need to define the URL for the WebSocket, and we can find this on the blog post. So here, I can copy and paste the URL. The URL starts with wss, then assemblyai.com, and then real-time. And the last part is also important: here we say question mark sample_rate equals 16000. This is the same rate that we use here, so make sure to align this with what you have.

And now we create one function to send and receive the data. This is an async function, so we say async def, and we call it send_receive. It is responsible for both sending and receiving the data. And now we connect to the WebSocket, and we do this with an async context manager. So again, we say async, and then with, and then websockets.connect, and now we specify the parameters: the URL, then we set a ping_timeout, and we can set this to 20, for example. Then we want a ping_interval, and this should be 5. And then we also need to send our authorization token. The parameter for this is extra_headers, and this is a dictionary with the key authorization, and the value is our token. And then we say async with ... as, and we can call this whatever we want, so I say _ws for WebSocket. Then, first, we wait to let this connect. So here we say await asyncio.sleep(0.1). Be careful here: we cannot use time.sleep, because we are inside an async function, so we have to use the async sleep function. And then we try to connect and wait for the result. So we say session_begins equals, and then again await _ws, and then this is called .recv, for receive, I guess. And then we can print the data and see how this looks.
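To summarize the connection parameters as a sketch: the exact endpoint path below is my assumption based on the blog post, so double-check it against the current docs, and the keyword arguments are the ones we just chose for websockets.connect:

```python
# The real-time endpoint (check the AssemblyAI docs for the current path);
# sample_rate in the query string must match the microphone stream.
SAMPLE_RATE = 16000
URL = f"wss://api.assemblyai.com/v2/realtime/ws?sample_rate={SAMPLE_RATE}"

# Keyword arguments we pass to websockets.connect(URL, ...):
CONNECT_KWARGS = {
    "ping_timeout": 20,
    "ping_interval": 5,
    # "YOUR_ASSEMBLYAI_API_KEY" is a placeholder for the key in api_secrets
    "extra_headers": {"authorization": "YOUR_ASSEMBLYAI_API_KEY"},
}

# The connection itself is then opened like this:
# async with websockets.connect(URL, **CONNECT_KWARGS) as _ws:
#     await asyncio.sleep(0.1)           # give the connection a moment
#     session_begins = await _ws.recv()  # first message: session info
```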
Let's also print "sending messages". And now we need two inner functions, again async functions. So we say async def send, and for now, we simply say pass. And then we say async def receive, and here we also pass. And actually, these will both have an infinite while True loop, so they will run infinitely and listen for incoming data. So here we say while True, and for now, let's just print "sending". And here we also say while True, and here we simply pass, because I don't want to spoil our output. And now, after this, we need to combine them in an asyncio way. In order to do this, we call the gather function, so it's asyncio.gather, and here we gather send and receive. And this will return two things, the send result and the receive result. So actually, we don't need these, but just in case, we have them here. And now, after defining this function, of course, we also have to run the code, and we have to run this in an infinite loop. In order to do this, we call asyncio and then .run, and then our send_receive function. So now this should connect, and then it should print "sending" all the time. So let's run this and hope that this works. So yeah, it's already connected, and sending works. So you see, that's why I didn't put a print in the receive as well: we already get a lot of output, and I can't even scroll to the top anymore. But basically, it should have printed the session data once, and this is working so far. So we can continue implementing these two functions now. So now let's implement the send function first.
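The overall shape we have at this point can be sketched with plain asyncio. Here the WebSocket is replaced by dummy bounded loops so the sketch actually terminates, but the send/receive/gather structure is the same:

```python
import asyncio

# The real send()/receive() loop forever over the WebSocket; these dummy
# versions run a bounded loop so the sketch runs anywhere and terminates.

async def send(n=3):
    for i in range(n):
        print("sending")
        await asyncio.sleep(0)   # yield control, like the real await _ws.send(...)
    return "send done"

async def receive(n=3):
    results = []
    for i in range(n):
        await asyncio.sleep(0)   # like the real await _ws.recv()
        results.append(f"message {i}")
    return results

async def send_receive():
    # gather runs both coroutines concurrently and collects both results
    send_result, receive_result = await asyncio.gather(send(), receive())
    return send_result, receive_result

send_result, receive_result = asyncio.run(send_receive())
print(receive_result)
```

Because both coroutines await inside their loops, the event loop can interleave them, which is exactly what lets us stream audio up and transcripts down at the same time.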
And we wrap this in a try-except block. And now we read the microphone input, so we say stream.read, and then we specify the frames per buffer. And I also want to say exception_on_overflow equals False. Sometimes, when the WebSocket connection is too slow, there might be an overflow, and then we get an exception, but I don't want this; it should still work. And then we need to convert this, or encode it, in base64. So we say base64.b64encode with our data, and then we decode it again in UTF-8; this is what AssemblyAI expects. Then we need to convert it to a JSON object, so we say json.dumps, and this is a dictionary with the key audio_data. So again, this is what AssemblyAI needs, and here we put in the data. And then we send this, and we also have to await this, so await _ws.send with the JSON data. And then we have to catch a few errors, so let's copy this from our blog post. These ones, let's copy and paste them in here. So we except a websockets.exceptions.ConnectionClosedError, and we print the error, and we make sure it has this error code, and then we also break. And then we catch every other error. It's not best practice to do it like this, but it's fine for this simple tutorial, and then we assert here. And after each while-True iteration, we also sleep again.

And yeah, so now we can copy this whole code and paste it into the receive function, because the code is very similar here. We have the same try-except, but now here, of course, we have to wait for the transcription result from AssemblyAI. So we say result_str equals, and then again we await, and then _ws.recv. Then we can convert this to a dictionary by saying result equals json.loads with the result string. And now this has a few keys. So this was a JSON object, and now in Python, it's a dictionary, so we can check a few keys. We can get the prompt, or actually, this is the transcription of what we said. So we say prompt equals result, and then it has the key text. And it also has a key that is called message_type. So now we check if we have a prompt, and if result and then the key message_type is FinalTranscript. What AssemblyAI is doing is this: while we are talking, it will already start sending the transcript, and once we finish our sentence, it will do another pass and make a few small corrections if necessary, and then we get the final transcript. So we want only the final transcripts. And for now, let's print "Me:" and then the prompt. And then we want to use our chatbot, so let's print "Bot:", and then, for now, let's simply print some random text. We'll set this up properly in the next step, but first, I want to test this. So let's say "this is my answer". And this is all that we need for the receive function.
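The two data-handling steps inside these loops use only the standard library, so we can sketch them in isolation (the audio_data, text, and message_type field names are the ones from the blog post):

```python
import base64
import json

def make_audio_payload(chunk: bytes) -> str:
    # What send() transmits: raw microphone bytes, base64-encoded,
    # wrapped in a JSON object under the key "audio_data".
    return json.dumps({"audio_data": base64.b64encode(chunk).decode("utf-8")})

def final_prompt(result_str: str):
    # What receive() does with each message: keep only the corrected
    # "FinalTranscript" messages and return their text, otherwise None.
    result = json.loads(result_str)
    if result.get("text") and result.get("message_type") == "FinalTranscript":
        return result["text"]
    return None

print(final_prompt('{"text": "hello there", "message_type": "FinalTranscript"}'))
```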
So let's clear this, and run this, and test this. We get an error: "await wasn't used with future", in asyncio.gather. Oh, this is a classic mistake: of course, here I have to say await asyncio.gather. So let's run this again. And now it's working. So yeah: What's your name? And you see, the transcript is working. So now I stop this, but if I scroll up: What's your name? And each time we get "this is my answer". So this is working. And now, of course, here we want to do a clever thing with our prompt and use our virtual assistant.

For this, we now set up OpenAI. They have an API that provides access to GPT-3, and this can perform a wide variety of natural language tasks. In order to use this, you have to sign up, but you can do this for free, and you get free credits, so this will be more than enough to play around with. And it's actually super simple to set this up. So let's create a new file, and let's call this openai_helper.py. And then we also have to install this, so we have to say pip install openai. And after signing up, you get the API token, so we have to copy this into api_secrets, and then we can use it. And now we can import openai. We also need to import our secret, so from api_secrets we import our OpenAI API key. Then we have to set this, so we say openai.api_key equals our API key. And now we want to do question answering. The OpenAI API is actually super simple to use. We can click on examples, and then we see a bunch of different examples. So OpenAI can do a lot of things, for example Q&A, grammar correction, text to command, classification, a lot of different stuff. So let's click on Q&A, and if we scroll down, then here we find the code examples. We already set our API key, and now we need to grab this code. So let's copy this, and let's create a helper function. So def, and let's call this ask_computer, and this gets the prompt as input. And now I paste this in here. So we say response equals openai.Completion.create. Then here we specify an engine, and now we specify the prompt. In our case, the prompt is going to be the prompt that we put in, so prompt equals prompt from the parameter. And now there are a lot of other different parameters that you can check out in the documentation. In my case, I only want to keep max_tokens: this specifies how long the result can be, and yeah, let's say 100 is fine for this. And now this is all that we need. And now, of course, we need to return the response. This is actually a JSON object again, or now a dictionary, and we only want to extract the first possible response. It can also send more if you specify this here, but in our case, we only get one. So we say response, and this is in the key choices, and then the index zero, so the first choice, and then the key text. So this will be the actual response from GPT-3.

And now, in the main file, the only thing we have to do is say from openai_helper we import ask_computer, and then down here in the receive function, we say response equals ask_computer with the prompt, and then here, this will be our response. And now this should be everything that we need. So let's again clear this and run the main.py, and let's hope this works. What's your name? How old are you? Where are you from? Alright, so let's stop this again. And yeah, you see, this works. And this is how you can build a virtual assistant that works with real-time speech recognition together with OpenAI. And yeah, I really hope you enjoyed this project. If you've watched this far, thank you so much for following along. And also, I hope to see you in the future on the AssemblyAI channel, because on there we also create a lot of content around Python, speech recognition, and also machine learning. So please check it out. And then I hope to see you soon. Bye!