Python Speech Recognition Tutorial – Full Course for Beginners

This course will teach you how to implement speech recognition in Python by building five projects, and it's easier than you may think. The course is taught by two instructors: Patrick, an experienced software engineer, and Misra, an experienced data scientist. They're both developer advocates at AssemblyAI, a deep learning company that creates a speech-to-text API, and you'll learn how to use that API. AssemblyAI provided a grant that made this course possible. They also have a YouTube channel where they post weekly Python and machine learning tutorials.

Here are the projects you'll learn to build in this course. In the first project, we learn how to deal with audio data: how to capture audio information from a microphone and save it as a WAV file. In the second project, we learn how to do speech recognition on top of the audio file we just recorded, using AssemblyAI's API. In the third project, we change gears a little bit and do sentiment analysis on iPhone reviews found on YouTube. In the fourth project, we summarize podcasts we find online and build a web app to show the results to users. And in the last project, we use speech recognition in combination with the OpenAI API to make an app that can answer users' questions. I hope you're excited. Let's get started.

Alright, in this first part I teach you some audio processing basics in Python. We briefly touch on different audio file formats, then we talk about different audio signal parameters you should know. Then I show you how to use the wave module to load and save a WAV file, how to plot a WAV signal, how to do a microphone recording in Python, and finally how to load other file formats like MP3 files. So let's get started.

First of all, before we write any code, let's talk briefly about different audio file formats. Here are three of the most popular ones: MP3, FLAC, and WAV. MP3 is probably the one you know best, and it's a lossy compression format: it compresses the data, and during this process we can lose information. FLAC, on the other hand, is a lossless compression format: it also compresses the data, but it allows us to perfectly reconstruct the original data. WAV is an uncompressed format, so it stores the data in an uncompressed way; the audio quality is the best, but the file size is also the largest. WAV is the standard for CD audio quality. We focus on WAV in this first part because it's very easy to work with in Python: there's a built-in wave module, so we don't have to install anything. By the way, WAV stands for Waveform Audio File Format.

Before we load our first WAV file, let's understand a few parameters. We have the number of channels, which is usually one or two; one is also known as mono and two as stereo. This is the number of independent audio channels: stereo, for example, has two independent channels.
Two channels give you the impression that the audio is coming from two different directions. Then we have the sample width, which is the number of bytes for each sample; this will get clearer when we look at an example. Then we have the frame rate, also known as the sample rate or sampling frequency, and this is a very important parameter: the number of samples per second. You may have seen the number 44,100 Hz, or 44.1 kHz, a lot; this is the standard sampling rate for CD quality, meaning we get 44,100 sample values each second. Then we have the number of frames, the total number of frames we get, and finally the values of each frame. When we load a file, the frames come in a binary format, but we can convert them to integer values later.

Now let's look at how to load a file with the wave module. I prepared a simple WAV file that is five seconds long: "Hi, my name is Patrick, and I'm a developer advocate at AssemblyAI." To load it, we create an object by saying wave.open(), give it the file name, patrick.wav, and open it in "rb" (read binary) mode. Now we can extract all the different parameters: the number of channels with obj.getnchannels(), the sample width with obj.getsampwidth(), the frame rate with obj.getframerate(), and the number of frames with obj.getnframes(). We can also get all the parameters at once with obj.getparams(). If we run this with python wave_example.py, we see we have only one channel, so a mono format; a sample width of two, so two bytes per sample; a frame rate of 16,000; and 80,000 frames. We also get all the parameters as a _wave_params object.

From these we can calculate the length of the audio. As I said, the frame rate is the number of samples per second, so the number of frames divided by the frame rate gives us the time in seconds. If we print t_audio and run this, we get 5.0, five seconds, which matches the file, so this works. Now let's get the actual frames: frames = obj.readframes(-1); passing -1 reads all frames. If we print the type of frames, the type of frames[0], and the length of frames, we see that frames is a bytes object, a single extracted byte is an integer, and the length of the frames object is 160,000. That's not the same as the number of frames; the number of frames is 80,000, so the byte length is twice as much. But remember the sample width I mentioned in the beginning: we have two bytes per sample. So if we divide the byte length by two, we get our 80,000 frames again. And that's how easily we can read a WAV file and work with the frames.
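Putting those steps together, here is a minimal sketch of the reading code described above; it assumes the five-second mono 16 kHz recording is saved next to the script as patrick.wav:

```python
import wave

obj = wave.open("patrick.wav", "rb")

print("channels:", obj.getnchannels())      # 1 -> mono
print("sample width:", obj.getsampwidth())  # 2 -> two bytes per sample
print("frame rate:", obj.getframerate())    # 16000 in this example
print("n frames:", obj.getnframes())        # 80000 in this example
print("params:", obj.getparams())

# duration in seconds = number of frames / frames per second
t_audio = obj.getnframes() / obj.getframerate()
print("duration:", t_audio)

frames = obj.readframes(-1)  # -1 reads every frame as raw bytes
obj.close()

# len(frames) == n_frames * sample_width (two bytes per sample here)
print(type(frames), len(frames))
```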
Now, to save the data again, we open another file object: obj_new = wave.open() with a new name, let's say patrick_new.wav, this time in "wb" (write binary) mode. Now we basically call all those functions as setters instead of getters: obj_new.setnchannels(1), since we have only one channel; obj_new.setsampwidth(2); and obj_new.setframerate(16000). Those are all the parameters we should set. Then we write the frames with obj_new.writeframes(frames), using the original frames, so we basically duplicate the file. One thing I forgot: when we're done opening a file and reading the information we want, we should also call obj.close(), and the same here, obj_new.close(), to close the file objects. If we save and run this, we see the duplicated file, and if we play it: "Hi, my name is Patrick, and I'm a developer advocate at AssemblyAI." It works and has the same data in it. So that's how to work with a WAV file and the wave module.
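The write side, again as a small sketch; it reuses the frames read from patrick.wav above to duplicate the file:

```python
import wave

# read the original frames
obj = wave.open("patrick.wav", "rb")
frames = obj.readframes(-1)
obj.close()

# write them back out under a new name
obj_new = wave.open("patrick_new.wav", "wb")
obj_new.setnchannels(1)      # mono
obj_new.setsampwidth(2)      # two bytes per sample
obj_new.setframerate(16000)  # samples per second
obj_new.writeframes(frames)
obj_new.close()
```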
Now let's see how we can plot a WAV signal. This is actually not too difficult; we just need to install matplotlib and NumPy. Then we import the modules we need: wave, matplotlib.pyplot as plt, and numpy as np. Again we read the WAV file, wave.open("patrick.wav", "rb"), and read the parameters we need: the sample frequency with obj.getframerate(), the number of samples with obj.getnframes(), and the actual signal, signal_wave = obj.readframes(-1), so all the frames. Then we can call obj.close(). As in the first script, we can calculate the length of the signal in seconds, t_audio, as the number of samples divided by the sample frequency. If we print t_audio and run python plot_audio.py as a test, we get 5.0, so this works so far.

Now I want to create the plot. The signal is a bytes object, and we can create a NumPy array out of it very easily: signal_array = np.frombuffer(signal_wave, dtype=np.int16). We specify int16 as the data type because we have two bytes per sample. Then we need values for the x axis, the time axis: times = np.linspace(0, t_audio, num=n_samples). It starts at zero, ends at the length of the signal (t_audio, five seconds), and the num parameter is the number of samples, so we get one time value for each sample. Now we plot: we create a figure with plt.figure(figsize=(15, 5)), plot the times against the signal array with plt.plot(times, signal_array), give it a title with plt.title("Audio Signal"), set the y label to "Signal wave" and the x label to "Time (s)", limit the x axis with plt.xlim(0, t_audio), and finally call plt.show(). That's all we need; if we run this, the plot opens, and here we have our audio signal plotted as a waveform. That's how easily we can do it with matplotlib and the wave module.
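Here is the plotting script as one piece, again assuming the patrick.wav recording from before:

```python
import wave
import matplotlib.pyplot as plt
import numpy as np

obj = wave.open("patrick.wav", "rb")
sample_freq = obj.getframerate()
n_samples = obj.getnframes()
signal_wave = obj.readframes(-1)
obj.close()

t_audio = n_samples / sample_freq

# two bytes per sample -> 16-bit integers
signal_array = np.frombuffer(signal_wave, dtype=np.int16)
times = np.linspace(0, t_audio, num=n_samples)

plt.figure(figsize=(15, 5))
plt.plot(times, signal_array)
plt.title("Audio Signal")
plt.ylabel("Signal wave")
plt.xlabel("Time (s)")
plt.xlim(0, t_audio)
plt.show()
```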
Now let's learn how to record with our microphone and capture the microphone input in Python. For this we use PyAudio, a popular Python library that provides bindings for PortAudio, the cross-platform audio I/O library. With it we can easily play and record audio, and it works on Linux, Windows, and Mac. For each platform there's a slightly different recommended installation. On Windows, pip install pyaudio is enough. On Mac you have to install PortAudio first, so with Homebrew you say brew install portaudio and then pip install pyaudio. On Linux you install the PortAudio development package through your package manager first. I'm on a Mac, so I used brew install portaudio and then pip install pyaudio.

Now I can import pyaudio, and I also import wave to save the recording later. Then I set up a few parameters. FRAMES_PER_BUFFER = 3200; you can play around with this a little bit. The format is pyaudio.paInt16, which is basically the same 16-bit integer type as the np.int16 we used for plotting. The number of channels is 1, so simply a mono format, and the rate is again 16,000; you can use a different rate and experiment. Then we create our PyAudio object, p = pyaudio.PyAudio(), and a stream object, stream = p.open(), passing in all the parameters: format=FORMAT, channels=CHANNELS, rate=RATE, input=True because we want to capture the input, and frames_per_buffer=FRAMES_PER_BUFFER. Now we can start recording.

We want to record for a number of seconds, here five, and we store the chunks in a list called frames. We iterate with for i in range(0, int(RATE / FRAMES_PER_BUFFER * SECONDS)); converted to an integer, this loop runs for five seconds' worth of buffers. In each iteration we read one chunk, data = stream.read(FRAMES_PER_BUFFER), and append it to frames; so each iteration reads FRAMES_PER_BUFFER frames at once. Then we close everything again: stream.stop_stream(), stream.close(), and p.terminate(), so everything is correctly shut down.

Now we can save the frames in a WAV file. We say obj = wave.open("output.wav", "wb") in write binary mode, then set all the parameters: obj.setnchannels(CHANNELS), obj.setsampwidth(p.get_sample_size(FORMAT)) to get the sample size of our format, and obj.setframerate(RATE). Then we write the frames. We need to write them as one binary string, which we get with b"".join(frames); this combines all the elements of our frames list into a single bytes object. Then obj.close(), and that's everything we need to do. Now we can run python record_mic.py and test it: "Hi, I'm Patrick. This is a test, 1 2 3." And now it's done. Here we have our new file, so let's play it and see if it works: "Hi, I'm Patrick. This is a test, 1 2 3." It worked. Awesome.
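The whole recording script, roughly as described above:

```python
import pyaudio
import wave

FRAMES_PER_BUFFER = 3200
FORMAT = pyaudio.paInt16   # 16-bit integer samples
CHANNELS = 1               # mono
RATE = 16000               # samples per second
SECONDS = 5

p = pyaudio.PyAudio()
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=FRAMES_PER_BUFFER,
)

print("start recording...")
frames = []
for _ in range(int(RATE / FRAMES_PER_BUFFER * SECONDS)):
    data = stream.read(FRAMES_PER_BUFFER)  # one chunk of raw bytes
    frames.append(data)
print("recording stopped")

sample_width = p.get_sample_size(FORMAT)
stream.stop_stream()
stream.close()
p.terminate()

# save the recording as a WAV file
obj = wave.open("output.wav", "wb")
obj.setnchannels(CHANNELS)
obj.setsampwidth(sample_width)
obj.setframerate(RATE)
obj.writeframes(b"".join(frames))
obj.close()
```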
As a last step, I also want to show you how to load MP3 files, not only WAV files. For this we need an additional third-party library, and I recommend pydub. It's very simple to use and provides an easy, high-level interface to load and also manipulate audio. In order for pydub to handle MP3s, we also need FFmpeg; on the Mac I used Homebrew, so brew install ffmpeg, and after that you can simply say pip install pydub.

Once it's installed, we say from pydub import AudioSegment. Then we can say audio = AudioSegment.from_mp3() if we have an MP3; in my case I only have a WAV, so I use AudioSegment.from_wav() and load patrick.wav. Now we can manipulate the audio very easily: audio + 6 increases the volume by 6 dB; audio * 2 repeats the clip; audio.fade_in(2000) applies a two-second fade-in, and the same works with fade_out. Then we can export the result with audio.export("mashup.mp3", format="mp3").
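A quick sketch of the pydub steps just described, assuming patrick.wav and an FFmpeg installation:

```python
from pydub import AudioSegment

audio = AudioSegment.from_wav("patrick.wav")

audio = audio + 6            # raise the volume by 6 dB
audio = audio * 2            # repeat the clip
audio = audio.fade_in(2000)  # 2-second fade-in (same idea with fade_out)

audio.export("mashup.mp3", format="mp3")

# and an MP3 can be loaded back the same way
audio2 = AudioSegment.from_mp3("mashup.mp3")
print("done")
```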
Then, for example, I can load the result back with audio2 = AudioSegment.from_mp3("mashup.mp3"), and print "done" so we see the script reaches this point. If we run python load_mp3.py, it works: here is our MP3 file, and we can load it just like that. So that's how you can use the pydub module to load other file formats as well. And that's all I wanted to show you in this first part; I hope you learned a little bit about audio processing in Python. Now let's move on and learn how to do speech recognition in Python.

Hey, and welcome. In this project, we are going to learn how to do speech recognition in Python. It's going to be very simple: we take the audio file that we recorded in the previous project and turn it into a text file. Let me show you how the project works. Here is the audio file that we recorded: "Hi, I'm Patrick, this is a test, 1 2 3." And if we run our script, we get the text transcription of this audio file. So let's learn how to implement this in Python.

For this project we mainly need two things: AssemblyAI's API to do the speech recognition, and the requests library from Python to talk to that API. Let's first go get an API token from AssemblyAI. It's very simple: go to assemblyai.com and create a free account. Once you have an account, you can sign in and copy your API key. Right away, I'm going to create a config file and put my API key there. Once that's done, I have a way of authenticating who I am with AssemblyAI's API, and we can start setting up how to upload, transcribe, and fetch the transcription.

Next, I want a main file that will hold all my code. In there I need to import the requests library so I can talk to AssemblyAI's API. This project has four steps. The first is to upload the file we have locally to AssemblyAI. The second is to start the transcription. The third is to keep polling AssemblyAI's API to see when the transcription is done. And lastly, we save the transcript.

Uploading is actually quite simple. In the AssemblyAI documentation, under "uploading local files for transcription", there's a snippet we can copy and adapt. We're already importing the requests library; the file name we'll get from the terminal, so I'll set that later. A couple of things to pay attention to here: there's a helper that reads the audio file from our file system, and we need to set up the headers. These headers are used for authentication, so instead of the placeholder "your API token" we set the value to our API key, which of course we need to import at the top. We also have an upload endpoint for AssemblyAI, https://api.assemblyai.com/v2/upload; part of this URL will be useful later too, so I'm going to put it in a separate variable and refer to it here.

When you're uploading a file to AssemblyAI, you're doing a POST request: you send the request to the upload endpoint, you include your API key in the headers, and of course you include the data, the file you read. We read the data through the read_file helper in chunks, because AssemblyAI wants it in chunks of about five megabytes; that chunk size is simply the number of bytes. While we're at it, we can get the file name from the terminal: I import sys, and sys.argv[1], the second (not the zeroth) argument, is the file name. After a little cleanup, we should be able to run a command on the terminal, include the name of the file we want to upload, and have it uploaded to AssemblyAI. Let's also print the response we get from AssemblyAI to see what kind of response it is. Again, this is the file we're working with: "Hi, I'm Patrick, this is a test, 1 2 3." So we run python main.py output.wav.

Alright, we uploaded our file to AssemblyAI successfully. In the response, we get the upload_url, the address where our audio file lives right now, and using this we can start the transcription. For the transcription, let's again cheat by getting the code from the docs. This is the transcription endpoint; you can see it ends differently than the upload endpoint, with "transcript" instead of "upload", so I'll call it the transcript endpoint. We already have headers, so we don't need the ones from the snippet. The JSON is the data we're sending, the data we want AssemblyAI to transcribe: it needs the audio URL. We already have the audio URL in the response, but we didn't extract it yet, so let's do that: audio_url = response.json()["upload_url"]. We pass this audio URL in place of the example from the docs, and this way we will have started the transcription. Let's run this again and see what the result is.

Alright, we got a much longer response with a bunch of information about the transcription we just started. You do not get the transcript itself immediately, because depending on the length of your audio, it might take a minute or two. What we get instead is the ID of this transcription job. Using this ID, from now on we can ask AssemblyAI: "here is the ID of the transcription job I submitted to you, is it ready or not?" If it's not ready, it will tell us it's still processing; if it's ready, it will say it's completed, and here is your transcript. That's why the next thing we want to build is the polling: the code that will keep asking AssemblyAI whether the transcription is ready.
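Here's roughly what the first two steps look like at this point, adapted from the AssemblyAI docs; the key is assumed to live in a local api_secrets.py under the name API_KEY_ASSEMBLYAI:

```python
# main.py -- upload a local file, then start a transcription job
import sys
import requests
from api_secrets import API_KEY_ASSEMBLYAI

upload_endpoint = "https://api.assemblyai.com/v2/upload"
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
headers = {"authorization": API_KEY_ASSEMBLYAI}

filename = sys.argv[1]

def read_file(filename, chunk_size=5242880):
    # stream the local audio file in ~5 MB chunks
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data

# step 1: upload
upload_response = requests.post(
    upload_endpoint, headers=headers, data=read_file(filename)
)
audio_url = upload_response.json()["upload_url"]

# step 2: start the transcription job; the response holds a job id,
# not the transcript itself
transcript_request = {"audio_url": audio_url}
transcript_response = requests.post(
    transcript_endpoint, json=transcript_request, headers=headers
)
print(transcript_response.json()["id"])
```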
But before we go further, let me first clean up this code a little so that everything is nicely packed in functions we can reuse. This first part becomes the upload function, and what it needs to return is the audio URL; we don't need to print the response anymore, we've already seen what it looks like. We put the headers into a separate variable, because upload, transcribe, and basically everything else needs to reach them. For the transcription, I create a function called transcribe, and what it returns is the job ID: job_id = response.json()["id"]. I'll also rename things to make them clear: this response is the transcript_response, that one is the upload_response, and the JSON we send is the transcript_request. So everything is nice and clean, and we return the job ID. Let's run this again to see that it works. Of course, I'm not calling the functions yet, so let me call upload and transcribe; I also need to pass the file name to the upload function. "audio_url is not defined": right, good thing we tried. The audio URL is returned from the upload function, and then we pass it to the transcribe function; as a result we get the job ID, which I print to see that it worked. Yes, I do get a job ID, so things are working.

The next thing we want to do is set up the polling function. The first thing we need for that is a polling endpoint. As you know, we have the transcript endpoint and the upload endpoint; those are how we communicate with AssemblyAI's API. The polling endpoint is specific to the transcription job you just submitted. To create it, all you need to do is combine the transcript endpoint with a slash in between and add the job ID. Actually, the name "job ID" is a bit vague, so I'll call it transcript_id. With that, you have a URL with which you can ask AssemblyAI whether your job is done yet. Again we send a request to AssemblyAI, but this time it's a GET request: instead of post it's get, we use the polling endpoint instead of the transcript endpoint, and we just need the headers; we don't send any data, because we're not sending information to AssemblyAI, we're only asking for it. If you're familiar with requests this is probably second nature, but all you need to know is: when you're sending data to an API, you use the POST request type, and if you're only getting information, as the name suggests, you use the GET request type. The response we get back is called polling_response. Let's see... it's not job_id, I called it transcript_id; with that fixed, it works. Then we get the polling response, and I can show you what it looks like.
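Wrapped up the way we'll end up using it, the polling call is just this (transcript_endpoint and headers are the module-level variables from the sketch above):

```python
def poll(transcript_id):
    # a GET request: we only ask for the job's status, we send no data
    polling_endpoint = transcript_endpoint + "/" + transcript_id
    polling_response = requests.get(polling_endpoint, headers=headers)
    return polling_response.json()
```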
Looks good; okay, let's run this. Alright, we got response 200, which means things are going well, but what I actually need is the JSON of the response, so let's look at that instead. Yes, this is more like it. Again we get the ID, the language model that is being used, and a bunch of other information, but what we need here is the status. There it is: status "processing". This means the transcription is still being prepared, so we need to wait a little bit and ask AssemblyAI again soon to see whether it's done. Normally you wait 30 or maybe 60 seconds, depending on the length of your audio file, and when it's done, the status will be "completed".

So let's write the bit where we ask AssemblyAI repeatedly whether the transcription is done. For that we can create a very simple while True loop: we do the polling, and if polling_response.json()["status"] == "completed", we return the polling response. But if the status is "error", because it's possible the job errors out, we return as well. I'll wrap this into a function I call get_transcription_result_url, and while we're at it, we might as well wrap the polling itself into a function. Does it need anything passed in? Yes, the transcript ID; and instead of printing the response, it will just return it. So instead of doing the request inline, all we need to do is call this poll function with the transcript ID. We might as well call the transcribe function in here too, so get_transcription_result_url starts the transcription itself, gets the transcript ID back, and passes it to the polling function, which returns the polling response. I'll call the parsed polling_response.json() simply data, and change the references to it accordingly. If the status is "completed", we return the data (and None for the error); if it's "error", we return the data together with the error. Actually, to make it a bit more understandable, instead of returning the string "error" I return the actual error that happened in the transcription, data["error"], so we'll be able to see what went wrong.

Alright, let's do a little cleanup. We have a nice upload function and a transcribe function. Before, we were calling upload, getting the audio URL, and passing it to transcribe; but transcribe now runs inside get_transcription_result_url, so I only need to pass the audio URL through to that function. What we get as a result from get_transcription_result_url is the data and, if there was one, the error. So let's run this and see what the data is going to look like. Alright, we get something really, really big.
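At this stage the polling wrapper looks roughly like this; note there's no waiting period between iterations yet, we'll add that during the cleanup later:

```python
def get_transcription_result_url(audio_url):
    transcript_id = transcribe(audio_url)
    while True:
        data = poll(transcript_id)
        if data["status"] == "completed":
            return data, None
        elif data["status"] == "error":
            return data, data["error"]
        # (a time.sleep() between polls will be added shortly)
```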
Let me clear the terminal and run it again, so we can see the output more clearly. Alright, we get the ID again, the language model being used, and so on. What we want is under "text": "Hi, I'm Patrick. This is a test, 1 2 3." We also get a breakdown of words, when each word started and ended in milliseconds, the confidence of each classification, and much more. But even with all this information, what we want to do is write the transcript that AssemblyAI generated into a text file, and that's the next step.

Let's come up with a file name. We can just call it the same thing as the audio file name plus ".txt". The variable name filename is already in use, so I'll call this one txt_filename, and it will be filename + ".txt". We could also strip off the ".wav" or ".mp4" extension first, but let's not deal with that for now. Once I have this, I open the file in write mode and write data["text"] into it, because that's where the text of the transcript lives; if you remember the response we got, "text" includes the transcription. Then I prompt the user: "Transcription saved." Of course, there's a possibility that our transcription errors out, and we want to cover that too. We returned data and error, so we can say: if data is returned, save the transcript; but if it errored out, print the error itself so we can see what went wrong.

Okay, a little cleanup again: I want to wrap all of this in a function we can call save_transcript. data and error come from get_transcription_result_url, and that needs the audio URL, so I pass audio_url to save_transcript. With that, we're more or less ready. After calling the upload function, I call the save_transcript function. Let's quickly follow the chain: save_transcript calls get_transcription_result_url; get_transcription_result_url calls transcribe, which starts the transcription process, and then it keeps polling AssemblyAI; when it's done, it returns the result, and save_transcript either saves the transcript or, if there is an error, displays it. So let's run this and see if we get any errors.

"Transcription saved." Alright, let's open output.wav.txt: "Hi, I'm Patrick. This is a test, 1 2 3" is the result we're getting. That's awesome, we achieved exactly what we wanted. In the next couple of minutes I want to clean up the code once again, because we're going to build a couple more projects, and we want a Python file with reusable code, so we don't have to reinvent the wheel all the time.
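Putting the saving step into a function, roughly:

```python
def save_transcript(audio_url, filename):
    data, error = get_transcription_result_url(audio_url)

    if data:
        txt_filename = filename + ".txt"
        with open(txt_filename, "w") as f:
            f.write(data["text"])  # the transcript lives under "text"
        print("Transcription saved!")
    elif error:
        print("Error!!!", error)
```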
First, the polling: with a bare while True loop, the code keeps hammering AssemblyAI for results, and that's unnecessary. So let's include some waiting time in between: if the job isn't completed yet, the script can wait, say, 30 seconds before asking again, and we inform the user, "waiting 30 seconds". For that I need the time module, so I import time at the top and call time.sleep(30). This way, the script waits 30 seconds between each time it asks AssemblyAI whether the transcript is ready.

Okay, now let's create that extra file; I'll call it api_communication.py. I move all the functions that communicate with the API there: upload, transcribe, poll, all of these actually. The filename handling can stay in main; the headers and the upload and transcript endpoints need to live in api_communication.py because the functions need them. In there we have to import the requests library, so we don't need it in main anymore, and we import the AssemblyAI API key there too. sys needs to stay in main, and time goes to the new file. Finally, in main we say from api_communication import *, and that way we can use all of those functions in our main Python script.

I'll run this again to make sure it still works; I delete the text file that was created and keep output.wav. Nice, we also get the prompt that the program is waiting 30 seconds before asking again. Oh, but we pass the filename and it doesn't exist everywhere it's needed: we only pass it to the upload function, which lives in the other file now, and save_transcript uses it without receiving it. The fix is to also pass the file name to save_transcript; that should solve the problem. "Transcription saved." Let's look at output.wav.txt: "Hi, I'm Patrick. This is a test, 1 2 3."

This is a very short audio file, and we've been using it over and over again, so I want to show you the code working on another one. This is the audio of one of the latest short videos I made for our YouTube channel, where I talk about what natural language processing is. Maybe if I rename it with underscores it will be easier to pass on the command line; I copy its name and call the script with it. This will probably take a little longer, because the file we'd been using is only a couple of seconds and this one is a minute, so we'll see what the results show us. There we go, the transcription is saved. Let's listen to it while the transcription is open: "...gigantic leaps over the last couple of years in terms of how computers can understand and use natural language." Alright, you get the idea. So our code works.
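After the split, the main script shrinks to just a few lines, roughly like this (everything else now lives in api_communication.py):

```python
# main.py
import sys
from api_communication import *  # upload, save_transcript, ...

filename = sys.argv[1]
audio_url = upload(filename)
save_transcript(audio_url, filename)
```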
This is amazing. I hope you've been able to follow along. If you want the code, don't forget that you can get it from the GitHub repository we prepared for you, using the link in the description.

Welcome back to the third project. In this one, I teach you how to apply sentiment analysis to YouTube videos. You'll learn how to use the youtube-dl package to automatically download YouTube videos, or to only extract the information you need, and then I teach you how to apply sentiment analysis. In this example I use iPhone 13 review videos, and the result looks like this: for each sentence in the video, we get the text and the sentiment, which can be positive, negative, or neutral. For example, this text, "the new iPhone display is brighter than before, the battery life is longer", and so on, has a positive sentiment; and here the text is "still, there are some flaws", and the sentiment is negative. This works pretty well, and it can be applied to so many use cases. So let's get started and see how to do it.

Here I've created a new project folder, and we already have our api_secrets.py and the api_communication.py file with the helper functions for the AssemblyAI API. Now let's create two more files: main.py, which will combine everything, and youtube_extractor.py, another helper file to extract the info from a YouTube video. For this we use the youtube-dl package, a very popular command line program to download videos from YouTube and other sites. We can use it on the command line, but we can also use it from Python: pip install youtube_dl, and then import youtube_dl. Then we set up an instance: ydl = youtube_dl.YoutubeDL(). Now I'll show you how you can download a video file, and also how to extract just the info from a video.

Let's create a helper function that I call get_video_infos, which takes a URL. We use the ydl object as a context manager: with ydl:, then result = ydl.extract_info(url). By default this has download=True, so it would also download the file; but in my case I pass download=False, because although we could download the file and then upload it to AssemblyAI, we can skip that step and just extract the URL of the hosted file, then pass it to AssemblyAI's transcript endpoint directly. Then we do one more check: if the "entries" key is in the result, it means we have a playlist URL, and then we want to return only the first video of the playlist, result["entries"][0]; otherwise, we simply return the result, the whole video info object. Then let's create another helper that I call get_audio_url, which takes the video infos; first, let's simply print the video infos to see what they look like. Under if __name__ == "__main__":, we first extract the video info with get_video_infos(url), then say audio_url = get_audio_url(video_infos), and print the audio URL. Right now this prints None because we don't return anything yet. So let's get an example URL: I went to YouTube, searched for "iPhone 13 review", and chose this video.
It's an in-depth review with pros and cons. We can copy the URL right from the address bar and put it in our script as a string. Now if we run python youtube_extractor.py, it should print the whole info. Ah, here I have to actually pass the video infos to get_audio_url; let's try again. Yes, it printed the whole info, which is a very long dictionary. I can tell you that it has a key called "formats", so let's print only the formats. That's still a very large structure; it's actually a list, so we can iterate over it: for f in video_infos["formats"], and print f["ext"], the file extension, and also f["url"]. Let's comment out the URL for now, because it's super long, and print only the extensions. We see a lot of different extensions, because YouTube stores the video in a lot of different formats, with a lot of different resolutions and so on. What we want is "m4a", an audio-only format. So we check: if f["ext"] == "m4a", we return f["url"], and this is the audio URL. If we save this and print the result at the very end, we get the URL of the hosted file; you can see it's not a youtube.com URL. If we click on it, we get the audio file in our browser and could listen to it there. So that's the first part: how to work with the youtube-dl package to extract the info we need.
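Here's the extractor helper as one sketch; the watch URL is a placeholder for whatever video you picked:

```python
# youtube_extractor.py
import youtube_dl

ydl = youtube_dl.YoutubeDL()

def get_video_infos(url):
    with ydl:
        # download=False: only extract metadata, don't fetch the video
        result = ydl.extract_info(url, download=False)
    if "entries" in result:
        # a playlist URL: take only the first video
        return result["entries"][0]
    return result

def get_audio_url(video_infos):
    for f in video_infos["formats"]:
        if f["ext"] == "m4a":
            # URL of the audio-only stream hosted by YouTube
            return f["url"]

if __name__ == "__main__":
    video_infos = get_video_infos("https://www.youtube.com/watch?v=...")  # your video URL
    audio_url = get_audio_url(video_infos)
    print(audio_url)
```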
Now let's combine this in main.py. There, we combine the YouTube extractor with AssemblyAI and extract both the transcript of the video and the sentiment classification results. Sentiment classification is usually a pretty difficult task, but AssemblyAI makes it super simple. If we go to assemblyai.com and look at the features, we see they provide core transcription, which is basically the speech recognition we used in the last part, but they also offer audio intelligence features, and they are pretty cool: detecting important phrases and words, topic detection, auto chapters (auto summaries), and much more. If we scroll down, we find sentiment analysis. The short description says: AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files, and sentiment analysis returns a result of positive, negative, or neutral for each sentence in the transcript. That's exactly what we need, and it's super simple to use: the only thing we have to change is that when we call the transcript endpoint, we also send "sentiment_analysis": True in the JSON data. So let's go to our code and implement this.

Let's import everything we need: json, then from youtube_extractor we import get_audio_url and get_video_infos, and from our API helper file we import save_transcript. Here I create one helper function that I call save_video_sentiments, which takes the URL. We get the video infos by calling get_video_infos(url), then the audio URL by calling get_audio_url(video_infos), and then I simply call save_transcript with the audio URL and a title. For the title, I want to use the title of the video, which we get from the video info under the "title" key. I modify it slightly: title = title.strip() to remove all leading and trailing whitespace, then I replace all spaces with underscores, and finally title = "data/" + title, because I want to store everything in a separate folder; so we create one and call it data.

Now we have to modify save_transcript slightly: it needs an additional argument, sentiment_analysis, and I give it a default of False; here we call it with sentiment_analysis=True. We have to pass this through: get_transcription_result_url needs the parameter, then transcribe needs it, and in transcribe we add "sentiment_analysis": sentiment_analysis to the JSON data we send. That's all the request needs.

Of course, I also want to save the results. In save_transcript, we check: if the sentiment_analysis parameter is true, I create a separate file, filename = title + "_sentiments.json". Then, with open(filename, "w") as f (importing json at the top), we extract the results first: sentiments = data["sentiment_analysis_results"]. If we look at the documentation, the JSON response now has this additional key when sentiment analysis is enabled. Then we dump the sentiments into the file with json.dump(sentiments, f, indent=4); the indent makes it a little more readable. Now, in main.py, under if __name__ == "__main__":, I call save_video_sentiments with the URL we copied earlier. Let's run python main.py and hope that everything works. The video info is downloaded and the transcription starts, so this looks good; let's wait. Alright, this was successful, and the transcript was saved. If we look at the data folder, we get the transcript of the video, and we also see our JSON file with all the sentiments.
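As a sketch, the main script now looks roughly like this; save_transcript is the helper from the previous project, extended with the sentiment_analysis flag (on the api_communication side, transcribe now sends {"audio_url": audio_url, "sentiment_analysis": sentiment_analysis}, and save_transcript dumps data["sentiment_analysis_results"] to a JSON file when the flag is set):

```python
# main.py
from youtube_extractor import get_video_infos, get_audio_url
from api_communication import save_transcript

def save_video_sentiments(url):
    video_infos = get_video_infos(url)
    audio_url = get_audio_url(video_infos)
    # use the video title as the file name, stored under data/
    title = video_infos["title"].strip().replace(" ", "_")
    title = "data/" + title
    save_transcript(audio_url, title, sentiment_analysis=True)

if __name__ == "__main__":
    save_video_sentiments("https://www.youtube.com/watch?v=...")  # the review video's URL
```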
For each sentiment result, we get the text of the sentence. For example: "With the exception of a smaller notch, the iPhone 13 doesn't seem very new at first glance, but when you start using this flagship, you start to appreciate a bunch of welcome upgrades." We get the start and end time, the sentiment, which is positive, and the confidence, which is pretty high. The next example, "the new iPhone display is brighter than before, the battery life is longer, and Apple has improved..." and so on, is also positive; then we have "still, there are some flaws", and now the sentiment is negative. So this works pretty well, and that's how you can apply sentiment analysis with the AssemblyAI API.

Now I want to show you a little more code for analyzing the results. We can comment out the download call, since we don't need to fetch everything again, and instead read our JSON file. We keep three lists: positives, negatives, and neutrals. We iterate over the data, extract the text and the sentiment of each entry, check whether it was positive, negative, or neutral, and append the text to the corresponding list. Then we can take the length of each list and print the number of positives, negatives, and neutrals. We can also, for example, calculate a positive ratio: here we ignore the neutrals and simply divide the number of positives by the number of positives plus the number of negatives. If we save this and run it again, we get the counts: 38 positives and only 4 negatives, an overall positive ratio of about 90%. With this, you can get a pretty quick overview of a review, for example.
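A small analysis sketch; AssemblyAI returns the sentiment values in uppercase (POSITIVE, NEGATIVE, NEUTRAL), and the path to the saved sentiments file is passed on the command line here:

```python
import sys
import json

filename = sys.argv[1]  # path to the *_sentiments.json file saved above
with open(filename) as f:
    data = json.load(f)

positives, negatives, neutrals = [], [], []
for result in data:
    text = result["text"]
    if result["sentiment"] == "POSITIVE":
        positives.append(text)
    elif result["sentiment"] == "NEGATIVE":
        negatives.append(text)
    else:
        neutrals.append(text)

n_pos, n_neg, n_neut = len(positives), len(negatives), len(neutrals)
print("num positives:", n_pos)
print("num negatives:", n_neg)
print("num neutrals:", n_neut)

# ignore the neutrals for the ratio
r = n_pos / (n_pos + n_neg)
print(f"positive ratio: {r:.2f}")
```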
I think the sentiment classification feature can be applied to so many different use cases; it's so cool. I hope you really enjoyed this project. What would be even better is displaying this information in a nice-looking web app, and that's exactly one thing you'll learn in the next tutorial, together with Misra. So let's move on to the next project.

Alright, now it's time to build a podcast summarization app, and we're also going to build a web interface for this application. In this project, we again use AssemblyAI's API, which offers chapterization and summarization features, and we get the podcasts from the Listen Notes API. So let's get into it. Here's what our app will look like once we're done: we enter an episode ID from the Listen Notes API (I'll show you how to find one), and when we click the button, it gives us first the title of the podcast and an image, then the name of the episode. Then we see the different chapters and where they start in the episode, and if we click the expanders, we can read a summary of each chapter. This is all quite exciting, so let's start building.

Like in the previous projects, we'll have a main script and a supporting script, api_communication.py, with all the helper functions we want to reuse. We built this before; it's the exact same file from the project we just did, and we only need to update it and change some things to start doing podcast summarization. The first update: we won't need the upload endpoint anymore, so I'm going to delete it, because the podcasts will be received via the Listen Notes API. They're already somewhere on the internet; we won't download them to our own computer. We can immediately tell AssemblyAI, "here's the address of the audio file I want you to transcribe", and it will be able to do that, so no download or upload is needed. That's why I also don't need the upload function, and the chunk size isn't relevant anymore either.

The next thing is to set up the Listen Notes API communication. We'll use AssemblyAI to create the summaries of the podcasts, and we'll get those podcasts from Listen Notes. If you've never heard of it, Listen Notes is basically a database of podcasts, I think nearly all of them, so you can search for any podcast. For example, one of my favorites is 99% Invisible; you can get all of its information plus the episodes, and you can search for episodes too. What we're going to send Listen Notes is an episode ID, a specific ID that we find on the platform itself. Say I want the latest episode of 99% Invisible: if I go to the episode page and scroll down to "use API to fetch this episode", I see an ID, the specific ID of this episode. Using this ID, I can fetch the episode and send it to AssemblyAI, and this is exactly the ID users will enter in our application.

To get that, we first need the Listen Notes endpoint. Listen Notes has a bunch of different endpoints, but the one we need is the episode endpoint, to get episode information; I'll name the variable listennotes_episode_endpoint. Of course, we also need a header to authenticate ourselves, and in the header we put an API key: all you have to do is go to Listen Notes, create an account, get an API key for yourself, and paste it into the config file. Where we already import the AssemblyAI API key, I now also import the Listen Notes API key, and we'll send it with our requests to Listen Notes. I'll call this header listennotes_headers and the existing one assemblyai_headers; for Listen Notes, the header field is named "X-ListenAPI-Key".

The first thing I want to build is a function that takes an episode ID and gives us the URL to the podcast episode's audio file; I'll call it get_episode_audio_url. It takes an episode_id, and we send a GET request to Listen Notes. Let's build the URL first: it consists of the Listen Notes episode endpoint, a slash, and the episode ID. We send a GET request to it, call the result response for now, and the last thing we need is the headers for authentication, listennotes_headers. Once we do this, we should be able to get a URL for the episode ID, and the information will come back to us in JSON format. Maybe let's try this first and see that it works: in main.py I just import everything from api_communication (keeping it a simple Python script for now) and call get_episode_audio_url.
I'll use the episode ID I found earlier, to keep things simple, and as a result we print the response we get from Listen Notes. Let's run this and see what happens. Alright, the output is really long, so I'll use pprint instead of print to make it more readable. Okay, let's do it again. That's slightly better; let's see what kind of information we're working with. Nice, we get the audio URL here, the URL of the episode's audio. If we open it in the browser, it's just the audio of the podcast playing, so this is exactly what we need. But you can also get some extra information about the episode, if you want to display it in some way: a description of the episode, whether there is explicit content or not, the image of the episode, and some extra information about the podcast, like its social media handles and so on. So if you want to make your web application and interface even more interesting and interactive, you can of course include more of this.

We could just return data["audio"] from here and only get the audio URL, but now that we have all this information, we might as well extract some more of it: the thumbnail of the episode, the name of the podcast, and the title of the episode, which, like we said, we will display in the app. So: audio_url = data["audio"], episode_thumbnail = data["thumbnail"], podcast_title = data["podcast"]["title"] (it sits inside the podcast-specific information), and episode_title is just data["title"]. And we pass all of this back: audio_url, episode_thumbnail, episode_title, and podcast_title.

We don't really need to change much in the rest of the functions we already built, transcribe, poll, get_transcription_result_url. The main difference is that we're not doing sentiment analysis anymore; we want to use the auto chapters feature of AssemblyAI. So I rename the parameter to auto_chapters. For our own variables this is just a matter of readability, but in the JSON data we send to AssemblyAI, the key really needs to be "auto_chapters", because that's how it knows what we want. We also updated the name of the headers: it's not just headers now, it's assemblyai_headers, same in the polling. In poll we don't need to change anything else; we're only asking whether a transcription is done. In get_transcription_result_url, we change the parameter to auto_chapters as well. One other small change: we were waiting 30 seconds between polls, but podcast episodes tend to be a little longer, so now I wait 60 seconds between asking AssemblyAI whether the transcription is ready. The main work, though, is going to happen in the save_transcript function.
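Before that, here's a sketch of the Listen Notes side of api_communication.py at this point; API_KEY_LISTENNOTES is the assumed name of the key in our config file:

```python
# api_communication.py (additions for the podcast project)
import requests
from api_secrets import API_KEY_ASSEMBLYAI, API_KEY_LISTENNOTES

listennotes_episode_endpoint = "https://listen-api.listennotes.com/api/v2/episodes"
listennotes_headers = {"X-ListenAPI-Key": API_KEY_LISTENNOTES}
assemblyai_headers = {"authorization": API_KEY_ASSEMBLYAI}

def get_episode_audio_url(episode_id):
    url = listennotes_episode_endpoint + "/" + episode_id
    response = requests.get(url, headers=listennotes_headers)
    data = response.json()

    audio_url = data["audio"]
    episode_thumbnail = data["thumbnail"]
    podcast_title = data["podcast"]["title"]
    episode_title = data["title"]
    return audio_url, episode_thumbnail, podcast_title, episode_title
```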
So the main change we need to make in the save_transcript function is that before, we were uploading our audio to AssemblyAI and then getting the result back; instead, this time, we only have an episode ID, then we get the URL from Listen Notes, and then we pass that to AssemblyAI to start the transcription. So what I want to do here, instead of passing a URL and title, is to just give save_transcript the episode ID. And then I will run get_episode_audio_url from, oops, from inside save_transcript. As a result, what we're getting is the audio URL, episode thumbnail, episode title, and podcast title. Again, we are not doing sentiment analysis, we are doing auto chapters. And we need to pass the audio URL to get_transcription_result_url: it gets the audio URL as "url", plus auto_chapters, but that is not defined; what we want to do is hard-code it to True here. The next thing that we want to do is deal with the response that we get from AssemblyAI. So let's first see what the response from AssemblyAI looks like when we do auto chapters, and then deal with it. But first, let's fix some of the problems here. I will not save it into a file for now; I can comment these out. This will be auto_chapters. The main thing that I want to do is see what the result looks like, right? So I will pretty print the data. And the data is already in JSON format; let me check. Yes, it is. Yeah, so I will just show that. I'm just going to comment these out for now, just so that we have an idea of what the response looks like. To run this, I will just pass the episode ID to save_transcript. Oh, we're still printing this one, so I will stop printing the response from Listen Notes and start it again. Alright, so we got the results; let's see what they look like. It's a lot of information, so let's scroll to the top. What we wanted were the chapters, basically, so let's see what the chapter information includes. As you can see, this is one chapter, and this is another chapter. For each chapter, we have the starting point, then we have the ending point, the gist of the chapter (really quickly, what is this chapter about), a headline for this chapter, and a summary: in a couple of sentences, what is happening in this chapter, what is the presenter talking about. What we want to do is show this information on our application, on our web interface. That's why what we want right now is to extract this information from the response we get from AssemblyAI, save it somewhere, and then visualize it in our Streamlit application. So I will undo the commenting here, and also here. I will name this file with the episode ID; it will be episode_id plus ".txt", and as we always do, I'm just going to save the transcript there. We don't have to touch this much, but I will start another file; let's call this one chapters_filename, and this one will be episode_id plus, maybe, "_chapters.txt". Alright, so the chapters will be another file; I'm going to keep all the chapter information somewhere else. And in here, I'm going to write some of the information I got from AssemblyAI, specifically the chapter information.
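So far, the reworked save_transcript looks roughly like this; a sketch with assumed helper signatures, and with the file writing still to come while we inspect the response:

```python
# A sketch of the reworked save_transcript flow described above.
from pprint import pprint


def save_transcript(episode_id):
    # 1. Ask Listen Notes for the audio URL plus display metadata.
    audio_url, thumbnail, episode_title, podcast_title = \
        get_episode_audio_url(episode_id)
    # 2. Hand the URL straight to AssemblyAI with auto chapters on.
    data, error = get_transcription_result_url(audio_url, auto_chapters=True)
    if error:
        print(error)
        return False
    # 3. For now, just inspect the response before saving anything.
    pprint(data)
    return True
```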
And I'm also going to include some of the information I got from the Listen Notes API. One mistake here: I do not want it to be a text file, I want it to be a JSON file, so that it will be easier to parse and easier to read later. The first thing that I want is the chapters, and I'm going to get that from the data variable. Is the section called "chapter"? Let's check; it is called "chapters", yeah. So let's start it out: we'll build an episode_data dictionary. At first, let's include the chapters; I will call the key "chapters". And then, inside this episode_data, what do I want? I want the episode thumbnail, I want the episode title, and I want the podcast title, so that I have all of this information in one place, saved on my file system, and I can just read it whenever I want and display it to the user. And finally, we dump that episode data to the file. I'll let the user know that the transcript is saved. This part we don't need anymore. And again, if there is an error, we will just say that there is an error; otherwise we return True at the end. Now that we've got this far: up till now, what we do is get the URL from Listen Notes based on the episode ID, send this URL to AssemblyAI, get the auto chapters information, and then save it to a file. So let's check that this works, and while it's running, we will start the Streamlit application. I will just run this again, but in the main part, of course, we need to call save_transcript. Okay, good to go. So I will just run the application, and let's also start building our Streamlit application now. If you've never heard of Streamlit before, it is a really easy way to start building web interfaces for your application, specifically in Python. It's very simple to use, it has a very simple API, it's a very simple library. What you have to do is import streamlit as st, if you want to use it simply. And let's say you want to put a title in your application; all you need to do is call st.title and it will show that as a title. I will run this separately to show you how it works. To run Streamlit applications, you just need to say streamlit run main.py. Streamlit is installed on your computer like any other Python library, so you just need to use pip: pip install streamlit, and then you will be good to go. Unless you make a mistake and call streamlit with a capital S, which is wrong; it needs to be a lowercase s. So let's do that again. Alright, so this is actually an application; the only thing we're showing right now is a title. And we know what we want it to look like, so I will start building the elements of this application. The first thing that strikes us is that we have a sidebar, we have a title that says podcast summaries, and then we start showing the information we got from the APIs that we've been using. So let's put in a sidebar; or maybe let's fix the title first. We want to say podcast summaries; the title can say "Podcast summaries", or it can even say "Welcome to my application that creates podcast summaries". Let's see, maybe that will be too long, but we'll see. And let's create the sidebar; it's quite simple, you call streamlit sidebar dot text_input, and then you can say "Please input an episode ID".
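Picking the saving step back up for a moment, the finished version might look roughly like this; a sketch (the "<episode_id>_chapters.json" filename convention is from the walkthrough; in the real code this lives inside save_transcript rather than its own function):

```python
# A sketch of writing the chapter data and Listen Notes metadata to disk.
import json


def save_chapters(episode_id, data, thumbnail, episode_title, podcast_title):
    # Keep everything the front end needs in one JSON file per episode.
    episode_data = {
        'chapters': data['chapters'],
        'episode_thumbnail': thumbnail,
        'episode_title': episode_title,
        'podcast_title': podcast_title,
    }
    with open(episode_id + '_chapters.json', 'w') as f:
        json.dump(episode_data, f)
    print('Transcript saved!')
```

With that saved, back to the Streamlit side.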
And I can also have a button at the end of the sidebar that says "Get podcast summary", maybe with an exclamation point too. So let's run it again. Okay, this is looking more like it: it says "Welcome to my application that creates podcast summaries", I can put an episode ID here, and then I can click get podcast summary. You see that it is running; it is running because I forgot to comment out this one, so it's actually running the whole pipeline. I'll just stop it for now, because we don't have any way of displaying whatever we get back from the APIs yet. Now that we have the application looking more or less like what we want, let's wait for the chapter results to be written to our file; then we will see what they look like, and then we can start parsing them and showing them to the user in our Streamlit application. Okay, so the transcription is saved and our auto chapter creation is done; let's take a look at what it looks like. We have the chapters section, we have the episode thumbnail, episode title, and podcast title. All good. In the chapters we have chapter numbers, and inside each chapter we have the summary, headline, gist, start, and end. So it looks good; let's start showing this. The first thing that I want to show, like we showed in the beginning, is the name of the episode, or maybe the name of the podcast plus the name of the episode, and then the episode thumbnail. How I'm going to show that is, again, using Streamlit: that is going to be a header for me, and I will include the podcast title, maybe with a dash in between, and then the episode title. But as you can see, we do not have those yet. First, we need to open the file that includes these things, and that file is episode_id plus "_chapters.json". So let's start: the filename will be episode_id plus "_chapters.json". And where do I get the episode ID? I get the episode ID from the text input: the user is going to input the episode ID, I am going to save it here in this variable, and that way I will have the filename. Then I just need to open this file, and let's call the loaded result "data", for example; I need to import json, of course, and load it into the variable data. So in this variable data, what do we have? We have the chapters, so first let's get the chapters: data["chapters"]. Then we want to get the podcast title and the episode title; let's fix the names, episode title. And we also want the thumbnail. What did we call the thumbnail? We can see here: "episode_thumbnail". Alright, so thumbnail. Now we're already showing the podcast title and episode title in a Streamlit header, and we can show the thumbnail image with the streamlit image function. From this point on, the next thing that we want to show is the chapters, of course. One thing we can do, for example, is use a for loop: for chap in chapters, you can just say streamlit write, or just show the chap. That's one way of doing it, but you're going to have a lot of text one after another; it's not really nice.
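Up to this point, the front end might look roughly like this; a sketch of the Streamlit calls just described (the widget labels are from the walkthrough, everything else is illustrative):

```python
# main.py - a sketch of the Streamlit front end so far.
import json

import streamlit as st

st.title('Welcome to my application that creates podcast summaries')
episode_id = st.sidebar.text_input('Please input an episode ID')

if episode_id:
    # Read back the JSON file written by save_transcript.
    with open(episode_id + '_chapters.json', 'r') as f:
        data = json.load(f)

    chapters = data['chapters']
    podcast_title = data['podcast_title']
    episode_title = data['episode_title']
    thumbnail = data['episode_thumbnail']

    st.header(podcast_title + ' - ' + episode_title)
    st.image(thumbnail)
```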
What we want, like in the original one I showed you at the beginning, is expanders. It's quite easy to create expanders in Streamlit: again, you just say streamlit expander, and as the title of the expander I will write what I want in the title, and whatever I want inside the expander, I'm going to write inside its block. I do not need to call streamlit again there, because this is going to be inside the expander. And inside the expander, what I want is the summary. I think it was called "summary"; let's just check again here in our JSON file. In chapters, we have "summary"; yes, it's called "summary". So I want the summary to be in there, and as the title of the expander, I want the gist of each chapter. So for each chapter there will be an expander, the title of the expander will be the gist of that chapter, and inside the expander we're going to have the summary of that chapter. So let's run this and see how it looks. But first, let's make sure that everything works. So I have the title, then I ask for an episode ID from the user, and there is a button that starts this process; I'll just assign it to a variable called button. This button variable holds whether the button has been pressed or not, and I only want this visualization part, this display part, to happen if the button has been pressed. So I'm going to wrap this all in a condition; otherwise, it's not going to happen. But right now, if someone presses the button, nothing really happens, so we also need to add an action to this button. How we're going to do that is with on_click: if this button is clicked, what we want to happen is for save_transcript to be run, so I'm going to pass it here in the on_click argument. We also have arguments to pass, and here is how you pass arguments to the function that you call from your button: through "args". This is a tuple; that's why you write the argument that you're passing to the function as the first element and leave the second one empty. Now, when the button is clicked, this function should run, and we should be able to see all the information on our application. So let's run it again and see what happens. Yeah, we need to run the Streamlit application this time: streamlit run main.py. I'll close the old one, so we know which one is which; that one was just the example from the beginning. Alright, so we want to get a podcast, and we want to display it. I will get this episode ID again, and let's get the podcast summary. And here it is. We have the title, "Welcome to my application that creates podcast summaries"; okay, maybe that's a bit too long, I will shorten it. The name of the podcast, the name of the episode ("The Missing Middle"), and here are my chapters. So apparently there are one, two, three, four, five, six, seven chapters that AssemblyAI's API was able to find in this episode. We have the gist of the chapter as the title of the expander, and the chapter summary inside. Voilà! One last thing that I want to add is the start point of each chapter here, because I want to show roughly how long each chapter is, maybe. So let's do that. For that, I want to see how it looks in this JSON file.
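Before we look at those timestamps, here is roughly what the button wiring and the expander loop from this part look like; a sketch (on_click/args is standard Streamlit; the trailing comma makes args a one-element tuple, as described above):

```python
# A sketch of the button that kicks off save_transcript and the
# expander loop that renders one chapter per expander.
button = st.sidebar.button('Get podcast summary!',
                           on_click=save_transcript,
                           args=(episode_id,))  # one-element tuple

if button:
    # chapters loaded from the JSON file as in the previous sketch.
    for chap in chapters:
        # Expander title: the chapter's gist. Body: its summary.
        with st.expander(chap['gist']):
            st.write(chap['summary'])
```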
So the start looks like this. These numbers might look a bit random to you, but basically they are milliseconds, and I want to turn them into minutes and seconds, and, if applicable, hours, minutes, and seconds. There is already a function that can do that; here it is (I sketch it again right after this part), so we don't need to work on it for a long time. Basically, you get the milliseconds, and from the milliseconds you can compute how many seconds, how many minutes, and how many hours there are: you count the hours, everything on top of a full hour is expressed as minutes, and everything that does not add up to a full minute is expressed as seconds. And here is what we return: the start time as hours, minutes, and seconds; and if there are no hours, we don't have to say "zero something something", we just show minutes and then seconds. How I'm going to show it is within the expander title, with a dash in between: I'll call get_clean_time, and what I pass in is the chapter's start. Let's see what it was called; just "start", okay. All right, let's run it one more time and see what our application looks like. Awesome. Okay, this is our application: in the sidebar we can input an episode ID that we get from Listen Notes, we can click get podcast summary, and it will show a nice title, the title of the podcast, the title of the episode, and the thumbnail of this episode; and for each chapter, we show the gist of the chapter, kind of like a headline, and when this chapter starts. When you click the expander, when you expand it, you get the summary of this chapter. So this is what we set out to do, and we achieved it; I hope you were able to follow along. Again, don't forget to go grab the code from the GitHub repository. Welcome to the final project. In this one, you will learn a bunch of new, exciting technologies. First of all, you will learn how to do real-time speech recognition in Python; then you will learn how to use the OpenAI API and build a virtual assistant, or chatbot; and finally, you will learn a little bit about WebSockets and how to use asyncio in Python. So I think this is going to be really fun, and first of all, let me show you the final project. Now, when I run the code, I can start talking to my bot and ask questions: What's your name? How old are you? What's the best ice cream? And you'll see this works. So I think this is super exciting; now let's get started. Alright, so here I have a new project folder, and again we have our api_secrets file, and now a new main.py file. The first thing we're going to do is set up real-time speech recognition, and for this we have a detailed blog post on the AssemblyAI blog that will walk you through this step by step. First of all, we need PyAudio to do the microphone recording; this is the very same thing that we learned in part one. Then we use WebSockets, and then we use the AssemblyAI real-time speech recognition feature that works over WebSockets. Then we create a function to send the data from our microphone recording and also a function to receive the data, and then we can do whatever we want with it. But instead of just copying and pasting this, let's actually code it together.
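Before diving into the new project, here is the time-formatting helper just described, assuming chapter timestamps arrive as milliseconds; a sketch:

```python
# A sketch of the millisecond-to-timestamp helper described above.
def get_clean_time(start_ms):
    seconds = int((start_ms / 1000) % 60)
    minutes = int((start_ms / (1000 * 60)) % 60)
    hours = int((start_ms / (1000 * 60 * 60)) % 24)
    if hours > 0:
        return f'{hours:02d}:{minutes:02d}:{seconds:02d}'
    return f'{minutes:02d}:{seconds:02d}'
```

In the app, the expander title then becomes something like st.expander(chap['gist'] + ' - ' + get_clean_time(chap['start'])).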
So let's get started. One note here: in order to use the real-time feature, you need to upgrade your AssemblyAI account. But anyway, let's get started and import all the things we need. We want pyaudio again; then we need websockets, so we say import websockets. This is the third-party library that I showed you in the beginning that makes it easy to work with WebSockets, and it is built on top of asyncio, so now we're going to write async code. Then we also import asyncio; we also import base64, because we need to encode the data to a base64 string before we send it; and then we import json to parse the JSON results. And then from api_secrets we import our AssemblyAI API key. Now the first thing we set up is our microphone recording, and for this we use the exact same code that we learned in part one, so I simply copy and paste this part from there. We set up our parameters, then our PyAudio instance, and then we create our stream. Now we need to define the URL for the WebSocket, and we can find this in the blog post, so I can copy and paste the URL from there. The URL scheme is wss, then the AssemblyAI host, then the real-time path, and the last part is also important: we say question mark sample_rate equals 16000. This is the same rate that we use here, so make sure to align this with what you have. And now we create one function to send and receive the data, and this is an async function: we say async def, and we call it send_receive. This is responsible for both sending and receiving the data. Now we connect to the WebSocket, and we do this with an async context manager: again we say async, then with, then websockets.connect, and now we specify the parameters. First the URL; then we set a ping_timeout, which we can set to 20, for example; then we want a ping_interval, and this should be five; and then we also need to send our authorization token. The parameter for this is extra_headers, and this is a dictionary with the key "Authorization" and our token as the value. Then we say async with ... as, and we can call this what we want; I say _ws for WebSocket. First we wait briefly to let this connect: here we say await asyncio.sleep(0.1). Be careful here: we cannot use time.sleep, because we are inside an async function, so we have to use the async sleep function. Then we wait for the session to start: we say session_begins equals, and then again await _ws, and then the method is called recv, for receive, I guess. Then we can print the data and see how this looks; let's also print "sending messages". And now we need two inner functions, again async functions: we say async def send, and for now we simply pass, and then we say async def receive, and here also we pass. Actually, these will both have an infinite while True loop, so they will run forever and listen for incoming data. So in send we say while True, and for now let's just print "sending"; in receive we also say while True, and there we simply pass, so I don't spoil our output. And now, after this, we need to combine them in an asyncio way, and in order to do this, we call the gather function: asyncio.gather.
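Assembled, the skeleton from this part might look roughly like this; a sketch based on the walkthrough (the real-time URL follows AssemblyAI's documented format; api_secrets and the buffer size are assumptions):

```python
# A sketch of the real-time skeleton: microphone stream, WebSocket URL,
# and the send_receive coroutine that hosts the send/receive loops.
import asyncio
import base64
import json

import pyaudio
import websockets

from api_secrets import API_KEY_ASSEMBLYAI  # assumed secrets module

FRAMES_PER_BUFFER = 3200
RATE = 16000

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=FRAMES_PER_BUFFER)

# The sample rate in the URL must match the microphone stream above.
URL = f'wss://api.assemblyai.com/v2/realtime/ws?sample_rate={RATE}'


async def send_receive():
    async with websockets.connect(
            URL,
            ping_timeout=20,
            ping_interval=5,
            extra_headers={'Authorization': API_KEY_ASSEMBLYAI}) as _ws:
        await asyncio.sleep(0.1)           # give the socket a moment
        session_begins = await _ws.recv()  # wait for the session message
        print(session_begins)
        print('sending messages')

        async def send():
            while True:
                print('sending')           # placeholder for now
                await asyncio.sleep(0.01)

        async def receive():
            while True:
                await asyncio.sleep(1)     # placeholder, filled in next

        await asyncio.gather(send(), receive())


asyncio.run(send_receive())  # described just below; added so the sketch runs
```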
And now here we gather send and receive. This will return two things, the send result and the receive result; we don't actually need them, but just in case, we keep them here. And now, after defining this function, of course we also have to run the code, and we run it in an event loop: we call asyncio and then dot run, with our send_receive function. So now this should connect and then print "sending" all the time. Let's run this and hope that it works. Yeah, it's already connected and sending works. You see, that's why I didn't put a print in receive as well: we already get a lot of output, and I can't even scroll to the top anymore, but basically it should have printed the session message once. So this is working so far, and we can continue implementing these two functions now. Let's implement the send function first. We wrap this in a try/except block, and then we read the microphone input: we say stream.read, and then we specify the frames per buffer, and I also want to set exception_on_overflow equals False. Sometimes, when the WebSocket connection is too slow, there might be an overflow, and then we would get an exception, but I don't want that; it should still work. Then we need to encode the data in base64: we say base64.b64encode on our data, and then we decode it again as UTF-8; this is what AssemblyAI expects. Then we need to convert it to a JSON object: we say json.dumps, and this is a dictionary with the key "audio_data"; again, this is what AssemblyAI needs, and here we put in the data. Then we send this, and we also have to await it: await _ws.send with the JSON data. Then we have to catch a few errors, so let's copy these from the blog post and paste them in here: we except a websockets exceptions ConnectionClosedError, print the error, make sure it has the expected close code, and then break; and then we catch every other error and just assert there. It's not best practice to do it like this, but it's fine for this simple tutorial. And then, after each while True iteration, we also sleep again. Now we can copy this whole code and paste it into receive, because the code is very similar. We have the same try/except, but now, of course, we have to wait for the transcription result from AssemblyAI. We say result string equals, and then again we await _ws.recv; then we can convert this to a dictionary by saying result equals json.loads of the result string. So this was a JSON object, and now in Python it's a dictionary, so we can check a few keys. We can get the prompt, or actually, this is the transcription of what we said: we say prompt equals result, and then it has the key "text". It also has a key that is called message type, so now we check if we have a prompt, and if the result's "message_type" key is "FinalTranscript". What AssemblyAI does is that while we are talking, it already starts sending partial transcripts, and once we finish our sentence, it does another pass and makes a few small corrections if necessary, and then we get the final transcript. We want only the final transcripts.
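Filled in, the two nested coroutines might look roughly like this; a sketch following the structure of the blog snippet described above (they close over _ws, stream, and FRAMES_PER_BUFFER from the skeleton; the 4008 close code comes from that snippet and is an assumption here):

```python
# Bodies for the nested send()/receive() coroutines from the sketch above.
async def send():
    while True:
        try:
            # Read one chunk from the microphone; don't raise on overflow.
            data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
            data = base64.b64encode(data).decode('utf-8')
            # AssemblyAI expects a JSON message with an 'audio_data' key.
            await _ws.send(json.dumps({'audio_data': data}))
        except websockets.exceptions.ConnectionClosedError as e:
            print(e)
            assert e.code == 4008
            break
        except Exception:
            assert False, 'Not a websocket 4008 error'
        await asyncio.sleep(0.01)


async def receive():
    while True:
        try:
            result_str = await _ws.recv()
            result = json.loads(result_str)
            prompt = result['text']
            # Only react to final transcripts, not the partial ones.
            if prompt and result['message_type'] == 'FinalTranscript':
                print('Me:', prompt)
                print('Bot:', 'This is my answer')  # placeholder reply
        except websockets.exceptions.ConnectionClosedError as e:
            print(e)
            assert e.code == 4008
            break
        except Exception:
            assert False, 'Not a websocket 4008 error'
```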
And for now, let's print "Me:" and then the prompt. Later we want to use our chatbot here, so let's also print "Bot:", and for now let's simply print some random text; we will set this up properly in the next step. But first, I want to test this, so let's say "This is my answer". And this is all that we need for the receive function, so let's clear the terminal, run this, and test it. We get an error: "await wasn't used with future" in asyncio gather. Oh, this is a classic mistake: of course, here I have to say await asyncio.gather. So let's run this again, and now it's working. "What's your name?" And you see, the transcript is working. Now I stop this, but if I scroll up: "What's your name?", and each time we get "This is my answer". So this is working, and now, of course, instead of a random answer, we want to do a clever thing with our prompt and use our virtual assistant. For this, we now set up OpenAI. They have an API that provides access to GPT-3, and this can perform a wide variety of natural language tasks. In order to use it, you have to sign up, but you can do this for free and you get free credits, and this will be more than enough to play around with. It's actually super simple to set up. Let's create a new file, and let's call this openai_helper.py. We also have to install the library, so we say pip install openai. And after signing up, you get an API token, so we copy this into api_secrets, and then we can use it. Now we can import openai, and we also need to import our secret: from api_secrets we import our OpenAI API key. Then we have to set it, so we say openai.api_key equals the API key. And now we want to do question answering. The OpenAI API is actually super simple to use: we can click on Examples, and then we see a bunch of different examples. OpenAI can do a lot of things, for example Q&A, grammar correction, text to command, classification, a lot of different stuff. So let's click on Q&A, and if we scroll down, we find the code examples. We already set our API key, and now we need to grab this. Let's copy it and create a helper function: define, and let's call this ask_computer, and it gets the prompt as input; now I paste the example code in here. We say response equals openai.Completion.create; then here we specify an engine, and then we specify the prompt, and in our case the prompt is going to be the prompt that we pass in, so prompt equals prompt from the parameter. There are a lot of other parameters that you could check out in the documentation; in my case, I only want to keep max_tokens, which specifies how long the result can be, and let's say 100 is fine for this. And this is all that we need. Now, of course, we need to return the response. This is, again, a JSON object, or now a dictionary, and we only want to extract the first possible response; it can also send more if you specify that here, but in our case we only get one. So we say response, and this is in the key "choices", then index zero for the first choice, and then the key "text". This will be the actual response from GPT-3.
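The whole helper file might look roughly like this; a sketch using the legacy Completion endpoint the course describes (the engine name is an assumption; use whichever the examples page shows):

```python
# openai_helper.py - a sketch of the GPT-3 question-answering helper.
import openai

from api_secrets import API_KEY_OPENAI  # assumed secrets module

openai.api_key = API_KEY_OPENAI


def ask_computer(prompt):
    response = openai.Completion.create(
        engine='text-davinci-002',  # assumed engine name
        prompt=prompt,
        max_tokens=100,             # cap the length of the answer
    )
    # Return only the text of the first (and only) choice.
    return response['choices'][0]['text']
```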
And now, in the main file, the only thing we have to do is say from openai_helper import ask_computer, and then down here, in the receive function, we say response equals ask_computer, and we put in the prompt; and then this will be our bot's response. And now this should be everything that we need. So let's again clear this and run main.py, and let's hope this works. What's your name? How old are you? Where are you from? All right, let's stop this again, and yeah, you see this works. And this is how you can build a virtual assistant that works with real-time speech recognition together with OpenAI. I really hope you enjoyed this project. If you've watched this far, thank you so much for following along. And also, I hope to see you in the future on the AssemblyAI channel, because there we also create a lot of content around Python, speech recognition, and also machine learning. So please check it out, and I hope to see you soon. Bye!
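For completeness, the final change described above might be factored roughly like this; a sketch with a hypothetical handle_final_transcript helper standing in for the two lines changed inside receive():

```python
# A sketch of the final wiring: the transcript prompt goes to GPT-3
# and the answer is printed as the bot's reply.
from openai_helper import ask_computer


def handle_final_transcript(prompt):
    # Called from receive() whenever a FinalTranscript message arrives.
    print('Me:', prompt)
    response = ask_computer(prompt)
    print('Bot:', response)
```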
Info
Channel: freeCodeCamp.org
Views: 237,313
Id: mYUyaKmvu6Y
Length: 119min 40sec (7180 seconds)
Published: Wed Jun 08 2022