Full-Stack Web Development "YouTube Transcription" coding tutorial (JavaScript, Google Cloud)

Captions
Hey, welcome back, with your host, ex-Google ex-Facebook TechLead. Today we're going to be doing a little bit of coding. As one of the top tech channels on YouTube, you know I don't code all that often, but when I do, it's going to be good. Good for nothing, that is. As you can see, we're back in the studio today, ready to get some work done, and we've got quite a show lined up for you. We're even going to do some machine learning, and it's going to be a real class act. Or rather, I'm not that into object-oriented programming; I'm more into scripting and hacking around. But it's going to be fun, so let's get into it.

What we're going to build today is a transcriber for YouTube, so that you can load up any YouTube video and see the script for it right there. You don't have to sit through the entire video just to find out what the person is going to talk about; you can read through the whole script instead. Wouldn't that be nice? We're going to go through a number of approaches using full-stack web development and excellent hardware. This is also a great project to follow along with if you want to build your own version, and maybe you can even put it on your resume as a little bit of resume padding.

Now, the first thing you need to understand is that in order to code up this project, you may want to use a monitor. The fact is, I don't use monitors myself. I don't need one, because I can already visualize in my mind what the monitor output will look like. But I understand that most people do appreciate some visual confirmation, which is why we have had LG ship us this monitor built specifically for programmer productivity. So that's really step one of this coding tutorial: getting the monitor set up. Coding is, well, it's not easy being a software engineer. What we have here is a 35-inch ultrawide curved monitor by LG. If you're interested in this model, I'll have a link in the description below. This is one of their newest monitors, built specifically for programmer productivity, and the ultrawide curved aspect makes it especially good when you have multiple windows on screen at the same time, which is an extremely common case for programmers. All right, beautiful, perfect. Now we have our monitor set up for the coding project.

A few quick points about this monitor as considerations for your next potential monitor as a programmer. Number one is the choice between an ultrawide monitor and a standard-aspect 4K monitor. My comment is that, at least with 4K monitors, I found I always had to play around with the scaling, and I would end up running at a non-default, scaled resolution, which actually makes performance suffer quite a bit in a number of programs. For example, if you use the Adobe applications, like Premiere, Lightroom, or Photoshop, you'll find that performance suffers when you're not at the default native resolution; and yet, at least for me, the fonts were so small at native resolution that I always had to scale things up. That's generally not an issue with ultrawide monitors. This one runs at its native resolution, 1440 pixels tall, the text and font sizes are normal and very readable, and you get more space left to right for multitasking. Compare that to 4K monitors, which have extremely high pixel density: maybe that's good for gaming, but I'm not sure it's good for reading a lot of text on screen, so it really depends on your usage. The other notable aspect of this particular monitor is that it helps keep your desk clean, because so many features are built in. There's a USB hub in the back with a whole bunch of ports, so you may not need an external USB hub anymore, and it comes with excellent built-in audio speakers, so you don't need external speakers cluttering up your desk like I used to have. It's also almost as minimalistic as you can get, so you can focus on the task at hand.

Moving on to the second thing: we need a keyboard. That's right, generally for coding we're going to need a keyboard, and not just any keyboard will do. We need a mechanical keyboard, which is why we have to thank our next sponsor, HHKB, for setting us up with one of the top keyboards for programmers out there, the legendary Happy Hacking Keyboard. So check this out. The notable thing about this keyboard is the switches: they're these special Topre switches, which feel extremely nice to type on, somewhere between a linear switch and a tactile switch, and there's really nothing else like them on the market, which is what makes this keyboard so special. Not to mention its compact form factor, which is really clean and saves space on your desk. I was pleased to find the design has been updated: it takes a USB-C connector now, which is so nice, and it also supports Bluetooth if you want to live the wireless lifestyle; a single set of batteries will last you three to four months. It also has adjustable height stands on the back for your ergonomic comfort, which is highly important. I think this will do for us. If you're interested in this keyboard, I'll have a link for it in the description below as well. To round things out, I'm picking the Logitech G502 for my mouse, because we are dangerously low on RGB here. So there you have it, my chosen loadout for today's coding project.

All right, that was fun. Now it's time to actually pay for all of this gear by putting in some work, so let's do some coding. What we're trying to build, as I mentioned, is a YouTube transcription service, and I got the idea by looking at some of the machine learning and AI libraries available from cloud service providers like Google. If we take a look, we can see that Google Cloud offers a bunch of different products: Compute Engine, Cloud Storage, and so on. This is a great way to come up with new project ideas, because these frameworks and libraries enable the next wave of services and platforms built on top of them. Among the AI and machine learning tools, one interesting one is Speech-to-Text. We can leverage Google's AI technology to create a simple service that transcribes YouTube videos into text, so people can quickly read the text version of any video instead of sitting around for eight to ten minutes just to hear what somebody has to say.

So I'll go into the console for Google Cloud Platform. I've already created a project called yt-caption, and I'm going to enable the Speech-to-Text API for it. Looking at the documentation, you can see that to make an audio transcription request, we need to create a JSON request file with the settings we want, pass in the audio file (which has to live on a Google storage server), and then use the curl command to issue a request to the Google speech servers to transcribe it. What we'd hopefully get back is a JSON response containing the transcript text along with a confidence score indicating how confident the machine learning model was about each piece of text. So why don't we try this out?
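Based on the request format described above, the JSON request file looks roughly like this (the bucket name, sample rate, and encoding values here are placeholder assumptions; the field names come from the Speech-to-Text v1 REST reference):

```json
{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://your-bucket-name/audio.wav"
  }
}
```

You would then POST it with something like `curl -X POST -H "Content-Type: application/json" -d @sync-request.json "https://speech.googleapis.com/v1/speech:recognize?key=YOUR_API_KEY"`.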
First, I'll go into the terminal, create a directory for the project called ytcap, and edit a JSON file called sync-request.json, pasting in the sample request. Then we need to change the audio URI to point at a file we'd like the service to transcribe. Before wiring the whole thing up, we can imagine how it would work: the user enters a YouTube URL, and we somehow download that video, maybe using youtube-dl, an open-source library for downloading YouTube videos. Then we could use a tool like ffmpeg, another open-source project, to convert the video file (say, an MP4) into an audio file, like an MP3 or a WAV. Let's assume we've gone through all those steps and have the WAV file. We can then issue a command to Google Cloud Storage to get it uploaded onto Google's servers, so that we have a URI accessible to the Speech-to-Text service. The reason for this is that if you look at the actual reference documentation, the spec for the JSON config file we were talking about earlier, and click on RecognitionAudio, you'll see the URI has to be a Google Cloud Storage URI in the format gs://bucket-name/object-name. I'm sure there's an API for this, but they also have a visual interface on the website, the Google Cloud Storage browser, where you can create a bucket and then add objects to it with a drag-and-drop interface, so that's what we'll do here. I'll create a bucket called tlyt-videos, and we can upload files to it. For a sample audio file, I just recorded something: "Hello world, this is the TechLead, and I love my monitor and keyboard. Have a great day."
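The hypothetical download-and-convert pipeline sketched above could look like the following shell commands (the video id, bucket name, and output names are placeholders; the `-ac 1` flag produces mono audio, which matters later when the API rejects two-channel WAV files):

```shell
# Download the video as an MP4 (placeholder video id)
youtube-dl -f mp4 -o video.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"

# Strip the video stream and re-encode the audio as a mono 16 kHz WAV
ffmpeg -i video.mp4 -vn -ac 1 -ar 16000 audio.wav

# Upload it to a Cloud Storage bucket so Speech-to-Text can reach it
gsutil cp audio.wav gs://your-bucket-name/audio.wav
```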
Then I'll simply upload this file onto Google Cloud Platform storage, and you can get a link for the file through its URI, so I'll copy the URI to the clipboard, come back to the JSON request file, paste in the URL for the audio file, and execute the curl request. In order to interact with Google Cloud services, I need to make an authenticated request using credentials. There are a number of ways to authenticate to Google Cloud Platform, but what I've done here is create an API key, which you can copy and pass into the request. So now I can execute the same curl request, pass in the API key, press enter, and it tells me I have another issue: "anonymous caller does not have storage.objects.get access". So I need to go into the bucket and add permissions for all users to access the audio file. I'll give it the Storage Object Viewer role, I believe, and allow public access. If I execute this one more time, it says "invalid RecognitionConfig: bad encoding". Looking at the documentation, we see that for audio encoding they only accept a limited number of formats, so I'm going to re-encode the audio I had, which was an M4A file, into a WAV file; one way to do that is with the ffmpeg open-source library. Just like that, I've created audio.wav, and I'll upload it into Google Storage. Now that the file is uploaded, I'll try executing the request again, changing the URI to audio.wav, so hopefully this format will be recognized. And now I get a different kind of error message: it says I must use a single-channel audio file, but the WAV header indicates two channels. If you dig around in this, you'll see you can actually specify the audio channel count, so I just need to include that in the configuration. I'll add "audioChannelCount": 2, and if I run this one more time, this time I get the result. Let's see what it transcribed: "hello world this is the tech lead and i love my monitor and keyboard have a great day". Nice, that was exactly what I said, and it came back with 0.92 confidence. So it seems like we're able to get this service up and running, using Google's AI transcription to do the processing, and we may have a potential technical solution here; maybe we can build something around it.

But once you start looking into this a little more, you'll see there are a number of limits. For example, this API has a content limit of just one minute per audio file, which is nowhere near enough. After that, they recommend using another API, LongRunningRecognize, which is asynchronous and would complicate the implementation a bit more; but even that API has a content limit of 10 megabytes per local file, which is still nowhere near enough if I want to process ten-minute videos. There may be other ways to use the API, like the streaming version, which has a higher limit, but then you need to stream the audio into the service. At this point, with all of these content limits and restrictions, not to mention that the processing time could be quite long, and I really wanted the service to respond quickly rather than make users wait one, two, five minutes for a result, I think it's time to take a step back and consider a different approach. It's been a very interesting exploration, and it could merit more looking into, but in the meanwhile I came up with another idea: when you're watching YouTube videos, you already see auto-generated captions right there on the video. The data must already be there; we don't necessarily need to transcribe anything from scratch. So I started looking into the official YouTube Data API, where perhaps we can fetch the captions for each video and use those instead, on this beautiful monitor, using this beautiful keyboard.

If I look up the official YouTube Data API and go to the reference, we can see there is a captions resource, and you can actually list captions. This captions API seems interesting: I can come here, use the API explorer, request the snippet for a video id, execute that, and I get some information about the caption, like the language and the id of the caption track. So I started experimenting with this. If we click "show code", there's a piece of code we can copy for, say, PHP (you can use any number of languages, like Python, if you like), and I basically copied that into a PHP file. For this to work, I also need my credentials, so I went into my Google Cloud Platform account, downloaded the credentials for my service account as a JSON file, and got that uploaded onto my server. At this point we create a service object for Google_Service_YouTube, and then I say the response equals the service's captions listCaptions call, passing in the part I want ("snippet") and the video id. And again, none of this would have been possible without my beautiful monitor and beautiful keyboard. Let's see what happens when I run it. OK, I'm getting some information back now: I have a caption list response, that looks good, I have the id, this all seems pretty good so far. Now I need to obtain the contents of this caption, and I can see there's an API for downloading a caption, to which I pass the id.
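As a sketch, the two REST calls involved look roughly like this (the video id, caption id, and credentials are placeholders; the endpoints are from the YouTube Data API v3 reference):

```shell
# List the caption tracks available for a video
curl "https://www.googleapis.com/youtube/v3/captions?part=snippet&videoId=VIDEO_ID&key=YOUR_API_KEY"

# Try to download one track by id (requires OAuth rather than an API key,
# and, as it turns out, only succeeds for videos your own account owns)
curl -H "Authorization: Bearer YOUR_OAUTH_TOKEN" \
  "https://www.googleapis.com/youtube/v3/captions/CAPTION_ID"
```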
So I just need to pass in this id, which I have available here. Back in my code, I issue a new request to the API; the endpoint for this caption is going to be youtube/v3/captions/ followed by the id, so I'll put that in, run it, and print_r the response. I'm getting some information back, but you can see the response says 403 Forbidden. I started looking around into why I wasn't getting permission, and it turns out, on Stack Overflow, a lot of other people were hitting the same thing: essentially, the captions.download endpoint only works for videos that your own Google account owns. It's a shame they don't really mention this anywhere in the API docs, because this restriction essentially cripples the whole captions API: I can find out which captions are available, but I can't actually get their contents. So this overall approach also doesn't really work, unfortunately, although it's quite interesting, and there may be certain APIs in there that could be useful if you're building another type of service, so I'd encourage you to look through it; maybe there's something interesting there for you.

All right, well, it's been a long day, and welcome back to part three, the epic conclusion and finale, where we actually get this to work. Let me show you what I've done. I was watching one of my videos and noticed that all of the caption data is available right here, so I needed to find a way to extract it; I know the data is available somewhere on this web page. What you can actually do is go into the console, go into the Network tab, and check that when we load up the captions, there's a request to this undocumented API, really, called timedtext. And we can see that this response is actually all of the captions: we see the UTF-8 words and the offset times at which they're going to appear. This is the file we need. Now, if you take a look at the URL here and open it in a new tab, then if I replace this video id with any other random video id, you'll see that the link is not going to work; I put in another URL and it says "that's an error, the request was not found on this server". What's really going on is that this is kind of a private API that's somewhat locked down: you can't just access it for any random video, only for the video you're currently watching. Thinking about this, we can come up with one potential solution: a client-side approach that executes JavaScript in your browser and scrapes the web page to find this URL. Once we have the URL, we can fetch its contents, parse them, and format them into a nice visual document the viewer can read, if they just want to read the transcript for the entire video.

There are really a number of different technical approaches here. We could build a browser plugin, for example a Chrome or Firefox extension, but one thing that will work across virtually any browser is a JavaScript bookmarklet. For example, one of my favorite bookmarklets is CamelCamelCamel's: you take their button, drag it into your bookmarks bar, and then when you go to any Amazon page and click the bookmarklet, it opens a new window and shows you all of the price history. We can do something similar: a bookmarklet that executes JavaScript on the page, obtains the URL from the page data, and sends it into a new window that shows the transcript for the YouTube video.

So let's take a look at how this is going to work. The first thing is to figure out what this timedtext URL is. If you go into the console and look at document.body, copy everything, paste it into a text editor, and do a quick search for "timedtext", I do see that the timedtext URL is somewhere inside the document. So we can use regular expressions to scrape and extract this URL. The regular expression we want is: the timedtext URL that comes after "playerCaptionsTracklistRenderer". I'll look for that piece of text, use a lazy wildcard match to get to the portion that starts with youtube.com/api/timedtext, and capture everything up until the first quote. That's pretty much my regular expression; I'll just need to escape the slashes. By the way, this Happy Hacking Keyboard, I feel like I'm still getting used to it, because you have to use a bunch of key combinations to access the symbols you want, like the arrow keys, so it's been a lot of new muscle memory for me, but the keyboard is totally functional, you can do everything with it, the form factor is so small, and it feels very good to type on. Anyway, I have my regular expression, and we can try to match it against document.body.innerHTML. We can see we actually get a match, and the first capture is exactly the URL I'm looking for. Now we need to take this JavaScript code and turn it into a bookmarklet. A JavaScript bookmarklet is very easy to write; you can do it with pretty much only HTML, but I'm going to do it in PHP, and I'll show you why in a second.
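Before moving on, the scraping step just described can be sketched as runnable code. The key name and URL shape are assumptions based on what the page source looked like at the time; YouTube's internals change, so treat this as illustrative:

```javascript
// A stand-in for document.body.innerHTML; in the real page the timedtext
// URL appears shortly after the playerCaptionsTracklistRenderer key, with
// "&" JSON-escaped as \u0026.
const html =
  '..."playerCaptionsTracklistRenderer":{"captionTracks":[{"baseUrl":' +
  '"https://www.youtube.com/api/timedtext?v=r7SO-Oq3d5E\\u0026lang=en"}]}...';

// Find the tracklist key, skip ahead lazily, then capture the timedtext
// URL up to (but not including) the closing quote.
const re = /playerCaptionsTracklistRenderer[\s\S]*?(https:\/\/www\.youtube\.com\/api\/timedtext[^"]+)/;
const m = html.match(re);
const url = m ? m[1] : null;
```

In the browser, you would run the same match against `document.body.innerHTML`, then un-escape the `\u0026` sequences before fetching the URL.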
Generally, to create a JavaScript bookmarklet, we just serve a piece of text with a link whose href equals "javascript:", and just for fun we'll have it alert "hi" and give it the title "YouTube Transcriber". Once I save this and access the web page, you'll see I have a link; if I drag the link into my bookmarks bar and tap on it, you'll see it's already executing JavaScript code. What we want to do now is replace that JavaScript with essentially the code we just ran in the browser. For this I'm going to have it execute a function, and I'm going to pass in a piece of code, generated by this PHP here; it will be essentially what we just wrote, and then we want to open a new window, a page on my server, ytcaption/caption.php, passing along the scraped URL. That should essentially do what our bookmarklet is trying to do, but there's some trickiness with escaping the parameters: you can't just put the code in there, you have to encode it. So we take the code and call encodeURIComponent on it in order to have it fit within a javascript: bookmarklet (I've also copied and pasted a function for encoding the URI component, which I found on Stack Overflow). Now if I refresh the page, I can add this bookmarklet to my browser, and if I go to a YouTube video and click it, I find it's not working. So I issue an alert to see what's going on. OK, I need to actually invoke the function, like so, and now the bookmarklet has successfully scraped the URL of the captions file. The next step is that we also need to encode this URL itself before passing it to our server, so we call encodeURIComponent on the URL as well, and I think that should do it. Now if I re-add the bookmarklet, come back to the video, and click it again, you'll see I'm opening a blank page, an error page, because the target doesn't exist yet, but I'm able to get this URL across.
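Putting those pieces together, the bookmarklet href could be assembled like this (the viewer page host is a placeholder for wherever caption.php is hosted):

```javascript
// The code the bookmarklet will run on the YouTube page: scrape the
// timedtext URL, then hand it to the viewer page (placeholder host).
const scraper =
  'var m=document.body.innerHTML.match(' +
  '/playerCaptionsTracklistRenderer[\\s\\S]*?(https:\\/\\/www\\.youtube\\.com\\/api\\/timedtext[^"]+)/);' +
  'if(m){window.open("https://example.com/ytcaption/caption.php?url="+encodeURIComponent(m[1]));}';

// Percent-encode the code so it survives being pasted into an href;
// the browser decodes the javascript: URL before executing it.
const bookmarklet = 'javascript:(function(){' + encodeURIComponent(scraper) + '})()';
```

The server-side page then only needs to emit `<a href="...">YouTube Transcriber</a>` with that string as the href, which is why generating it from PHP is convenient.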
So now let's create the page this goes to. The first thing is to see whether we can get the URL: I'll reach into the URL parameters, grab the url parameter, and see if I can just print it out. This URL looks good, but it needs to be unescaped; you'll see a bunch of odd sequences in it, and that's because, if you look it up, it's using a form of JSON string encoding, which we need to undo. So the process is just to run json_decode, a built-in PHP function, on the URL, and if I reload, you'll see it now looks like a standard URL string. If I access it, I'm actually getting all of the data. One thing I found is that if I append format=json3, I get the JSON representation, so we'll also append that to the URL. Then let me start printing some of this out and make a request to fetch it: I'll say data equals file_get_contents on the URL, and print that. Now it's a simple matter of decoding the JSON and then formatting it nicely. We say data equals json_decode of the data, and if I run print_r on it (which in PHP prints a recursive representation), you'll see all of the data is inside a field called "events", containing all of these strings, like "all right welcome back with your", the string representation of the captions. So we get the events field, and you can really do this part in any language; it doesn't have to be PHP. Python is a great language for it too, or Golang, or anything. Then we loop over these: for each event, for each element of its "segs" as a segment, we append the "utf8" portion to a words array, and at the end we print the implode of the words, which just connects them all together. If I save this and reload, you'll see I'm getting the full transcript.
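The same parsing step, sketched in JavaScript rather than PHP (the sample object imitates the json3 shape described above, with an "events" array whose "segs" carry "utf8" fragments; real responses contain more fields, such as timing offsets):

```javascript
// Minimal imitation of a timedtext json3 response (shape assumed).
const data = {
  events: [
    { tStartMs: 0, segs: [{ utf8: "hey welcome back" }] },
    { tStartMs: 1500, segs: [{ utf8: "with your" }, { utf8: "host" }] },
  ],
};

// Collect every utf8 fragment, then join them into one transcript string,
// mirroring the PHP foreach + implode in the tutorial.
const words = [];
for (const event of data.events) {
  for (const seg of event.segs || []) {
    words.push(seg.utf8);
  }
}
const transcript = words.join(" ");
```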
Then one more fun little thing I thought we might do: the line breaks in this are really narrow, pretty much matching what's shown on the video, but for general reading we may want to merge every other line, removing every other newline. There's a very simple one-liner for that using regular expressions: we do a pattern match for two lines, that is, two stretches of text separated by a newline, and replace them with the first and second parts joined, removing one of the line breaks. If we then print the text, you'll see it's far more readable; we're getting about two lines of captions on a single printed line. And that pretty much wraps it up for this project: I can go to really any YouTube video, press this button, and I'll see a text transcript of the video that I'll be able to read through quickly if I don't want to sit through an entire eight to ten minutes of content. I hope you enjoyed that. As prototype code, I'll have a link to the project demo and the source code in the description below if you want to check it out. One thing I did notice, where there is some room for improvement, is that this generally only works on the first load of a video: when you click into a second video, the document body doesn't get refreshed, because YouTube uses a single-page-application style of loading. So if I click on another video and then click the bookmarklet, it would still load the captions for the first video. To get it to work, you just have to refresh the page first; once you refresh, clicking the bookmarklet shows the updated contents. That's kind of a flaw in this current version, more of a technical matter.
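That line-merging one-liner can be sketched in JavaScript (the tutorial does the equivalent with a regular expression replace in PHP):

```javascript
// Four short caption lines, as they come out of the transcript step.
const text = "hey welcome back\nwith your host\ntoday we will do\nsome coding";

// Match two adjacent lines and join them with a space; with the global
// flag, the regex engine resumes scanning after each match, so lines are
// merged in non-overlapping pairs (1+2, 3+4, ...), halving the line count.
const merged = text.replace(/([^\n]+)\n([^\n]+)/g, "$1 $2");
```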
But I think that if you implemented this using a browser extension, you would be able to intercept the responses in these single-page-load designs, and essentially you could figure out the caption URLs for all of the subsequent videos a user visits. You can just go into the Network tab, click on another video, and you'll see an XHR response in the "watch" request where you get the player response, and that contains the timedtext URL. But that's a little bit more involved, and I'll leave it perhaps as an exercise for the reader. So that'll do for me. Hope you enjoyed the episode, and maybe you learned a little something. But let me know: what did you think of the project? How would you do it? Would you do it any differently? Let me know in the comments below; I'd love to see that. If you liked the video, please give a like and subscribe, really appreciate that, and I'll see you in the next one. Thanks, bye.
Info
Channel: TechLead
Views: 156,030
Keywords: javascript, google cloud, machine learning, full stack, web development, webdev, learn to code, programming, computer science, youtube captions, youtube transcriber, hhkb, happy hacking keyboard
Id: r7SO-Oq3d5E
Length: 26min 7sec (1567 seconds)
Published: Thu Dec 17 2020