Build YouTube Summarizer with LLM, Ollama, Llama 3, LangChain and Gradio

We are going to build an application that summarizes YouTube videos. The tools we will use are Ollama, the Llama 3 model, LangChain, and Gradio.

This is the output of today's code: a YouTube summarizer powered by the Llama 3 large language model. You start by putting in the URL of the YouTube video you want. Then you can fetch the title and the description of that video, and when you click the next button you get the transcription. You can see the full transcript here, along with an estimated token count for it. Finally, when you click Summarize, the full transcript is summarized by the LLM for you. That is the destination of today's journey.

Let's start with the basic idea of the summarization task. You have a source that you want to summarize, and it can come in many forms: text files, PDF files, Excel sheets, Word documents, a URL in HTML form, maybe even a database or a data frame. You have a source, and you want the LLM to summarize it. To get there, you first load the text so it is ready for summarization, and then you form the prompt. One possible prompt for this task says "Summarize the following text:", then you put in what you have loaded, and you end the prompt with "Answer:". The LLM effectively understands this prompt and creates a summary of the given text. That is the basic idea: you have a source, you load it as text, you craft a final prompt with the text to summarize embedded in it, and you give that prompt to the LLM. Simple, but there are two main issues. The first is that files can contain non-text elements: images, embedded objects such as audio files, and graphs.
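That prompt-forming step can be sketched in a few lines of Python; the template wording here is just one possibility, as mentioned above:

```python
def build_summarize_prompt(text: str) -> str:
    """Embed the loaded text in a summarization prompt (one possible template)."""
    return f"Summarize the following text:\n\n{text}\n\nAnswer:"

# the resulting string is what gets sent to the LLM
prompt = build_summarize_prompt("Some long transcript ...")
```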
Most LLMs cannot handle such elements out of the box; unless you do some extra coding or manipulation, they simply will not process non-text content. That is the first challenge. The second challenge is that the text you want to summarize can end up being too long, longer than the LLM's context length. Remember, the context length is the maximum number of tokens that can fit into the LLM; if your text is longer than that, it cannot fit and you will get an error. If you think of the LLM as a human eating, the context length is how big the mouth is, limiting what it can eat and digest in one go. So if the text you want to summarize is longer than the context length, what do you do? That is the second challenge, and in this video we provide a way to handle it.

The idea is to add a few more steps in between. Again, you have the text you want to summarize, already loaded into memory, and it is too long. What you can do is break it into chunks: the green chunk is the first chunk, the yellow chunk is the second, and so on. Notice that adjacent chunks can overlap; this is to encourage the model to understand at least part of what happened in the chunk before, and the overlap is one of the parameters you can adjust in your chunk-splitting process. You create as many chunks as you need to cover the whole text. Then, for each chunk, you craft an associated prompt, the same kind of prompt as before, except the text placeholder holds only that one chunk. You give prompt one to the large language model and get summary one, then prompt two gives summary two, prompt three gives summary three, and so on.
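A minimal, character-based sketch of that chunking step; real splitters, such as LangChain's recursive splitter, also respect sentence and paragraph boundaries, so treat this purely as an illustration:

```python
def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Break text into chunks of at most chunk_size characters,
    with `overlap` characters shared between adjacent chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

# adjacent chunks share their overlapping characters
chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=2)
```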
That, by the way, is the map step: you map each chunk to its summary. The last step is the reduce step: now that you have a list of summaries, you reduce them to one final summary. The final prompt might not be exactly like this, and you can customize it, but it will be along the lines of "Combine the following separate summaries into a coherent summary:", followed by the list of summaries. You give this prompt to the LLM, and you get your final summary. That is how easy it is conceptually, but it can be a lot of coding if you do it all yourself.

Luckily LangChain, a famous tool for building LLM applications, has a wrapper function called load_summarize_chain that lets you build this kind of summarization application, especially when the context length is overflowing, in just a few lines of code. In the example here, some functions are imported, the source is a URL, and the text of the source is loaded into documents. Then we say we are going to use the ChatOpenAI model gpt-3.5-turbo as the language model, and that is it: you get a chain by calling load_summarize_chain, passing in the LLM and the chain type you want. There are three chain types. The first is "stuff": you stuff everything into one prompt, which is applicable when the text to summarize is small enough to fit within the context length. If you know the context length is overflowing, you need either "map_reduce" or "refine". I will leave the refine chain type for you to discover, but switching can be as simple as replacing "stuff" with "map_reduce". Instead of that presentation slide, you can also visit my blog to read through this again; I think I provide more information there.
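What the map_reduce chain type does under the hood can be mirrored in plain Python. Here is a sketch with a toy stand-in for the model, so the plumbing is visible; with a real model, `llm` would be a call into Llama 3 via Ollama, and LangChain's chain additionally handles prompt templates and re-collapsing summaries that are still too long:

```python
def map_reduce_summarize(chunks, llm):
    """Map: summarize each chunk; Reduce: combine the partial summaries.
    `llm` is any callable taking a prompt string and returning a string."""
    partial = [llm(f"Summarize the following text:\n\n{c}\n\nAnswer:")
               for c in chunks]
    reduce_prompt = ("Combine the following separate summaries into a "
                     "coherent summary:\n\n" + "\n".join(partial) + "\n\nAnswer:")
    return llm(reduce_prompt)

def toy_llm(prompt: str) -> str:
    """A stand-in 'model' that just keeps the first 20 characters of the payload."""
    return prompt.split("\n\n")[1][:20]
```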
text form. There I start with why we need summarization in the first place. It has value because there are many use cases: you can summarize a report or a document if it is too long; you can summarize a customer chat, so that when a customer is talking with your company you can see what is happening and what kind of action your company should take; and if you are learning new things on YouTube all the time, where many videos are very long, an hour at least like this live video, you can use this to summarize those YouTube videos as well. There is a catch, though: the LLMs we are using, like Llama 3, work on text, so this is not going to summarize a YouTube video based on its visuals, but based on its transcription. Another, similar application is summarizing a very long podcast, given that you can get a transcript from it. So on the blog you have the basic idea, the potential issues, and the map-reduce approach. One more thing: I previously showed that you can call chain.
run(). That was the old style; in the newer updates the run function is encouraged to be replaced by the invoke function, and it is as simple as replacing run with invoke.

All right, enough theory and presentation; let's go to the coding. I have prepared this notebook so that it is ready to run and produce a Gradio interface, but let me walk you through what is happening. First we load the tools we need: the pytube library to get information from a YouTube URL; Gradio for building the interface; requests for retrieving information via a URL, which in our case helps us extract the YouTube description from a given link; LangChain to help us work with large language models, prepare prompts, and pass the outputs around; and tiktoken, the library we use to estimate the token counts we need to process.

Next comes the section of helper functions. The first function, as its name says, gets the YouTube description when given a URL. I did not write it myself; it is from Stack Overflow, so thank you very much to those who provided it to the public. Then there is a YouTube video info function: given the YouTube URL, it returns the title and the description of that video, calling the previously mentioned function. The next one loads the YouTube transcript: now that I have a YouTube transcript as a source, I want to load it so that it is ready for the splitting step. Here I am using a tool from LangChain called YoutubeLoader, where you give it a URL and it extracts the transcription for you; play with it and see what it looks like. I wrap it up so that I can display the result in my Gradio interface, nothing more than that. After that, the text splitter handles the splitting step.
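The two helpers at the heart of this section can be sketched as below. The imports are done lazily inside the functions so the file loads without the dependencies installed, and the module paths are assumptions that may differ across LangChain versions:

```python
def load_youtube_transcript(url: str) -> str:
    """Fetch a video's transcript text with LangChain's YoutubeLoader."""
    from langchain_community.document_loaders import YoutubeLoader  # lazy import
    docs = YoutubeLoader.from_youtube_url(url).load()
    return docs[0].page_content if docs else ""

def count_tokens(text: str) -> int:
    """Estimate the token count with tiktoken. cl100k_base is ChatGPT's
    encoding, not Llama 3's, which is why the app shows an *estimated* count."""
    import tiktoken  # lazy import
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))
```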
By the way, when I showed you those colored chunks on the slide, that was just theoretical; it is not as rigorous as what happens when you use tools like a recursive character text splitter or a sentence splitter. For those technical details you can read the documentation, but again, LangChain provides text splitter tools for us, and the one I am going to use is RecursiveCharacterTextSplitter, where I provide the chunk size, that is, how big each chunk may be at maximum, and the amount of overlap between adjacent chunks.

The next function takes the URL of the YouTube video I want and returns the transcript as text, along with how many tokens are in it. And eventually, here is the punch line: this function gets called when the Summarize button is clicked. It takes the input YouTube URL, a temperature, a chunk size, and an overlap size. What happens is, first, the document URL, which is the source, is loaded as the text to summarize; then the chunking step comes in using the text splitter, and I get chunks; then I define the language model I am going to use, and thanks to Ollama running in the background, my LLM is sitting on my laptop, available at localhost port 11434, with a temperature I can set to whatever number I want. Then I create a chain using the helper function load_summarize_chain, asking for the map_reduce approach. When I have my chain, I can just invoke it, get an output, and eventually an output text. Very simple, right? You might see that there are many functions here, but most of them follow the same steps: load the text from a source, split the text, define a chain with the LLM, and run it. These functions are then coupled with the Gradio interface.
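The punch-line function just described can be sketched end to end like this. The module paths, the "llama3" model tag, and the default parameter values are assumptions to adapt to your install, and actually running it requires an Ollama server on this machine:

```python
def summarize_youtube_transcript(text: str, temperature: float = 0.0,
                                 chunk_size: int = 2000, chunk_overlap: int = 200,
                                 port: int = 11434) -> str:
    """Load -> split -> define the chain -> run it, as in the app."""
    # imports kept inside the function so the sketch loads without the libraries
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.llms import Ollama
    from langchain.chains.summarize import load_summarize_chain

    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    docs = splitter.create_documents([text])

    # Llama 3 served locally by Ollama
    llm = Ollama(base_url=f"http://localhost:{port}", model="llama3",
                 temperature=temperature)

    chain = load_summarize_chain(llm, chain_type="map_reduce")
    result = chain.invoke({"input_documents": docs})  # old style: chain.run(docs)
    return result["output_text"]
```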
I am going to provide this code on my GitHub, so don't worry, you don't need to copy and paste from here. When I run this code, it creates a user interface via Gradio, with all the buttons placed according to my design, and you are good to go. By the way, this is a Jupyter notebook, so you can have the Gradio interface embedded right here, but what I actually want to show you is this: I prepared main.py, which can be run from the command line, and at the end of the code, in demo.launch, I set share to true, meaning I eventually get a link that I can send somewhere else, so that anyone anywhere in the world, with internet of course, can run this app.

So let's get started: we pull up a terminal, check that I am using the right environment, and run it. Ta-da... not yet. If I want to use it locally, the local URL is here, but if I send someone else the share link, he or she will be able to run it too, so let's actually use that one. I put it in and the interface loads; this is how it looks, with some default values. Just to show you quickly: I click the get-info button, which takes a while, but what it is doing is talking to the YouTube server, getting the description back, and populating the fields. To make sure I am not fooling you guys, here we go: yes, this is the video. It is one I created about using generative AI when I traveled to Tokyo, my first time there, and what kind of help I could get from generative AI while traveling; check it out when you have time. So this is the title, "GenAI helps me travel in Tokyo", and the description is here: it starts with "three examples" and ends with "ChatGPT". I was able to retrieve the title and the description of a YouTube video correctly.
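A stripped-down sketch of the Gradio side, with share=True as in main.py; the component labels and layout here are placeholders, not the app's actual design:

```python
def make_demo():
    """Build a minimal interface; wiring to the real helpers is left as comments."""
    import gradio as gr  # lazy import so the sketch loads without Gradio installed

    with gr.Blocks() as demo:
        url = gr.Textbox(label="YouTube URL")
        transcript = gr.Textbox(label="Transcript", lines=10)
        summary = gr.Textbox(label="Summary", lines=5)
        button = gr.Button("Summarize")
        # button.click(fn=<your summarize function>, inputs=transcript, outputs=summary)
    return demo

# in main.py: make_demo().launch(share=True)
# share=True asks Gradio for a temporary public link you can send to anyone
```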
Next, I can get the transcription. Clicking this button talks to the YouTube server again, gets back the transcript, and uses tiktoken to work out how many tokens there are in this transcription. So here is the transcript from my YouTube video, and now we are ready to summarize it. Note that I made it so you can easily change the chunk size and the overlap size parameters, and the temperature as well; but since this is a summarization task, you typically want to set the temperature as low as possible. Llama 3's context length is about 8,000 tokens if I remember correctly, so this transcript should fit in one go, within one prompt. Let's say the chunk size is 2,000; then it is a no-brainer that this will take one computation, one summarization task, and that is it, because the whole thing fits.

While that is being calculated, let me prepare a second video, which is much longer. On my channel I go to a very, very long live session, almost two hours, and copy that URL, ready to go. Well, it is taking quite a while; remember I am running this LLM locally on my laptop, which is an M1 laptop, so it can do the computation, but it takes quite some time. I forgot to show you that if you run "ollama serve", not the app, you can actually see the computation that is happening in the background. Since it is taking quite a long time, to be honest, let me point out several improvements we could make. The first thing is... okay, it just interrupted me, so I will come back to the ideas later. This is the summary: "A traveler uses Google's Gemini app to navigate Japan without speaking Japanese, relying on generative AI for local information." Part one: yes, correct. If you don't know anything about that particular
region of the world, you can ask for the culture and manners you should follow in public, or the areas you should avoid. You can also get Gemini to help with recommendations for cafes and restaurants, because it has integration with Google Maps. And lastly, translation; this is a little bit of an understatement, because in my video I actually mean that Gemini helps you understand from an image. If you go somewhere and want to buy a product whose label is not in English at all, you can take a picture and ask Gemini to explain that product to you. That is what I did, and it was actually quite helpful: I had a headache and wanted to take some medicine, but I was not sure which one, and with some help from Gemini I found the medicine I wanted. Next: "while the AI is helpful, it has limitations in understanding cultural nuances and providing information on low-resource languages like Japanese." Correct, that is what I said. "The speaker concludes that while AI can be a useful tool, human interaction and research are still necessary for a deeper understanding of a place." That is right, at least for now, while LLMs are not yet able to do extensive research on their own for low-resource languages such as Japanese. So that was a short transcript that fits in one go.

Now let's try the other URL, the longer one. Let's get the title: how to hire gen AI, LLM tokenization plus tiktoken, and another topic about building a customer chatbot with agents, LLMs and RAG. That is correct if you go back to the video. I get the transcription from it and, wow, that is quite a lot: 14,000 tokens, definitely overflowing the capability of Llama 3, which, at least in the version I use, is 8,000. So for the chunk size I am going to say: try your best to understand a chunk size of 5,000. That should put me at four or five chunks to run, depending on the overlap size, and let's say I want to keep an overlap of about 10 percent of the chunk size.
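A quick back-of-the-envelope for how many chunks that split produces, as a hypothetical helper; the real recursive splitter breaks on separators rather than exact counts, so the speaker's four-or-five figure is the safer guess in practice:

```python
import math

def estimate_chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """The first chunk covers chunk_size tokens; every further chunk adds
    (chunk_size - overlap) new tokens."""
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((total_tokens - chunk_size) / step)

# the demo's numbers: 14,000 tokens, chunks of 5,000, 10% overlap (500 tokens)
n = estimate_chunk_count(14_000, chunk_size=5_000, overlap=500)
```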
And just to have fun, maybe I would increase the temperature; well, I am not going to do that, because I want to make sure it works properly, so I keep the creativity of the model low. Now I hit the Summarize button, and it is going to power through the different chunks, creating a summary for each chunk, and with all those sub-summaries it is going to write a final summary for us.

While it is doing its job, for hopefully not longer than two minutes, let me point out what can be improved upon. First, the interface can be changed to accommodate many other things, like uploading a file: you can add tabs, where one tab is a YouTube link, another is a PDF, another is an Excel file, for example. The input is just a source, a location of a source: for YouTube it can be a URL like this; for a file it can be a place where you upload or drop in a file. And this section can show anything, like information related to that file or a sneak peek into it. For YouTube I could add another panel that displays the cover image of the video; I could do that too, and I could add comments if I have APIs to retrieve that kind of information. That is the easy, file-related part. For the summarize step, you can improve the set of parameters that can be tweaked in the algorithm. Right now I force you to use Llama 3; maybe you want a dropdown box to select the model you want it to use for this summarization task. That is possible as well. Other parameters could be the port number, if you decide to change the default port away from 11434, or if one day you are using this port and another day a different one; you can have a box that allows the user to provide a target port and a model to load. That is one option. What else can you do? Well, let me go back to my code, and then you can actually help
me spot what I did and what can be improved in this tab right here. Pretty much anything that is hardcoded can be improved. As I said, you can find a way to let the user select the model they want to use. For tiktoken as well, you might let users select an encoder to estimate the right number of tokens; right now the tokenizer is the one for ChatGPT or GPT-4, but my model is Llama 3, and that is a mismatch. That is the reason I called it an estimated token count, and you can improve that too. You can also have the user select which chain type they want to use: stuff, map_reduce, or the refine process. Another possibility is to calculate how many chunks there are before the user hits the Summarize button; maybe they want to know how many chunks they are getting themselves into: do I split too finely, do I have to compute a thousand chunks where twenty would suffice? That is a design option you can add.

And another thing, maybe a critical one: as you can see here, the summary is short. I talked for about two hours on three different topics, and to be honest I would not be happy with this amount of summarization; it is too little, it does not have enough detail for me to learn from it. Think about it: if this were summarizing a lecture video and you got this much summary, there would be no point. So there is room for improvement at many levels of the summarization task: if you just want a few lines, this is it, but if you want something succinct that still has enough detail that you can actually learn from it or pick out specific information, then you need to optimize and customize this process.

All right, just to make sure I understand it correctly, I am going to read through this summary. "The text appears to be a conversation between two AI
models discussing..."; I am not an AI! "...their capabilities, strengths, and potential applications." It is a human discussing two AI models along those dimensions, including capabilities, strengths, potential applications and so on, so this part is wrong. "One model is creating a career development plan based on user goals and assessments; the conversation also touches on tokenization, natural language processing (NLP), and building a customer chatbot." This part is correct, this part is not. Now it refers to a human speaker, me: "the speaker demonstrates the chatbot's capabilities and limitations, suggesting improvements to handle specific requests." Right; well, I am indeed requesting that this particular chatbot be improved if you want a better summarization result. Anyway, it is just the way it is: I used a pre-made function from LangChain, and I got a pre-made summary. So this lesson actually tells you that a summarization task can be easy, and it can also be detailed: if you want a high-quality summary based on your goal, you probably need to dig deeper and customize your steps. It is not going to be that difficult, guys; the concept is just what kind of prompts you are going to assemble and how many layers you are willing to go through, and in theory that is it.

Don't forget to give us likes, leave your comments, share us with your friends, and also subscribe to our channels; we are on YouTube, LinkedIn, TikTok and Facebook.
Info
Channel: Case Done by AI
Views: 258
Keywords: AI, Generative AI, LLM, Gen AI, LangChain, OpenAI, Business, Entrepreneur
Id: Bvf0WolGGYc
Length: 30min 25sec (1825 seconds)
Published: Fri May 31 2024