AI Text Summarization with Hugging Face Transformers in 4 Lines of Python

Captions
What's happening, guys! My name is Nicholas, and in this video we're going to be taking a look at text summarization with the Hugging Face Transformers package. We'll be able to take a huge block of text, pass it to our transformer pipeline, and get a summarized version of it.

Let's take a deeper look at what we're going to be going through. In this video we're going to be covering three key things. First up, we're going to start out by installing the Hugging Face Transformers library. Then what we're going to do is build a summarization pipeline; the cool thing about the Hugging Face Transformers library is that you've got a whole heap of pre-trained pipelines that you can just pick up and use without having to do a whole heap of training. Then what we're going to do is grab part of a blog post, pass it to our summarization pipeline, and take a look at our summarized result.

Now let's take a look at how this is all going to fit together. The first thing we're going to do is import the Hugging Face Transformers library into our notebook, and specifically we're going to be using the pipeline method for that. Then we're going to build and download one of those pre-trained pipelines, so you'll see how to download that automatically. And then what we're going to do is summarize our blog post, and we'll switch out a bunch of articles to see what these actually look like. Ready to do it? Let's get to it!

Alrighty, so in order to build our summarizer, there are three key things that we need to do. First up, we need to install the Hugging Face Transformers library and import it as a dependency. Then what we're going to do is load a pre-trained summarization pipeline to allow us to perform our summarization. Then, last but not least, we're actually going to go on ahead and summarize our text: we'll pass through a block of text, run it through our pipeline, and ideally we should get a summary back.

So first up, let's start out by installing the Hugging Face Transformers library. Now, in
this particular case we're going to be running through the standard installation, which is just a pip install transformers, but you can use some of the other installation methods if you want to; the baseline installation tends to work pretty well. So we're going to jump back into our notebook and go on ahead and install it. Again, all the links mentioned in this video, as well as the completed code, will be available in the description below, so if you want to grab that, by all means check it out. Let's go on ahead and install Transformers.

Alrighty, so Transformers is now installed. In order to do that, we've just written an exclamation mark and then pip install transformers, exactly as you saw in the documentation. This is going to install the Transformers library inside of our Jupyter Notebook environment.

Now the next thing we need to do is actually import it as a dependency, so let's go ahead and do that. All right, so that's our dependency imported. In order to do that, we've written from transformers import pipeline. The pipeline function gives us a method that allows us to easily download and use a summarization pipeline, so rather than having to go on ahead and train a huge language model, we're actually able to leverage the pre-built summarization pipeline that Hugging Face has. One of the cool things as well is that Hugging Face has so many different NLP-based capabilities, so if you'd like to see me do more videos on this, by all means leave a mention in the comments below. But in this case we're going to be using the summarization pipeline, so what we're next going to do is actually load that into our notebook. Let's go ahead and do it.

Perfect, so that's our summarization pipeline imported and loaded. Now, one key thing: right now I already had the summarization pipeline downloaded, so it went reasonably quickly. But if you're doing this for the first time, what it's actually going to do automatically is download that summarization pipeline onto
your local machine, so it might take a little bit longer, but it'll happen all automatically; you just need to write that same line of code.

Now, the code we actually wrote is basically one single line. What we've done is create a new variable called summarizer and set it equal to pipeline, and to that we've passed through an argument that says summarization. This basically tells the Hugging Face Transformers pipeline function that we want to use the pre-trained summarization pipeline, and it brings it into our notebook, so we can then use this summarizer that you see over here to actually pass through our text and generate a summary.

Now, for that we actually need some text, so we're going to grab part of a blog post from Hacker Noon and try to summarize that. We'll first off create a new variable called article, and we're going to leave it empty for now; then what we're going to do is paste in some text. So if we go to Hacker Noon, I found this article which basically tells us this should be taught in entrepreneurship classes. A little bit of a clickbait title, but rather than actually going through and reading the entire article, we might just choose to summarize it. So say we went and copied this part of our text. A key thing to note is that the pre-trained summarization pipeline that you're seeing here does have a bit of a limit as to how large an article it can summarize, so in this case I'm just copying part of a blog post. If you'd like to see how we might approach a longer blog post, by all means leave a mention in the comments below and we'll take a look at how we can do that in a future video.

In this case we're going to grab this block of text and paste it into our article variable; you can see I've just copied and pasted it into there. Now the next thing that we're going to do is actually start using the summarizer that we created up here, so we're going to pass through our article and set a couple of other
keyword parameters. So let's go ahead and do that... and there you go, we've now gone and generated a summary.

Before I delve into that, let's take a look at the code that we wrote. As I said, we're using the summarizer that we created up here, and to that we're passing through four different things. Our first argument is the article: this is the text that we want to pass through (it could also just be reworded as "text"). The next three arguments are all keyword arguments. First, we're setting the maximum length, which caps how long a summary our summarizer can return (strictly speaking this is measured in tokens rather than words, but it's roughly a word count). Then we're setting the minimum length, the shortest summary that we want our summarizer to return. And then we're setting do_sample equal to False, which basically tells our summarizer that we want to use a greedy decoder. What that means is that as we generate the sequence, at each step we return the next word that has the highest probability of actually making sense. There's a whole bunch of different decoder methods, and I found this great blog post (again, I'll link to it in the description) that actually visualizes what each of the decoder strategies looks like: beam search, pure sampling (which is what would happen if we set do_sample to True), and greedy is in there as well, it was around here. So you can see that we're currently using the greedy decoder by setting this to False; these are basically the different ways of determining which word to return next.

Now let's actually take a look at our summary. You can see here that we've got our summary text, and in this case it's returned back the sentence "entrepreneurship is rotten at its very core, and one way to fix it..." I don't know if it is truly rotten at its very core, but that's
the summary. So: it's rotten at its very core, and one way to fix it is to change some of the things we teach about it in business school, so safety net, shortcut and the five should be explained to young ones to prep them for modern entrepreneurship. You can see it's actually returned back a pretty good summary.

Now, we could sub this out and pass through different text. In this case, I was actually taking a look at a different Hacker Noon article not too long ago, which is all about biometric fingerprinting for employee timesheets: something a little bit topical and a little bit controversial. If we paste that text in, let's see the summary that we get back. Again, we can run our summarizer pipeline, and the summary we get back this time is: employers are starting to use time clock machines that fingerprint employees (exactly as I was saying); the machines are tied to your unique characteristics, such as your face, your fingerprints, how you talk and even how you walk; and the Centers for Disease Control says the coronavirus can remain on surfaces for hours. I think it started to talk about how that could potentially be a bit of an issue given the current pandemic.

Another thing that's important to note is that if you want to just grab this text, it's really just some standard Python functionality to get it out. Say we stored our result inside of a variable called text; really it should be summary, so let's change that. To grab this particular block of text, we'd first up need to go inside of the array and grab the first result, so let's do that first. Then, to grab the text, we can use this key and grab it out. And there you go, we've now gone and grabbed our explicit block of text. You'll be able to see that this does in fact meet each of our keyword parameters, max length and min length: if we split it, you can see that it's clearing our minimum length and it's clearing our maximum length as well. So in this case it has actually met our
keyword parameters.

But that about summarizes how to actually build a summarizer using Hugging Face Transformers. Again, let me know in the comments below if you'd like to see more stuff with Hugging Face and their amazing Transformers library, but for now, that about wraps it up. Thanks so much for tuning in, guys! Hopefully you found this video useful; if you did, be sure to give it a thumbs up, hit subscribe and tick that bell so you get notified when I release future videos. And let me know what you went about summarizing! Thanks again for tuning in. Peace!
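Putting the whole walkthrough together, the notebook code looks roughly like the sketch below. The exact model the summarization pipeline downloads by default can change between Transformers releases, the article string is just a placeholder, and the max_length/min_length values here are illustrative (and counted in tokens, not words):

```python
# In a Jupyter cell, install first with: !pip install transformers
from transformers import pipeline

# Downloads a pretrained summarization model on the first run (cached afterwards).
summarizer = pipeline("summarization")

# Placeholder: paste the blog-post text you want to summarize here.
article = """Entrepreneurship is rotten at its very core, and one way to fix it
is to change some of the things we teach about it in business school..."""

# do_sample=False selects greedy decoding; the length limits are token counts.
result = summarizer(article, max_length=130, min_length=30, do_sample=False)

# The pipeline returns a list with one dict per input document,
# so index into the array first, then pull the text out by its key.
print(result[0]["summary_text"])
```

The same two-step extraction the video shows (grab the first result, then the "summary_text" key) is all it takes to turn the pipeline output into a plain string.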
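Setting do_sample=False gives a greedy decoder: at every step, the model emits the single most probable next token. Here is a toy, model-free illustration of that rule; the vocabulary and the probability table are invented purely for the example:

```python
# Toy next-token distributions, keyed by the current word.
# These words and probabilities are made up to illustrate greedy decoding.
next_token_probs = {
    "<start>": {"entrepreneurship": 0.6, "business": 0.4},
    "entrepreneurship": {"is": 0.7, "was": 0.3},
    "is": {"rotten": 0.5, "great": 0.4, "hard": 0.1},
    "rotten": {"<end>": 1.0},
}

def greedy_decode(start="<start>", max_steps=10):
    """Repeatedly pick the most probable next token (what do_sample=False does)."""
    token, output = start, []
    for _ in range(max_steps):
        probs = next_token_probs.get(token)
        if probs is None:
            break
        token = max(probs, key=probs.get)  # greedy: argmax over next-token probs
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)

print(greedy_decode())  # entrepreneurship is rotten
```

With do_sample=True the decoder would instead draw the next token at random in proportion to these probabilities, so "business" or "great" could appear; beam search keeps several candidate sequences alive at once rather than committing to one.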
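Near the end, the video checks the summary against the length arguments by splitting it. A rough sketch of that check in plain Python, using an example summary string (remember the limits are really token counts, so a whitespace word count is only approximate):

```python
# Example output string; in the notebook this would be result[0]["summary_text"].
summary = ("Employers are starting to use time clock machines "
           "that fingerprint employees.")

# Split on whitespace to get an approximate word count.
words = summary.split()
word_count = len(words)
print(word_count)  # 11 words in this example
```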
Info
Channel: Nicholas Renotte
Views: 7,894
Rating: 4.9186049 out of 5
Id: TsfLm5iiYb4
Length: 9min 43sec (583 seconds)
Published: Sat Jan 23 2021