News from the web with NewsPaper3k (Python Package)

Captions
Hello everybody, welcome back to my YouTube channel. Today we check newspaper3k, a package for article scraping, that is, a package that lets us scrape articles from sites such as, for example, The Guardian. Let's assume that we want to grab some news, for example this one. Let's see how to do this.

Please open your JupyterLab or your Jupyter Notebook; if you want, you can open a Colab, whatever. Since newspaper3k relies on NLTK, you have to pip install nltk and pip install newspaper3k. After that we can import nltk, and we have to download punkt into NLTK. Let's execute this cell of code. Everything is fine.

Now, we said that we want to fetch this article, so first of all we have to get its URL. Let's copy the URL into a variable that we can call news_url. Now we are ready to import Article, with an uppercase A, from newspaper. Once we have imported Article from newspaper, we can build an article object, passing to it as argument the URL of the article that we want to download: article = Article(news_url).

If you want to know all the functions, methods, etc. that you can apply to article, you can very easily write dir(article), and here we can see that we have really a lot of functions. For example, we can check the authors of the article, we can download it, we can fetch the images, we can get it as HTML. We can do really a lot of things; we can even do some NLP, and with this NLP we can, for example, make a summary of the article.

Let's see all these functions one by one. Let's minimize this. First of all, once we have our article variable (we created article thanks to Article, passing the URL of the article we want to download), we can apply the download method to it. Once we've downloaded it we can, for example, see it as HTML: article.html. Here you can see that we have quite a long HTML downloaded. For sure this HTML is not the easiest way to view an article, so let's minimize it and move on.

We can parse the article, so we write article.parse(). Once the parsing is finished, we can check the authors of the article: article.authors. We can see that the author of this article is Kevin Rawlinson, and if we check the article we can see here that the author is indeed Kevin Rawlinson. We can also check the publishing date of the article: if we write article.publish_date, we get that this article was issued on the 5th of January 2021. If we check here, we see that the article was issued on Tuesday 5th January 2021, so it's correct. We can get the title of the article with article.title: the title is "England Covid lockdown likely" etc., and we see that the title on the page is indeed "England Covid lockdown likely" etc.

We can also, for example, get the top image of the article, so we write article.top_image. We execute this cell of code and we get a link in return. If we click on this link we open the top image of the article. Let's check the article: we see that the top image is exactly this one, so everything is working very nicely. We can also get movies, if there are any movies in the article: article.movies. In this case we don't get anything back; sometimes movies doesn't work very well.

After that, as we saw when we printed the list of methods with dir(article), we can do some NLP on our article: article.nlp(). This processes our article for some NLP activities. For example, we can get the keywords of the article with article.keywords, and we see that the keywords in this article are place, vaccination, lockdown, lasting, words, minister, etc., Covid-19. And we can get a summary of the article with article.summary.

If you check the outcome carefully, you see that we have some formatting characters: just for example, we have a \n here, another \n here, etc. If we want to print the summary without this formatting, without these \n, we can use print, so we pass article.summary to print. In this case we get the output in a very nice format, because every \n becomes a new line: after "England" we have a new line, after (where is it? here) "programme" we have another new line, etc.

For sure we can print the whole text of the article and not only the summary. We can do this with print(article.text), so here we are using text, not summary as we did before. In this way we can print the article in a very nice format. We can check: "Minister says time needed for vaccine to take effect means restrictions" etc., and here it is the same, "Minister says time needed for vaccine to take effect means" etc. "The third national lockdown imposed in England to try to deal with the huge increase in Covid-19 cases", and here we have again "The third national lockdown imposed in England to try to deal with the huge increase in Covid-19" etc. So in this way we can fetch, download and summarize an article from a news website in a really very easy way.

newspaper3k also allows us to do something more, because we can use it with more than one article. Up to now we only worked with one article, the article whose URL we wrote here in the news_url variable. Let's see now how to work with multiple articles. In order to work with multiple articles we have to use the build option. Before, we imported Article from newspaper (here: from newspaper import Article); now we have to import all of newspaper, so import newspaper. As we did before, we have to define a source. Before, we used the URL of only one article; now we have to use a new URL for an entire news website. For example, we can use theguardian.com/international, so not only one news item like we did here, but the entire theguardian.com/international, this one. Let's close this image.

Okay, let's put this URL into a variable; we call it source_url. Finally, thanks to this source_url variable, we can build a source_papers variable: we define a source_papers variable that equals newspaper.build, and we pass source_url to it as argument. Let's execute it. Now we are ready to get all the articles contained in this source_papers variable, so all the articles that we can find on this theguardian.com/international website. We do this thanks to a for loop: for article in source_papers.articles, print article.url. So we are printing the URLs of all the articles that we are getting from theguardian.com/international, and you see here that we have many URLs.

If we want to open one of these, we can use these URLs as a list. For example, if we want to print the first one, "China sentences top banker to death for corruption and bigamy", we can pass zero here: source_papers.articles[0], so we are getting the first article, paper1. We download it with paper1.download(), and after that we parse it, as we did previously when we dealt with only one article. Now that we have finished parsing it, we can for example print the title of this article, and the title is exactly this one: "China sentences top banker to death" etc. We can click the URL here and open the article to check. Now we can do all the same things we did previously when we dealt with only one article: we can see the title, we can see the authors (Helen Davidson; let's check: Helen Davidson). If you want, as before, we can check the date, we can check the top image, etc.

What we can also do using the source_papers variable is check the categories of the source: for category in source_papers.category_urls(). Before, we wrote only articles in the for loop; now we are using category_urls, and we print the category. In this way we see that in The Guardian international we have many categories: we have international, we have theguardian.newspapers.com with the pages of the newspaper, we have for example holidays (it's opening, yes), and we have for example also video, so here we have the videos in The Guardian.

I really hope that this kind of tutorial can help you, for example with fetching news from the internet. Next time we continue with this kind of video, dealing with Wikipedia: we will see how to fetch data from Wikipedia. Please, if you have any kind of question, send me an email or leave a comment. Don't forget to subscribe to my channel. Stay tuned, and bye bye.
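The two workflows described in the captions (one article, then a whole news source) can be sketched roughly as follows. This is a minimal sketch, not the exact notebook from the video: the helper names collect_article_fields, list_source and run_demo are hypothetical, the limit parameter is an added assumption, and the live part needs network access plus nltk and newspaper3k installed. The newspaper calls used (Article, download, parse, nlp, newspaper.build, articles, category_urls) are the ones named in the video.

```python
def collect_article_fields(article):
    """Run download -> parse -> nlp on an Article-like object and gather
    the attributes shown in the video into a plain dict.

    `article` is expected to behave like newspaper.Article; the function
    itself (name and return shape) is a hypothetical helper.
    """
    article.download()   # fetch the raw HTML (available as article.html)
    article.parse()      # extract title, authors, date, text, images
    article.nlp()        # compute keywords and summary (needs NLTK 'punkt')
    return {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "top_image": article.top_image,
        "movies": article.movies,
        "keywords": article.keywords,
        "summary": article.summary,
        "text": article.text,
    }


def list_source(source, limit=5):
    """Return the first `limit` article URLs and all category URLs of a
    source built with newspaper.build (hypothetical helper)."""
    article_urls = [a.url for a in source.articles[:limit]]
    return article_urls, list(source.category_urls())


def run_demo(news_url, source_url):
    """Live run of both workflows. Assumes network access and
    `pip install nltk newspaper3k`; imports are kept local so the
    helpers above can be used without these packages."""
    import nltk
    import newspaper
    from newspaper import Article

    nltk.download("punkt")  # tokenizer data required by article.nlp()

    # Single article, as in the first half of the video
    fields = collect_article_fields(Article(news_url))
    print(fields["title"], fields["authors"], fields["publish_date"])
    print(fields["summary"])  # print() renders the \n as real line breaks

    # Whole source, as in the second half of the video
    source_papers = newspaper.build(source_url)
    article_urls, categories = list_source(source_papers)
    for url in article_urls:
        print(url)
    for category in categories:
        print(category)

    # First article of the source: download and parse before reading fields
    if source_papers.articles:
        paper1 = source_papers.articles[0]
        paper1.download()
        paper1.parse()
        print(paper1.title)
```

To try it, one would call run_demo with the URL of a single article and a source URL such as "https://www.theguardian.com/international". Note that newspaper.build caches previously seen articles by default, so a second build of the same source may list fewer articles; passing memoize_articles=False turns that off.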
Info
Channel: Rosario Moscato Lab
Views: 442
Rating: 5 out of 5
Id: ntXkHlCTC-8
Length: 15min 9sec (909 seconds)
Published: Sat Jan 09 2021