Building a Python News Aggregator from SCRATCH! | Data Science for Media Bias Detection #12

Video Statistics and Information

Captions
In today's episode I'll be creating a fully functional news aggregator in Python which pulls all New York Times technology-related articles, extracts metadata from them, and performs sentiment analysis on every single one. Are you ready? I sure am. Let's dive right in. [Music]

Hello, my name is Rohak and I'm the founder of EmpowerCode, helping you make a change with technology. Today marks the 12th episode of my course, Data Science for Media Bias Detection, where I'll be teaching you how to create a fully functional Python news aggregator entirely from scratch.

First, let's start with the components we're going to use inside of our script. Number one: a news extract script which uses newspaper3k to access and store metadata from news articles. Number two: a web scraping script that uses BeautifulSoup and requests to extract the latest technology-related article URLs directly from the New York Times. And finally, number three: a natural language processing script that uses TextBlob to extract the polarity and subjectivity of a given news article. Now, if you're wondering how I created these scripts, or if you just want to learn a little bit more Python, the links to all the relevant course videos can be found in the description below.

However, if you want to take on a journey of your own and compose your own aggregator, I have prepared a quick guide that can help you out. Start by using a web scraping package like BeautifulSoup to extract a set of URLs from a reliable article section on a web page. Next, use a library like newspaper3k to pull all the relevant metadata from each article you're able to extract. Finally, for an added touch and an extra challenge, try adding an additional script which performs sentiment analysis with natural language processing, or even try creating text processing models which analyze each news article. For our scenario, we can simply combine the three scripts mentioned earlier and add in some extra commands and lines of code to quicken the pace of our script; a sketch of these building blocks follows below.
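If you're building your own version, a minimal sketch of those three building blocks might look like the following. The function names, the section URL, and the link-filtering heuristic here are illustrative assumptions on my part, not the exact code from the course scripts:

```python
import requests
from bs4 import BeautifulSoup
from newspaper import Article  # provided by the newspaper3k package
from textblob import TextBlob

def get_tech_urls():
    """Scrape article links from the NYT technology section (heuristic filter)."""
    html = requests.get("https://www.nytimes.com/section/technology").text
    soup = BeautifulSoup(html, "html.parser")
    urls = set()
    for link in soup.find_all("a", href=True):
        href = link["href"]
        # Assumption: NYT article paths begin with a date, e.g. /2021/02/20/technology/...
        if href.startswith("/2021/"):
            urls.add("https://www.nytimes.com" + href)
    return sorted(urls)

def summarize_article(url):
    """Pull metadata and a summary for one article with newspaper3k."""
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()  # populates article.summary (requires NLTK's punkt tokenizer)
    print("Authors:", article.authors)
    print("Published:", article.publish_date)
    print("Top image:", article.top_image)
    print("Summary:", article.summary)
    return article.summary

def find_sentiment(text):
    """Print average polarity (-1 to 1) and subjectivity (0 to 1) via TextBlob."""
    blob = TextBlob(text)
    print("Polarity:", blob.sentiment.polarity)
    print("Subjectivity:", blob.sentiment.subjectivity)
```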
Well, without further ado, let's get started. After opening up PyCharm, I opened my main.py file, where together we'll be finishing the source code for our Newspaper Scrape Python project. In order to get started, I need to import each and every one of the files you see on the sidebar menu, so let me do that really quickly. Awesome: as you can see, I've imported all the individual scripts I need, as well as the time module, which we'll be using to perform some sleep commands in the middle of our code.

Now that we have all of our materials ready, let me quickly type up an introductory message for the user to read when he or she opens up our script. Alright, that took a while, but as you can see, we were able to produce a fully functional introductory message for our script. It states the purpose of the Newspaper Scrape project, which is to scrape the latest articles in the technology section of the New York Times.

With this out of the way, we can shift our attention to the overall user experience. In any program you always want to incorporate the user in some way; this creates a fun, engaging experience for all parties involved. So here I simply assigned a variable name to an input prompt which says "Enter your name to get started." With this field, we can now refer back to it in multiple portions of our program to make the user feel more involved and engaged. This is how the script works so far: we have the introductory message we discussed earlier, and if I simply enter my name as input, it prints out "Welcome Rohak, you will now see the latest technology articles in the New York Times." Now our user is introduced and we can move on to some of the more detailed portions of our code.

Since this is a web scraping program, we can also include some time.sleep commands. These sleep commands delay the runtime by a certain number of seconds, as specified by you, the programmer, and they also add an element of professionalism to your code, because they pace the output and make it easier for the user to understand what's going on. So here we can simply print "Extracting article hyperlinks" with three dots to indicate something is in progress, then add time.sleep(2), so our program pauses for two seconds. After adding some more time.sleep commands, this is how our program functions. [Music] As you saw, the time.sleep commands come into effect and make the program much easier to follow, because you're able to see and read each individual step as it happens.

Awesome. Now we can incorporate our first script, which pulls all the article URLs we need, so let me copy and paste the code we used in that script. Let's go over it line by line to see exactly how it works. The first line gets the content string that actually contains the HTML with all of our hyperlinks inside. After this, we get the start and end indices of each hyperlink and assign the results to two lists, start_indices and end_indices. Then, with both lists and the content string, we can finally get our URL list by calling our get_all_urls method, passing in both lists and the content string as parameters. Let's print our URL list to the console and see what we get. After going through the initial stages of our script, we're able to get four or five URLs pulled directly from the New York Times technology section, so that part of our script is up and running. If you want a more detailed explanation of how these methods actually came to be, the relevant course video is linked in the description below; feel free to check it out.

Now it's time to incorporate the remaining two scripts into our code, so let's iterate over each URL in our URL list. We iterate over each URL and print it to the console, preceded by a string which reads "Article URL". Next, we can use the summarize_article method from our news extract script: we type summarize_article and pass in the URL we're iterating through. What this method does is pull some really cool metadata from each article, including the publish date, all the images inside, and the author name; for more information, check out the relevant video in my course, link in the description. Below our article summary, we can call the find_sentiment function from our news NLP script, passing in the article summary as a parameter. This function extracts the average polarity and subjectivity sentiments from each article: polarity measures how positive or negative something is, and subjectivity measures how biased something is. Finally, in the last two lines of our for loop, we print a line of dashes as well as a time.sleep command, which gives the user time to read all of the information being presented to them. Put together, the flow looks something like the sketch below.
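Here is a rough assembly of main.py as described in the walkthrough. The module names and function signatures in the imports are assumptions based on the narration, standing in for the actual course scripts:

```python
import time

# Assumed imports from the earlier course scripts; the real module and
# function names may differ from these placeholders.
from news_extract import get_content, get_start_end_indices, get_all_urls, summarize_article
from news_nlp import find_sentiment

# Introductory message stating the purpose of the project
print("Welcome to Newspaper Scrape: the latest articles from the")
print("New York Times technology section, summarized and analyzed.")

# Involve the user by asking for their name
name = input("Enter your name to get started: ")
print(f"Welcome {name}! You will now see the latest technology articles in the New York Times.")

print("Extracting article hyperlinks...")
time.sleep(2)  # brief pause so the user can follow each step

content = get_content()                                   # HTML containing the hyperlinks
start_indices, end_indices = get_start_end_indices(content)
url_list = get_all_urls(start_indices, end_indices, content)

for url in url_list:
    print("Article URL:", url)
    summary = summarize_article(url)  # publish date, images, author, summary
    find_sentiment(summary)           # average polarity and subjectivity
    print("-" * 60)
    time.sleep(7)  # give the reader time before the next article appears
```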
Alright, now that we're done with our for loop, let's add some closing messages to cap it all off. First we print a blank line, and then "The articles have been successfully extracted" in a new print statement. To tell the user how many articles were actually extracted, we can simply take the length of our URL list and typecast it to a string; that's pretty cool. Finally, we thank the user for participating in our program and also mention their name, previously captured from the user input. A short sketch of these closing lines appears at the end of this transcript.

Now there's only one thing left to do: run this program and see what ten episodes of work have gotten us. Let's do the final project run. Alright, I run my program, and here we see our introductory message printed to the console. I go ahead and enter my name, and it extracts our hyperlinks and retrieves our summaries. Here we see our first complete article: we have the URL at the top, the author, the publish date, the top image URL, all the images inside the article, a quick article summary, and a final analysis of the article. Scrolling down, we see yet another analysis. This is coding in action. As you can see, the articles are spaced seven seconds apart, which gives the user enough time to read all of the information being presented, and as the articles keep popping up, this is a perfect place to catch up on the latest technology-related news from the New York Times.

This is the culmination of our project, and I have to say it's pretty cool. One by one, each article analysis section comes with all the metadata, a quick summary, and an overall sentiment. Finally, to finish things off, at the bottom of our output we see a closing message which summarizes everything that's just happened: it says the articles have been successfully extracted, and in total we were able to extract nine different articles. This is the culmination of multiple Python scripts combined, and I'm really satisfied with how it's come out. It looks really professional, and we're able to get a lot of information from just a couple of minutes of Python code. This is what coding is all about, after all: having fun, exploring unknown domains, and learning new concepts, facts, and tools all along the way. I hope you've enjoyed building this amazing news aggregator, and I hope you've learned a thing or two as well.

Back to the camera. And there you have it: by using Python fundamentals, we were able to create a fully functional news aggregator which will update each and every day. This is what code is about: having fun and learning all along the way. In the coming episodes, we'll be discussing the role of data science in harnessing social change and impact. Once again, if you still have doubts or are confused, the script we wrote together is stored on GitHub; the link is in the description below. Thank you so much for watching this episode. Take care, stay safe, and I'll see you in the next one.
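For completeness, the closing messages described above could be written along these lines, a small sketch reusing the name and url_list variables from the main script:

```python
print()  # blank line before the closing messages
print("The articles have been successfully extracted.")
# len() returns an int, so typecast it to a string before concatenating
print("In total, " + str(len(url_list)) + " articles were extracted.")
print("Thank you for participating, " + name + "!")
```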
Info
Channel: EmpowerCode
Views: 1,707
Rating: 4.8974357 out of 5
Keywords: python news aggregator, python news api, python news api example, python news scraping, python news, google news api python, data science news, data science for media bias detection, python tutorial, web scraping, python
Id: hp1Uw5vb6fE
Length: 11min 51sec (711 seconds)
Published: Sat Feb 20 2021