[Python Project] Sentiment Analysis and Visualization of Stock News

Captions
Hey guys, what's up? My name is Avi, and welcome to a brand new project on the Codex platform. This project is all about using sentiment analysis to understand financial news and make decisions on stocks. Every single day, authors around the world post articles, and Finviz is a great platform to see all the articles people have posted about a stock. What if, instead of having to go through every headline and judge whether it's positive or negative news, we could use Python to parse finviz.com, gather all these article titles, and then use sentiment analysis to understand, as a daily average, whether the news is positive or negative? This is a very exciting project that will use a whole bunch of different technologies. We'll start off with BeautifulSoup in Python, scraping the website and gathering the data; then we'll use pandas to analyze it and run the NLTK module for sentiment analysis; and last but not least, matplotlib for visualization. At the end we'll have a final graph where you can specify which tickers you want to visualize, and we can see the average daily sentiment of how each stock has performed based on the news. This will be a great project to hone your skills in web scraping, in data analysis and manipulation, and, last but not least, in visualization of the data.

Hey guys, what is up? Avi here, and welcome back to the Codex. In this video we're continuing our new project on sentiment analysis of stocks from financial news, and we're going to be using BeautifulSoup to parse Finviz, an online financial data site, and get the headlines of articles for tickers of our choosing. So what does that actually entail? Our goal right now is to accumulate as many headlines as we can, and after that, run sentiment analysis on the text of those headlines. Finviz has headlines for every relevant ticker we can search for. For example, on finviz.com I can type in Amazon and open up the ticker, and it has a bunch of numerical data about their market cap, income, all that good stuff; but down below, if I zoom in, I can see the articles that are most relevant to me. These articles are in the HTML source code, and we can use BeautifulSoup to parse that code and get the articles of our choosing. If I view the page source and search for "AWS and Formula 1", that text corresponds to this link right over here. So our goal in this video is to parse the HTML and get the relevant titles and timestamps to run sentiment analysis on.

Without further ado, let's jump right in. First things first, we're going to open up a brand new project in PyCharm; if you're using a different editor, feel free to use anything of your choosing. I use PyCharm because it's something I've been using for multiple years. Over here I have a folder already created, and I'm going to create a new Python file called main.py. Inside this file, for now, I need two modules: the request module and the BeautifulSoup module. So we're going to import two things: from urllib.request, import urlopen and Request; and second, from bs4, import BeautifulSoup.
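For reference, the two imports described above are just:

```python
# urlopen/Request fetch the page; BeautifulSoup parses the returned HTML.
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
```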
Now, one of these modules you might have to install, and if that's the case, head over to your terminal. Inside the terminal, type python3 (that's the version of Python I'm using; you might be using Python 2, in which case type python or python2) followed by -m pip install bs4. So in its entirety the command is python3 -m pip install bs4, and that will install BeautifulSoup 4 on your machine. If that works, you should no longer see an import error when you say from bs4 import BeautifulSoup.

The next thing is our URL. So what is the URL of our choosing? As you can see over here, when we pass in the ticker name, for example Amazon, the site passes the ticker as a parameter in the URL itself. So all we have to do is take this URL, https://finviz.com/quote.ashx?t=, and append any ticker symbol to get those results: AMZN for Amazon, or if I do something like AMD, I get the relevant AMD data, et cetera. So I'm going to go back over here and type a new variable; I'll call it finviz_url for now, and inside of it just paste the Finviz URL we just copied. Your variable name might differ a little bit, and that's totally fine, but this is the raw URL that we're looking for.

Now let's come up with a list of tickers that we want to parse in this project. I'm going to create a list of common tickers that I frequently check, including Amazon and AMD, and let's add another common one like Facebook. I think that's fine for now: Amazon, AMD, and Facebook, three fantastic companies whose stock data and financial news we can look at. Here I'm going to iterate over each of these tickers and build the Finviz URL that we're going to parse: for ticker in tickers, the URL that I'm actually going to call is equal to my finviz_url plus whichever ticker was chosen. So you can imagine AMZN, AMD, and FB each being appended on, completing the Finviz URL that I want to get the HTML data from.

Awesome. Now that we have our completed URL, the Finviz URL plus our ticker, our next step is to request the data from it. We imported urlopen and Request at the top, and these are the two things we're going to use to open up this completed URL and request the HTML data. Create a request variable, and inside of it, using the Request class we imported above, specify the url parameter, which is equal to the URL we just built. After that you can specify a header, and this header is what allows us to access the data: if you don't specify a user agent, we will be forbidden access and won't actually be able to download anything from finviz.com. In this user-agent you can give your application whatever name you want; you're essentially just saying, hey, I'm accessing this website from this particular user agent, and in this case it can just be 'my app'. So that is my request, and now I'm going to open it by saying response = urlopen(req), passing in the request. If I just print out the response, we're not going to see anything exciting, just a response object: an http.client.HTTPResponse object, which is something that BeautifulSoup can take in and parse the HTML content out of.
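Put together, a minimal sketch of this fetching step might look like the following (the ticker symbols and the 'my app' user-agent string follow the video's wording; the trailing break is explained next):

```python
from urllib.request import urlopen, Request

finviz_url = 'https://finviz.com/quote.ashx?t='
tickers = ['AMZN', 'AMD', 'FB']

for ticker in tickers:
    url = finviz_url + ticker  # e.g. https://finviz.com/quote.ashx?t=AMZN

    # Finviz returns 403 Forbidden unless a User-Agent header is set.
    req = Request(url=url, headers={'user-agent': 'my app'})
    response = urlopen(req)

    print(response)  # <http.client.HTTPResponse object at 0x...>
    break  # only look at the first ticker while testing
```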
Imagine that we now have the response for this entire website with the ticker passed in; the reason I added a break at the end of the loop is so that I don't go through all the tickers while we're only testing this code. What I'm going to do now is take this response and throw it into BeautifulSoup. So I'm going to say my actual HTML code, html, is equal to BeautifulSoup, pass in the HTTP response we just got, and then mention that it's the 'html.parser' we're using. That's all we have to do, and now if I print out the html object, we see exactly the source code we wanted from this website: the view-source code you saw earlier now shows up in the output of the terminal. If I zoom out a little, you can see it's a lot of code and a lot of different things are going on, but I hope you get the gist: we're now parsing the HTML from finviz.com/quote.ashx, where the parameter is AMZN, AMD, or FB.

Now I'm going to go back to that page source, search for "AWS and Formula 1", and see where this data actually lies. If we take a look, we notice that all of these articles lie in one table, and looking at the HTML we can see that the table holding all the data has an id of "news-table". So what we can do is grab the table by that id and add its data to a dictionary that we can work through down the road. Let's do just that. I'm going to create a dictionary above the for loop; and actually, you know what, since these are tables, let's just call it news_tables, an empty dictionary. Right after that we're going to use some BeautifulSoup syntax to parse this HTML and get the element with id "news-table" out of our html object. So over here I'm going to say my news_table is equal to html.find, and I want to specify the id 'news-table', the exact same id we found in the source code. What this does is get us the HTML object of this entire table, however far down it extends: a gigantic table with all the different results, which is exactly what we want. We're storing this in our news_tables dictionary: the dictionary gets a key, that key is the ticker, and its value is the news_table we just found. So again, all we're doing right now is taking this table object and storing it in a dictionary. Yes, it's a little bit inefficient; we could definitely parse through every single table right now, but to simplify the code and make it easier to understand, let's save the tables to a dictionary and then parse each table individually.

Now that we've done that, let's print out our news_tables and see what's up. I'm going to run this code; I see a small warning over here, which is just a BeautifulSoup message, and now this dictionary holds the table of results from the webpage, full of tr elements (tr in HTML stands for table row).
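Extending the loop above, a sketch of this table-collection step could look like:

```python
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

finviz_url = 'https://finviz.com/quote.ashx?t='
tickers = ['AMZN', 'AMD', 'FB']

news_tables = {}  # maps each ticker to its parsed news table

for ticker in tickers:
    req = Request(url=finviz_url + ticker, headers={'user-agent': 'my app'})
    response = urlopen(req)

    # Parse the page and grab the <table id="news-table"> element that
    # holds every headline row for this ticker.
    html = BeautifulSoup(response, 'html.parser')
    news_tables[ticker] = html.find(id='news-table')
```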
All these table rows correspond to the article rows we see on the page: each one of these is a table row, and we can now parse the rows to get the text of each article.

Hey guys, Avi here, and welcome back to the Codex. In this video we're continuing our sentiment analysis of stocks from financial news, and we're going to parse the data we got from BeautifulSoup into an understandable format so we can extract the title and the timestamp of these news articles and eventually apply our sentiment analysis. The goal of this video is very simple: iterate over all the table rows in our data and get the values, not just the timestamp, but also (if I scroll over here, let me move this along a bit faster) the href of the link and the text of every single article in the table. Let's first understand how to do that with a simple base case, and after that we'll expand it into a more generalized function.

Right now I'm going to access my first index, news_tables['AMZN'], as a play dataset. So I'm going to say my amazon_data is equal to news_tables['AMZN'], and then find all of the table rows in this table HTML object. There's a very handy find_all function that BeautifulSoup has, so I can say my amazon_rows is equal to amazon_data.find_all, passing in the HTML tag 'tr'. What this does is give me a list of all the tr elements inside the HTML object I passed in, which in this case is the table of all the relevant news articles. What I can do now is print out my amazon_rows just to give you a quick rundown of what's going on. If I hit Ctrl+Shift+R, it parses the website, and now I have a list-like object with all my tr elements in comma-separated fashion, and I can iterate over these rows to get the values.

For a quick demonstration (feel free to just follow along; you don't have to copy this code, it's optional and just for understanding the data in the rows): we have all the table rows of our table, and now I'm going to iterate over them with for index, row in enumerate(amazon_rows). Again, the enumerate function gives me the index and the object for every item in a list. So what text exactly am I looking for here? I notice that there's an a tag inside each tr, and if I scroll to the right, I can see that inside that a tag is the actual text of the article. So I can say the title is going to be equal to my table row, row, then the a tag inside of it, and then the text of that anchor tag. All I'm saying is: look for the anchor tag inside my table row and give me the text that's inside it, which should be exactly this headline text right over here. Let me print out the title for now just to show a quick example of what's happening. I'm going to run it and parse the HTML, and look at that: I'm now able to get the text of every single article in the table.
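As a sketch of that optional demo (assuming the Amazon table was stored under the 'AMZN' key as above):

```python
# Pull the Amazon table out as a play dataset and list its <tr> rows.
amazon_data = news_tables['AMZN']
amazon_rows = amazon_data.find_all('tr')

for index, row in enumerate(amazon_rows):
    # Each row contains an <a> anchor tag whose text is the headline.
    title = row.a.text
    print(title)
```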
Now let's go back and see how we can get the date, or the timestamp, over here. I'm going to go back to the HTML and take a look at what's happening, and I notice immediately that, just like we had an anchor tag with the article text inside, there's a td tag, and inside the td tag is the actual timestamp. So what I can do is search for the td element and take the text inside it to get the timestamp of our data. That's very straightforward: just like title, I can say my timestamp should be equal to row.td.text. Again, we're looking for the td element and getting its text. For simplicity's sake, I'll just print the timestamp, then a space, then the title. Let's run this and see what we get: here is our output, and awesome, I can immediately see the timestamps of when every article was published, and I also get the full date wherever a new day starts.

The next step is to take this code and modularize it so it works for any ticker in our dataset. Right now this is very test-focused; it's just trying out Amazon and making sure the data works. So let's iterate through the news of all the tickers we populated, scrape the data, and add it to a new list of items. Right over here I'm going to create a new data structure called parsed_data, and this is going to be a list. My goal is to fill it with smaller lists, a list of lists, where each inner list corresponds to a ticker, a date, a time, and the title of an article. Now we're not worried about storing them per ticker; we're just going to store all the data we can get on all of our different tickers inside one big array. So let's iterate over our dictionary, which was called news_tables: for ticker, news_table in news_tables.items(), iterating over the key-value pairs of the dictionary. What I'm going to do is copy the same code we had above in the test: for row in news_table.find_all('tr'), and I should make this singular, row, because we're iterating over every single row. For each row we can find, we're scraping multiple things. The first thing: can we scrape the title? We know the title of the article is row.a.get_text(), something very straightforward. The next thing is the timestamp, and the timestamp can come in two formats: either a single time like this, or a time with a date in front of it. Let's implement both cases. We're going to say that our date_data is equal to row.td.text.split(' '), and what this does is split the text on, in this case, a space, breaking it into sections.
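To make the two cases concrete, here's a tiny illustration of what that split produces (the example strings are representative of Finviz's format, with the full date only on the first article of each day):

```python
# First article of a day: date followed by time.
print('Jun-23-20 04:09PM'.split(' '))  # ['Jun-23-20', '04:09PM']

# Later articles that day: time only.
print('01:30PM'.split(' '))            # ['01:30PM']
```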
If the length of the split is only one, we know for a fact that it's just a single timestamp; but if the length is greater than one, then we know there are multiple values here: the first value is going to be our date and the second value is going to be our time. So we can implement exactly that right over here. You might notice I use get_text() instead of .text; they're essentially the same thing, and I'd get the same results with .text. Over here I'm going to say: if the length of my date_data is equal to 1, which implies we only have a time, then the time is equal to date_data[0], the first item from the split. Otherwise, if we have more than one component, something like 'Jun-18-20 09:28PM', then we have two components, the date and the time, so my date is date_data[0] and my time is date_data[1]. (I've called the split result date_data so we can distinguish it from the actual date; if you call both of them 'date', you'll run into a 'date is not defined' issue like I briefly did.) Hopefully that makes sense: all we're doing is taking this text and splitting it up on spaces. If there's more than one component, we know we're dealing with a date followed by a time; otherwise, if the length is one, we're dealing only with a time, which is date_data[0].

Now that we have our title, our time, and our date, the last thing is our ticker, and we have that right over here from iterating over the items. So for each one of these rows we say parsed_data.append, appending the ticker, the date, the time, and the title. Save this, print parsed_data at the very bottom, and let's see what we get. Running this code, I see our tickers, so let's take a look. We have our first value right here: AMZN, the date Jun-23-20, the timestamp 04:09PM, with some extraneous characters (I believe the &nbsp; entity) that we can remove down the road; that's fine, and that was our first article. The second one: AMZN, Jun-23, 01:30PM; then AMZN, Jun-23, 12:46PM, and here's a title. Fantastic. So what we've done in this video, guys, is take our dataset, that large conglomerate of tables and all their table rows, parse through it, and find exactly what we needed: the ticker, the date, the time, and the text.
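A sketch of that full parsing loop, as described:

```python
parsed_data = []  # each entry: [ticker, date, time, title]

for ticker, news_table in news_tables.items():
    for row in news_table.find_all('tr'):
        title = row.a.get_text()

        # The <td> text is either "TIME" alone or "DATE TIME".
        date_data = row.td.text.split(' ')

        if len(date_data) == 1:
            time = date_data[0]    # time only; the date carries over
        else:
            date = date_data[0]    # first token is the date
            time = date_data[1]    # second token is the time

        parsed_data.append([ticker, date, time, title])

print(parsed_data)
```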
Hey guys, Avi here, and welcome back to the Codex. In this video we're continuing our project, sentiment analysis of stocks from financial news, and we're going to apply sentiment analysis to the titles we got in the last video. Right now we have a list of lists, where each list consists of a ticker, a date, a time, and the title of the article, and our goal is to take every single one of these titles and apply sentiment analysis. In order to do this, we're going to look at something called VADER, which is part of an NLTK corpus. Again, a lot of buzzwords, so let me break that down for you. Go ahead and search "NLTK VADER sentiment": this is a package whose lexicon has been tuned on a large amount of online text. There's a whole documentation on what it does, but essentially we can use NLTK's VADER to apply sentiment analysis in code.

So what does that mean for us? Let me pull up this link and show you an example of how this works; I'll read off this page for just a second to give an idea of what's going on. You import the SentimentIntensityAnalyzer class, and then all you have to do is take some message text, for example: "I am getting frustrated with this process. I'm generally trying to be reasonable." You apply polarity_scores on top of this message text, and you get back positive and negative scores. The result of applying this function to that text is: compound -0.3, negative 0.093, neutral 0.836, positive 0.071. So what does this result mean? Well, the negative, neutral, and positive values describe the fraction of weighted scores that fall into each category: the text is part negative, part neutral, part positive. But at the end of the day, what we care about is: is it negative, is it positive, or is it somewhere in the middle? That's where the compound part of the result comes into play. The compound score is a normalized value between -1 and 1, and it attempts to describe the overall effect of the entire text, from strongly negative at -1 to very positive at +1. So this -0.3 signals that the text is moderately negative, and that kind of makes sense: you see words like "frustrated", and the writer is trying to be reasonable; it reads like a semi-negative text. Our goal is to apply this function from NLTK's module to the titles we've parsed over here.

So at the very top of your code, we're going to import a couple of things. The first is nltk, and that means you have to install it: head back to the terminal and type python3 -m pip install nltk, and that will install NLTK on your machine. Then, in a Python console (again, if you don't have a Python console open in your terminal, just type python3), you're going to do two things. You're going to say import nltk, which imports the corpus, and then you're going to say nltk.download(). Inside download() you can specify the VADER lexicon package directly, or you can use the nice graphical interface that download() comes with: it opens a pop-up that lets us see what packages NLTK has and what we can use. We're going to use the sentiment analysis package called VADER, so go to "All Packages", scroll down until you see V, and look for vader_lexicon: that's the package right over here. Double-click on it and hit Download, and that should install it on your machine. Now, if you don't see that graphical interface pop up, or you run into errors, you can also just say nltk.download and pass in, in quotation marks, 'vader_lexicon'. Both work, and either way the package ends up installed on your machine.
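The one-time setup, sketched out (run the pip line in your terminal, the rest in a Python console):

```python
# In the terminal first:  python3 -m pip install nltk
import nltk

# Either open the graphical downloader and pick vader_lexicon...
# nltk.download()
# ...or fetch it directly by name:
nltk.download('vader_lexicon')
```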
Now, at the very top of main.py, you're going to import this package: from nltk.sentiment.vader import SentimentIntensityAnalyzer. This will let us call the function we just saw in the article and apply sentiment analysis to any given text. And last but not least, at the very top I'm going to say import pandas as pd. pandas is a module that lets us manipulate data in a nice table-like data structure; it's very popular in data science applications, and the reason we're using it is that pandas will let us very quickly apply functions, in this case a sentiment analyzer, to our dataset.

We have our imports ready; scroll all the way down and let's start implementing our code. First things first, we're going to create a DataFrame to host our data. We have this list-of-lists collection right now; it's not the best, not very efficient. So we're going to create our pandas DataFrame, df, and set it equal to pd.DataFrame, passing in the parsed_data we already have (the list of lists), and we're going to specify a columns field. The columns are the headers of our table: imagine this as an Excel spreadsheet. Our DataFrame has the ticker, the date, the time, and the title, but pandas doesn't know what the columns are called, so pass in an array with a name for each one: 'ticker', 'date', 'time', and 'title'. Those are our four column names. For now we can just print df.head(), which prints the first five rows of our frame. Run this, and there we go: it's a bit condensed (honestly, if we were using a Jupyter notebook or a better visualizer this would look a lot cleaner), but for now I just want you to see that we have a row-by-column structure: rows, tickers, the two other columns in between, and then the title. That's essentially what I wanted to show you; we now have this in a DataFrame-like object.

The next step is to initialize our vader, and vader here is the SentimentIntensityAnalyzer. So specify that vader is equal to SentimentIntensityAnalyzer(), calling that class; vader will be used to analyze any given text. How would this work; what's an example? Let's test it out: I'm going to print vader.polarity_scores and pass in some text: "I don't think Apple is a good company. I think they will do poorly this quarter." This is sample text, and I obviously don't believe it, but run it and let's see what happens; I want to give you an example of the VADER polarity scores. As soon as that runs, we see negative 0.0, neutral around 0.8, positive 0.195. Interesting: the module isn't perfect, but it gives us a good, unbiased estimate of the text in terms of positive and negative words. I'd guess in this case it picked up the word "good", countered that with "poorly", and must have come up with a compound score of around 0.4.
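Here's a sketch of the DataFrame setup and the quick VADER test, using the sample sentence from the video:

```python
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Wrap the parsed rows in a table with named columns.
df = pd.DataFrame(parsed_data, columns=['ticker', 'date', 'time', 'title'])
print(df.head())  # first five rows

# Initialize the analyzer and try it on a sample sentence.
vader = SentimentIntensityAnalyzer()
text = ("I don't think Apple is a good company. "
        "I think they will do poorly this quarter.")
print(vader.polarity_scores(text))
```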
Let's throw in some more negative words; let me rephrase this: "I think Apple is a bad company and they will fail sales this quarter." I don't know, something a little bit more negative. Let's run this and see what happens. Running the scores on the new text, we now see something more negative: negative 0.412, neutral 0.588, but the compound score, which is what we really care about, is about -0.79. VADER is again picking up the negatively correlated words, "bad", "fail", et cetera. It's not perfect, but it gives us a very good idea of what the sentiment is, and this is a pretty standard tool: it's very popular, used by a lot of professionals and data scientists, and I'd say that for this project it fits our case perfectly.

So let's apply polarity scores to the 'title' column of our DataFrame. If I print df['title'], I can access all the titles of my DataFrame and see the 99+ different titles that exist. What I'm going to do now is some cool pandas magic: I'm going to create a new column in df, and I'll call this column 'compound'. This compound score column is going to be equal to my current DataFrame's 'title' column with the vader polarity_scores function applied to it. Let me type it out: I'm going to take every single title and apply a function, and the function is going to be a lambda. If it's easier, let me explain it separately: the lambda function, call it f, takes in a title (the argument could be x, but for simplicity's sake I'll say title), and it returns the polarity score of that title; except I don't care about all the values, not the positive, negative, or neutral, the only value I care about in that dictionary is the compound score. So I'm saying: for whatever string I pass to this function, just give me back the compound score. Now I can apply this function f to df['title'], so it runs on every single title that exists, and that creates a brand new column inside my DataFrame called 'compound' with just the compound sentiment score of every one of my titles.

Now if I print df.head() and see what happens: over here in my terminal, would you look at that, we now have five columns. I'm sorry that you can't see the ones in the middle, but just take a look at the last column we added. With the beauty of pandas magic plus a neat lambda function, every single title now has its corresponding compound score in the column right next to it. What we've essentially done is take our data, apply the polarity score, the sentiment analyzer, to each one of the titles, and add it as a column in our DataFrame.
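That apply step in one place, as a short sketch:

```python
# polarity_scores returns a dict; keep only its 'compound' value,
# a normalized score between -1 and 1.
f = lambda title: vader.polarity_scores(title)['compound']
df['compound'] = df['title'].apply(f)

print(df.head())  # the frame now has a fifth column, 'compound'
```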
Hey guys, Avi here, and welcome back to the Codex. In this video we're wrapping up our sentiment analysis of stocks from financial news project, and we're going to take a look at visualizing the data we've parsed and analyzed so far. We've done a lot: we've gotten the data from Finviz using BeautifulSoup, we've parsed that data to get the relevant title and information, and we've applied sentiment analysis to figure out how positive or negative the titles are. Now let's visualize a trend over time to see how the companies we've chosen have been faring.

First things first, I want to convert the date from a normal string to a recognizable datetime format, so we have an ordering of which date came when. Let's do that right over here: I'm going to say df, then the 'date' column (we're modifying the date column), is equal to pd.to_datetime, specifying which column we're converting, df.date, and then all we have to do is add .dt.date. What this does is take our date column and convert it from strings to date objects. That way, when we visualize this in matplotlib and draw the chart, the months (May, June, whenever this video was recorded) will come in the correct order, since pandas and matplotlib automatically recognize datetime objects.

The next step is to import matplotlib to visualize this data, so at the very top I'm going to say import matplotlib.pyplot as plt. Again, if you don't have matplotlib, install it in your terminal (whoops, let's exit the Python console first; there we go): the one thing to run is python3 -m pip install matplotlib. If you don't have this package installed, make sure you run that now, and it will install it on your machine.

Once that's done and we've imported matplotlib.pyplot as plt, the next step is to set our figure size: for the figure we're going to be making, for now we can set figsize to (10, 8); this can change down the road. And now our goal is to take all the compound scores and average them. What does that mean? I have all these articles over here, for example Amazon on June 22nd: 5, 10, 15, 20 articles, and each one of these articles has a compound score associated with it, a sentiment value ranging from -1 to +1. I want to average the scores of all the news articles for every single day to figure out: was today a positive day for Amazon, a negative day, or maybe neutral? So we're going to use some cool pandas techniques to analyze our DataFrame, get that going, and then visualize and plot the data.

First, I'm going to call this new frame mean_df, and we're going to group our data by ticker (again AMZN, AMD, and so on) and then by date. By grouping it like that, I'll have all the Amazon entries for a specific date collected together, and then I can simply calculate the mean. What that does is look at the numeric values, and the only numeric values we're dealing with right now are in the 'compound' column; so the mean function, applied to this groupby, will return a DataFrame with the ticker, the date, and all of the compound values averaged, so that we have one value for every ticker-date pair. That's the power of groupby.
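Sketched out (the numeric_only flag is my addition: newer pandas versions require it explicitly so mean() skips the string columns):

```python
import matplotlib.pyplot as plt

# Convert the date strings to real dates so they sort chronologically.
df['date'] = pd.to_datetime(df.date).dt.date

plt.figure(figsize=(10, 8))

# One averaged compound score per (ticker, date) pair.
mean_df = df.groupby(['ticker', 'date']).mean(numeric_only=True)
print(mean_df)
```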
If I just print out mean_df, let's see what it looks like, for visualization purposes (and I think, in hindsight, I should have used a Jupyter notebook to visualize this data, since I'm doing so much work with pandas; in future projects I'll definitely do that). In this case I have a ticker, I have the date, and you can see there's only one entry per date, since we used a groupby. By calculating the mean I have the average sentiment of what the average article said: was it positive, was it negative, was it roughly neutral? And for the most part Amazon has been relatively positive, up until the past couple of days.

What we can do now is modify this DataFrame so that it follows the chart-like structure we need for visualization. The first thing is to unstack the data: unstack will let us have the dates as the x-axis, so we say mean_df = mean_df.unstack(). Right after that, we get rid of the 'compound' column label, and when I say get rid of it, I'm essentially just removing the name 'compound': I say mean_df = mean_df.xs, taking a cross-section, the cross-section being 'compound', specifying the axis to be 'columns', and then transposing the data. This is a lot of nifty pandas, but essentially all I'm doing is manipulating this DataFrame, transposing it, and ensuring that I end up with key-value pairs: the dates and the compound values for every single ticker. If I print what mean_df looks like after applying these two functions, let's take a look and see whether this data represents what we want: I now have my tickers, Amazon and the rest, as columns, with the dates down the first column and the values next to them. That's the power of taking the cross-section: if I skip it and simply transpose, you'll see an extraneous 'compound' label that I don't need. That label exists because of the groupby and the unstacking, and by taking the cross-section I can strip that extra level off my dataset.

Now that I have this mean_df and it seems to be looking good, the last thing is to plot it. I'm going to say mean_df.plot and specify that it's going to be a bar chart, and then the last line of this project: plt.show(). This will render the entire plot on my screen, and I think PyCharm will show it nicely on the right-hand side. One thing you may have forgotten: at the very top we had a break statement to ensure we didn't go through all the tickers, to save time while testing; go ahead and remove that break statement if you still have it.
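And the final reshaping and plotting, as a sketch:

```python
# Unstack so each ticker becomes a row of per-date values...
mean_df = mean_df.unstack()

# ...then take the cross-section to drop the leftover 'compound'
# column label, and transpose so dates are rows and tickers columns.
mean_df = mean_df.xs('compound', axis='columns').transpose()

mean_df.plot(kind='bar')
plt.show()
```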
Now run this code and let's visualize our entire project end to end: we scrape the Finviz data, we run the sentiment analysis, and hopefully, in a nicely rendered bar chart, we see this data over time. Over here, let's take a look at what's going on: zooming in, I can see my bars going from about -0.2 to 0.3; the blue bars represent AMD, orange is Amazon, and FB is green. Now, I guess AMD doesn't have a lot of articles, and that's why the data points look so scattered, so what I'm going to do is remove my AMD ticker and replace it with something I believe is a bit more popular. Let's try Google and see how the dataset looks; I'll run this one more time so that our dataset is a bit cleaner. All right, this definitely looks much cleaner than before, so let's break it down. I have my three tickers, Amazon, Facebook, and Google, and for every single day where articles were published I can see the average sentiment of those articles. Going by recency, these dates are now in the correct order because we converted them to the datetime format. On the most recent date there was some negative news for Amazon and positive news for Google; yesterday was a very positive day for Amazon; and we can see a general trend of negative news for Facebook. Hopefully that corroborates with what we'd find if we read these articles and took a look at what's happening: are these negative or positive articles?

That was our project, end to end. We started off with, not even a dataset, just a URL; our goal was to take these Finviz articles, analyze them, and figure out, every day, whether my stock has positive or negative news associated with it, and that's exactly what we did. To break down what we've accomplished, and I think it's pretty impressive: we iterated over the tickers and got the HTML data; we parsed that HTML into a format that was nice and readable; we loaded it into a pandas DataFrame and calculated the sentiment using VADER; and last but not least, we visualized the dataset as a bar chart to see how our stocks have progressed over the course of the past week, with the relevant highlights of positive and negative sentiment. I hope this was an interesting project for you; I know I've had a lot of fun building it. There were a ton of cool technologies we played around with, and this is definitely a script you could run every day to see how your stocks are faring and make trades based on that information. Thanks so much for listening, guys; I hope you've enjoyed this project, and I hope to see you in a future project down the road on the Codex platform. Thanks so much!
Info
Channel: TheCodex
Views: 23,611
Rating: 4.9805827 out of 5
Id: o-zM8onpQZY
Length: 43min 31sec (2611 seconds)
Published: Tue Jul 21 2020