Tweet Visualization and Sentiment Analysis in Python - Full Tutorial

Captions
If you enjoy content like this, please subscribe to the LucidProgramming channel for more programming tutorials.

Okay, so in this video we'll be making use of the Tweepy API. This is an API that we can access via Python, and it will allow us to very easily stream tweets in real time directly from Twitter. To follow this series of tutorials, we're going to install Tweepy, which is a Python package. I'm going to assume that you have Python installed on your machine; if so, all we'll need is the tweepy module.

Before we write any Python, we're first going to create a Twitter application, and for that you will also need a Twitter account. If you don't have one, go ahead and create one. It's pretty easy to do, and it doesn't need to be your main Twitter account or anything like that; you just need to sign up for Twitter in whatever way you wish. Once you do that, you will be able to create a Twitter application, and then we'll write Python code that interfaces with that application to stream tweets in real time based on keywords or hashtags.

In this video we'll just write a very simple application to accomplish exactly that: we'll see tweets for a given set of keywords. In subsequent videos we'll see what we can do to analyze the content of the tweets that we get back. Maybe we can produce some nice graphs based on that data, and then do some other fun things with it, like possibly sentiment analysis, which will most likely be part of the natural language processing series of videos, so keep an eye out for that as well.

One thing I want to mention is that all of the code, as usual, will be provided on the GitHub page, and all the links that I'm talking about here, all the links that I have open in this browser, are going to be available below in the description, so no
need to write them down; just refer to that, and you can see the code that we use in all of the videos. Another thing I want to mention is that if you have any problems or questions about Tweepy, which is the primary module we'll be using in this tutorial series, the documentation is at docs.tweepy.org. Again, the link will be in the description. Further information about the module is there, and it's very descriptive and well written, so I will refer you to that.

Before we get into writing any Python code, as I mentioned, we need to go ahead and create a Twitter application. To do that, navigate over to apps.twitter.com. Again, this requires that you have a Twitter account and are logged in. If so, you should be presented with a dashboard that looks probably similar to this, depending on when you're watching this video. You can see that I already have an application listed here under my Twitter Apps, which was used for something else I was working on just for fun. You might not see anything here if you don't have any apps. In this Application Management section, click the Create New App button. That takes you to a form which asks for some really simple details about what this Twitter application is going to be doing.

I've already done a dry run of this, so I have some information for what we'll be filling in. You can put in whatever name you like; this is just the name of the Twitter application, and I'm going to put in LucidProgramming Twitter App. A description is also required, so I'm just going to say a test Twitter application for a YouTube tutorial. Then a website: I'm just going to put in my own homepage here for good measure. The callback URL is not particularly necessary, so we'll leave it blank. Make sure you check the developer agreement, and then you can click on Create your Twitter
application. If you do so, you are presented, hopefully, with this page that says your application has been created, and all of the relevant information for your application will be stored here, such as the credentials used to access it. What you're going to want to do is click on the Settings tab right here. If you do so, you'll see four things listed: consumer key, consumer secret, access token, and access token secret.

I'm not going to click on that, because the keys are unique to each application, and showing them would be in poor taste at best and a security risk at worst. So I'm going to let you look at what yours are. Keep them secret, and then we're going to use them to interface with our application. If you have an application that's interfacing with Twitter, you wouldn't want somebody to get access to your tokens, because then they could manipulate it and possibly abuse it, and that wouldn't look good for your Twitter account. So try not to let that happen.

We're going to use those four credentials to authenticate the Python program that we will now be writing. With that said, we can minimize this, and let's get to writing some code. The first thing I'm going to do is create a credentials file, which is going to store those four things that you should see on your Settings tab. I'm going to create this as twitter_credentials.py. What we're going to do here is just create some variables which will be accessed in our primary program. I'll put in a comment: variables that contain the user credentials to access the Twitter API. The first one is the access token, the next is the access token secret, and there's also the consumer key and the consumer
secret. What I'm going to do is define each of these as just Python strings. I'm going to assume that you can go back over to the Settings tab and copy and paste, in between these quotes, the access token, access token secret, consumer key, and consumer secret. Save this file, and make sure it is saved in the same directory in which you'll be writing all of your Twitter API access Python code. I'm just going to save this right now, and that's that. If I look here, I have twitter_credentials.py in this directory.

In the same directory I'm going to create another Python file, which will be the brains of what we'll be doing, which, as I mentioned before, in this video at least, will just be streaming the tweets directly from Twitter in real time. So let's create a file which we'll call tweepy_streamer.py.

Before we get into writing this, one thing that I probably should have mentioned earlier is that you need the tweepy module, of course. As I mentioned, I'm going to assume that you have Python installed, but I won't assume that you have Tweepy installed. If you don't, you can install it through pip: pip install tweepy. If you run this command in your terminal, it will go ahead and install it. I already have it installed, so for me it says requirement already satisfied. Sorry that I forgot to mention that earlier.

So, back to the code. Let's go ahead and start importing some of the things from the Tweepy library that we'll be making use of. The first thing we'll import is: from tweepy.streaming import StreamListener. This is a class from the tweepy module that will allow us to listen to the tweets, kind of the firehose of tweets, as they come in based on certain keywords or
hashtags. We'll also need to import another thing for authentication, so we'll say: from tweepy import OAuthHandler. This class is going to be responsible for authenticating based on the credentials associated with the Twitter app that we stored in the other file. Then we have one more thing: from tweepy import Stream. And another import that we require, not from Tweepy, is the credentials file that we created, so: import twitter_credentials. Again, this should be located in the same folder as the file that we're currently writing.

The first thing I want to do is create a class which will allow us to print the tweets. It's a very simple thing, and then we'll go back over it, make it a little more robust, and refine what it's doing. But simple first. So let's create a class, which we'll call StdOutListener, and this class is going to inherit from the StreamListener class. StreamListener provides methods that we can directly override. One of them is called on_data, a method of the class that takes in a parameter, data. There's another one that we can also override called on_error, which is also a method of the class and takes a status variable.

So what do these functions do? What are they responsible for? on_data is an overridden method which takes in the data that is streamed in from the StreamListener, the one that is listening for tweets. We can do whatever we want with that data, that tweet. To keep it very simple, we're going to print out the data that we get and then return True to signal that everything went well. We'll go back and make that a bit more robust, because it's very simplistic at the moment. on_error is a method that we're overriding from the StreamListener class that is called if there's an error that occurs, and
one thing we can do, which again is very simple, is just print out the error, which is passed in through this status variable. We're just printing it on screen. So if we encounter an error, this method will be triggered and will print the status message of that error to the screen.

The next thing we're going to want to do is create an object from this StdOutListener class that we just created, and then actually get on to streaming the tweets. This will be in the main part of the program, so we'll say: if __name__ == "__main__". What we'll first do is create a listener object, so I'll say listener = StdOutListener(), like that. Again, this is just an object of the class that we just created, which inherits from the StreamListener class.

Then we're going to want to authenticate using the credentials that we stored in the other file. We're going to create a variable called auth and set it to OAuthHandler, which is the class we're importing from Tweepy that is responsible for actually authenticating our code, and we're going to pass in the credentials: the consumer key and the consumer secret from twitter_credentials. To define this auth object of the OAuthHandler class, these are the two arguments we need to pass in. Then, to complete the authentication process, we're going to say auth.set_access_token. This is a method provided by the OAuthHandler class, and it also takes two arguments: the access token and the access token secret. At this point our application, hopefully, should be properly authenticated.

Next we'll create a Twitter stream. We'll create a variable called stream, equal to Stream, which
is the Stream class that we imported above, and we're going to pass it two things: the authentication object, to verify that we've actually authenticated properly, and the listener object that we created. The listener object is just responsible for how we deal with the data, the tweets, and how we deal with an error if we encounter one.

One final thing we can do is filter the tweets. Otherwise, if we just run this stream, it's going to stream a ton of tweets at us; some of them might be interesting and some might not. Let's say we want to stream tweets that are focused on some keywords or hashtags. What we can do is say stream.filter. This is a method that is also provided by the Stream class, and one of the things it can take is an optional parameter called track, which is a list. If a tweet contains any of the items in this list, it will be added to the stream. So let's add Donald Trump, Hillary Clinton, Barack Obama, and, let's say, Bernie Sanders; just some politicians in there. We'll filter tweets based on this list of keywords.

Now we can write this to a file and see what happens when we run it: python tweepy_streamer.py. If we run this, we get, let's see, a 401. Let's check why that happens. I think the reason is that I didn't fill in the consumer key, the consumer secret, or any of the others; I just ran it with what you saw before, where all those things were completely blank. So don't be me, don't be an idiot: go ahead and make sure you fill that stuff in. Once you do, hopefully it should run, and things should start coming down the screen. So we see a ton of what looks like garbage
just showing up on the screen here. These are tweets. Let me just stop this here; if you don't stop it, it'll just keep streaming. This whole blob here is one tweet. The person didn't tweet this whole thing out, of course. What this is, is a JSON-formatted dictionary object, and each of the fields in this braced thing contains information about the tweet. This says the tweet was created today; this is the ID that Twitter associates with this particular tweet. Apparently it was a retweet by this person here, and here is the actual text. So, Donald Trump, that's one of the keywords that we had in the list, check: 2019 State of the Union speech is January thirtieth, TV ratings. So there's just some stuff that he's saying there.

What other content do we have in this thing that's interesting? It also tells you who retweeted it and how many times it's been retweeted. There's a whole ton of information in this little thing here, I think even the platform and possibly the coordinates of where it was tweeted from, although in some cases that's not provided. So there's one tweet here, and what we're going to be doing in subsequent videos is going through each of these and seeing if we can extract anything interesting, any insights, from them, and just have some fun with it.

So, back to the code. What we have at this point is just something that prints tweets to the screen, which is totally fine as a proof of concept: we've got tweets coming in. But I'm going to clean some of this up to make it a bit more robust for subsequent videos. Let's create another class, which we'll call TwitterStreamer. This is going to be a class that we'll define for ourselves, which will be responsible for
actually streaming the tweets; it just makes things a little more concise. We'll give it a method called stream_tweets, and since this is a method of the class, it takes self. I'm thinking ahead here and assuming that, instead of just printing tweets to the terminal like we saw, perhaps I want to save tweets to a text file or a JSON file, so that I can process them later for whatever purpose I see fit. So I'm going to allow us to pass in the name of the file that the tweets will be written to; I'll call that fetched_tweets_filename. Again, that's just the file name of where we want to write our tweets instead of showing them on the terminal. We can do both, but we don't have to. Then I'll also have a hash_tag_list here, which will just be the list of keywords or hashtags we wish to filter the tweets by. We're really just reworking the proof-of-concept code that we have into something a little more modular.

Let's add a comment to make this look a little nicer: this handles Twitter authentication and the connection to the Twitter Streaming API. We're going to pretty much take the code that we wrote down here, remove it from the main part, move it over here, and indent it. So we create the listener object, that's all good, and authenticate. Then, instead of hard-coding the keywords here, I'm going to remove this and put in the hash_tag_list that we passed into the function. That's a bit more modular: we can just create a TwitterStreamer object instead of doing that whole business that we had before in the main part of the code.

Another thing I want to do in this StdOutListener class that we defined... well, let me first add a few more comments here. I want to add a comment for this class: this class is a class for
streaming and, let's say, processing live tweets. That looks good. And then here let's do another one: this is a basic listener class that just prints received tweets to standard out, at this moment anyway. That might change over time as we make this more complicated.

The first thing we'll do is create a constructor for this class, because I want a StdOutListener object to be associated with a file name that the tweets are going to be written to; that is, where do we want to store the tweets? So the constructor takes fetched_tweets_filename, and I'll store it as an instance variable: self.fetched_tweets_filename = fetched_tweets_filename.

This on_data method could use a little something to make it better at dealing with possible errors, so I'm going to have a try/except statement here. In the try, let's say print(data). And let's say that we also want to write this to a file. We've already stored the file name in the constructor of this class, so let's say: with open(self.fetched_tweets_filename, 'a') as tf. We open in append mode because we want to continually add the tweets as we stream them from the API. Then tf.write(data). If you don't want to both print the data and write it, you can get rid of that print statement; it's totally up to you. Then we return True.

Then we'll have an except case here: except BaseException as e. If that exception is hit, we'll print Error on_data, so that we know which method we're in, followed by the actual error message, the string of e. Then we'll return True. Okay, so I think that's okay. The on_error method, I think, is
all right for now. Now we have to change a little bit here in the main part of the program, so let's use our new class to do pretty much what we did before, just a little more nicely and modularly. We'll create a hash_tag_list here with, let's say, Donald Trump, Hillary Clinton, Barack Obama, and Bernie Sanders. And let's define our fetched_tweets_filename; let's just call it tweets.json. As we saw from the output in the terminal, those tweets, the braced objects, are formatted as JSON, and that format is a little easier for us to deal with if we want to read it in later. You could put .txt there and that's totally fine; I'm just changing the extension to .json, which isn't necessary if you don't want to do that.

Now let's define a TwitterStreamer object. I'll say twitter_streamer = TwitterStreamer(), and then twitter_streamer.stream_tweets, which is the method we created up above. It takes two things: the file name that we want to write to, and the list of keywords that we're looking for. So that made everything a little more concise and a little more modular, and that will get us started on this whole business of streaming tweets.

We'll be cleaning up some of the code that we wrote in part one, so I'll be touching up some code, and we'll be focusing on the notion of pagination and also the Cursor object in Twitter, and specifically in the Tweepy library. We'll be making use of that to do things like accessing our own tweets on our timeline, accessing tweets from a specific user that we might want to analyze, getting followers or friends of certain users, and other things like that. We'll be making use of the Cursor class in Tweepy to do that. The first item on the board is to clean up a few things that we did in part one, just to make things a little bit more
streamlined and modular. Before we do that, I want to import two things here at the top which we'll be making use of in this video. One of them is called API, so: from tweepy import API. The other is called Cursor, so: from tweepy import Cursor. Once we have those, we're ready to make some changes.

The first thing I want to do is change the name of the listener class. The initial idea of this class was just to print out the tweet, but it's doing a little more than that now: it's also writing it to a file. So StdOutListener is probably not the best name for this class, since it can do more than just print to the screen. Let's call it something a little more general; let's just call it TwitterListener. That means we'll also have to change the reference to that class over here, where we make an object called listener.

The next thing: authenticating here in this stream_tweets function is fine, but I thought I would abstract this functionality into its own class, so that we can authenticate for other purposes that we may have; indeed, we'll have those other purposes in this video. So I'm going to create another class up here for authentication, and we'll call the class TwitterAuthenticator. In this class, at this moment, I'm just going to have a function which does precisely what these two lines do; I just want to extract that into this class. We're going to define a function called authenticate_twitter_app. It's a method of the class, so it takes self as a parameter. Then I'm going to essentially lift these two lines and move them into this function, and I'm going to return the auth object that I create in this function. So the plan is: in this
TwitterStreamer class, where before we had these two lines creating the authentication, we're now going to instantiate an object of the authenticator class in the constructor and then use that object right where we had the previous code. Let me actually write that out, because maybe that's a little clearer. In the constructor we'll say self.twitter_authenticator = TwitterAuthenticator(); this will be the object of the authenticator class. Then here we'll just say auth = self.twitter_authenticator.authenticate_twitter_app(). That does the same thing those two lines did, but we've lifted that functionality into a class of its own to make it a bit more modular. I think that's one of the changes I wanted to make. Let's write this and run it to make sure we didn't break anything. It looks like it's streaming tweets just as it was before, so that seems to have worked out okay.

The next thing I want to mention, which I didn't before and thought would be worthwhile to point out, is this on_error method. I did touch on this in part one, and I said that if you encounter an error while you're streaming tweets, this method will be triggered, and in this case what we're doing is printing the status message of the error. But we might want to do a little more checking here. Namely, the Twitter API imposes these things called rate limits. Essentially, if you're hammering the Twitter API trying to get all this information and Twitter doesn't like that, it might try to stop you from abusing the system, let's say. If Twitter thinks you're doing that, it's going to give you an error message, which is going to be a 420 message, and that's like saying, hey, if you keep doing this, we will essentially kick you out. Basically, the first time you
get this message, there's a window of time that you have to wait before you can access the tweets again, and if you keep accessing tweets and ignore those messages, that window of time increases exponentially, so you can lock yourself out of accessing information. So it's worthwhile to check that the status, the error code that you're receiving, is not this one. What we want to do is say: if the status is equal to this error code, which in this case is 420, we just return False outright. That's it; we just want to kill the connection, so that we don't accidentally boot ourselves from accessing information on Twitter. I'll put in a comment here in case you're looking at this later and forget what it's for: returning False on_data method in case rate limit occurs. Okay, I think that is all of the changes. Let me just get rid of that extra space there, and that is that.

Great. Now we can start actually making use of the Cursor class that we imported, and figure out how to use it to extract timeline tweets from your own timeline, a friend's timeline, or things of that nature. To do that, we're going to create another class, and let's call it TwitterClient. So it'll be class TwitterClient. Let's first create a constructor method; it's just going to take self at the moment. In it we'll create an authenticator object; this is why we created the authenticator class. We'll say self.auth = TwitterAuthenticator().authenticate_twitter_app(). This is just going to be the auth object, so that we can properly authenticate to communicate with the Twitter API. Then we're going to define another instance variable, which we'll call twitter_client, and this is going to
be equal to API, which, as you'll notice, is one of the two things that we imported from Tweepy in this video, and we'll pass in the auth object so that the client is authenticated. We're going to make use of that in the functions that we'll write next in this class. What I'm going to do down here is comment these lines out, and then we're going to instantiate an object of the TwitterClient class and see what we can do with it.

Let's start off with a function that will allow us to get tweets. We're going to call this function, I guess, get_tweets, and we're going to pass in self, since this is a method of the class, and another argument called num_tweets, which lets us determine how many tweets we actually want to extract. First we define a list: tweets is equal to an empty list. What we're going to do is loop through a certain number of tweets, store each one in the list, and return that list to the user. So we'll say: for tweet in Cursor(self.twitter_client.user_timeline).items(num_tweets). I'll explain these things as I go.

Let's break down what I just wrote. We imported Cursor; this is a class that will allow us to do what we're about to do, which is get the user timeline tweets. So I should actually call this function get_user_timeline_tweets; let's keep it like that. Basically, what I'm saying is that the client object we created in the constructor, like every object of this API class, has this user_timeline method, and that allows you, by specifying a user, to get the tweets off that user's timeline. We haven't specified a user, and if you don't specify a user, it just defaults
to you. So if that argument is None, which it is by default, it will just get your own timeline tweets. Then there's a method on the Cursor object called items, which tells it how many tweets to actually get from the timeline, and we can specify that through the argument sent into this function. In this loop we're going through every tweet that the Cursor object provides to us, and we can say tweets.append(tweet), where tweets is the list that we're storing these things in. Then we can just return the tweets list, which will now consist of all of the user timeline tweets.

Let's instantiate a client object and see if that works for our own timeline. Going down to the bottom here, I'm first going to create, like I mentioned, a TwitterClient object: twitter_client = TwitterClient(), like that. Then I'm going to print the result of the function that we just created: twitter_client.get_user_timeline_tweets, which takes as its argument the number of tweets to get; in this case I'll say five. Let's save that and run it to make sure it works, and we'll see what tweets are on my timeline.

So it's got quite a few. Remember, these tweets come in big blobs, so to make this a little more readable I'm going to replace that number with just one. I'll first clear my screen to make it a little easier to read, and then run this again. This will only get one tweet; this whole thing is one tweet. Let's see if I can find some information about this tweet that will distinguish it, and then we'll go to my Twitter profile to verify that this actually got the correct tweet. I see something about samharris.org. Let's see, is there anything else here? Well, let's actually go to my Twitter
Yeah, so okay, let's verify that this is actually the most recent tweet on my timeline, which has something to do with Sam Harris. I have my Twitter page up here; this is my personal Twitter page, so feel free to follow me if you like, no obligation. Anyway, there we see this is the most recent tweet, which I retweeted from Sam Harris, and it's of the late, great Christopher Hitchens, who, if you're not a fan of, shame on you; he's a really interesting guy and I encourage you to read about him if you don't know who he is. This isn't a political channel, but he's amazing. Anyway, we've successfully obtained the most recent tweet on my timeline, great. Now what if we want to do that for a specific user, someone other than myself? How do we do that? Well, we can modify our code a little bit to allow us to specify that information. So what we're going to do is create another variable here, which we'll call self.twitter_user, equal to twitter_user. We haven't defined twitter_user yet; we're going to pass that in here as an argument. Basically, when we instantiate a TwitterClient object, we want to allow the person who is using this code to specify a user whose timeline tweets they want to get. But actually we're going to give it a default argument of None. If you haven't seen this type of syntax before, a function with an argument that has equals None, or equals anything, has a default argument, and if nothing is specified here for twitter_user it just defaults to None. Remember, in our case, if you don't specify a Twitter user it will just go to your own timeline. We do want to specify a user, so we will be doing that. The way that we tell the Tweepy API that we've specified a user is in this Cursor call here; this
method will say id is equal to self.twitter_user, and that will do it. Now that will actually get the user timeline tweets for whatever user we specify. Let's give that a run to verify that it actually works. The way we do that, remember, is to instantiate the TwitterClient object based on whose timeline tweets we want to get, so in this case I'm going to specify the user pycon. What is PyCon, and how do I know that's the handle? Let's go back here. PyCon, if you're not aware, is a conference that focuses on the Python programming language, and if you're not familiar with it you should check it out; all the talks are available on YouTube, with lots of really interesting talks and topics, and I highly suggest you check it out. Anyway, this is the user whose timeline tweets we want to obtain, and the way that I got pycon was that I just took what follows the @ symbol here, so this is the name of the user whose tweets we wish to extract. Let me just minimize that. We've specified that it's pycon, and we'll also get just one tweet from this user, so let's write that, run it, and see what we get. We get a lot of stuff, so let's find some identifying information here. Is there a text field? Yes, this is a text field. Actually, in the next video we'll break down what we're looking at here. There's a lot of information in this tweet; it's not just the text of the tweet, it's the ID information, the user information, the retweet information, and all that other stuff, so we'll break down precisely what is going on here. But there's a field in this mess which is called text, and this is actually the text of the tweet. It says: all early bird registrations have been claimed, but it's still far from too late to register for PyCon U.S.
2018. So let's verify that this tweet is actually on the timeline. If we go here, we see that the most recent tweet is the one we extracted: indeed, all early bird registrations have been claimed, but it's still far from too late to register for PyCon U.S. 2018. Cool, so we have successfully extracted a timeline tweet from another user. Now I'm going to mention a couple of other functions which are related to, but different from, the user timeline tweets, and they all have a similar form, so I'm just going to quickly write those down so you have access to them and can play with them. The form of all these functions is pretty much the same, so if you understand how this one works, the other two that I'm going to write should be pretty similar. The first one is get_friend_list. This will also be a method of the class, and we can determine how many friends we get by a parameter, num_friends. Just like before, we define an empty list, friend_list, which will store all of the friends for a given user. In this case I'm not going to specify the user, just because it's not really necessary to get the concept of how this works. So I'm going to define a loop: for friend in Cursor(self.twitter_client.friends).items(num_friends). Just to very briefly go over this, you should see a similar idea on line 18 as you do on line 25: we're looping over twitter_client.friends in this case, not user_timeline, and then we're getting a certain number of friends of that user based on the argument that we passed into the function. As we loop through, we say friend_list.append(friend), and then we can just return friend_list. So it's a similar concept to what we saw above. Just to hammer that pattern home, or to give you another instance of the pattern that we saw here, we can also get the, let's call it get timeline, let's see,
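A rough sketch of the two ideas covered so far: the twitter_user=None default argument, and the same loop-and-append pattern reused for friends. FakeAPI and cursor_items are hypothetical stand-ins that only mimic the call shapes used in the video (user_timeline and friends taking an id argument, and Cursor(...).items(n)); they are not Tweepy code, and newer Tweepy releases may name these methods and parameters differently.

```python
from itertools import islice


class FakeAPI:
    """Hypothetical stand-in that only mimics the call shapes used in the video."""

    def user_timeline(self, id=None):
        # id=None falls back to the authenticated account, shown here as "me".
        owner = id if id is not None else "me"
        return [f"{owner} tweet {i}" for i in range(5)]

    def friends(self, id=None):
        owner = id if id is not None else "me"
        return [f"{owner} friend {i}" for i in range(5)]


def cursor_items(method, limit, **kwargs):
    """Hypothetical helper standing in for Cursor(method, **kwargs).items(limit)."""
    return islice(method(**kwargs), limit)


class TwitterClient:
    def __init__(self, twitter_user=None):
        # Default argument: None means "use my own account".
        self.twitter_client = FakeAPI()
        self.twitter_user = twitter_user

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in cursor_items(self.twitter_client.user_timeline,
                                  num_tweets, id=self.twitter_user):
            tweets.append(tweet)
        return tweets

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in cursor_items(self.twitter_client.friends,
                                   num_friends, id=self.twitter_user):
            friend_list.append(friend)
        return friend_list


pycon_tweets = TwitterClient("pycon").get_user_timeline_tweets(1)
pycon_friends = TwitterClient("pycon").get_friend_list(2)
own_tweets = TwitterClient().get_user_timeline_tweets(1)
print(pycon_tweets, pycon_friends, own_tweets)
```

The only difference between the two methods is which API method is handed to the cursor, which is the point being made about the shared pattern.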
let's, we can get the home timeline, so let's call it get_home_timeline_tweets. You have your own tweets that you've tweeted out yourself; the home timeline tweets are the ones that, if you go to the home page on Twitter, you will see from all the people that you follow. The top tweets there, and the ones following from there, are the home timeline tweets, and we can extract those for a given user as well. The pattern is similar: we'll say self, num_tweets, like that; let's say home_timeline_tweets is the list that we'll store them in; then for tweet in Cursor(self.twitter_client.home_timeline).items(num_tweets), so this should look pretty familiar, where we pass in the number of tweets we wish to get from our home timeline, or from the home timeline of any user that we happen to specify; and then we'll say home_timeline_tweets.append(tweet) and return that as well. So these functions all have a similar flavor in the way that they work. get_user_timeline_tweets here also allows us to specify the ID of the user, and we can very quickly do that here as well, just by saying id is equal to self.twitter_user, and we can add that onto this one as well, so id is equal to self.twitter_user. If you want more information about these API calls, I will also link in the description to the API reference page in the Tweepy documentation, in case you want more information on some of the things that we didn't have so much time to go into. You can see there's the home_timeline function, and there are also other parameters that you might be interested in: instead of getting a number of tweets, perhaps you want to get a certain number of pages from a user or from
your own timeline. There are also some other ones: we saw user_timeline; you can get retweets of me, statuses, update status, so you can update your status via the Tweepy API if you wish to do so, destroy status; I mean, there's a ton of things here, and I'm just going to leave you with this page. If you want to experiment with these things, this is very well documented, and I suggest that you do that. Next, we're going to be analyzing some tweets; well, we're really going to set up the infrastructure first, so we can take some tweets and then analyze them using pandas and NumPy. In order to proceed with the remaining part of this video, both of those modules need to be installed, so I'm going to open up a separate tab here, and if you don't already have NumPy and pandas installed, you'll need to run the following commands in the terminal. Let me just make this a little bit bigger. If you run pip install pandas, this gives us what we need to store the tweets, and the corresponding things that we extract from those tweets, in something called a data frame. This will allow us to pass the data around and maybe make a graph out of it. The other thing we'll need is NumPy, which is a general-purpose numerical library for Python that does a lot of things; we'll only be using a very small part of its functionality in this video, something like 0.001 percent, but you're going to need it nonetheless. So I'm going to say pip install numpy. You'll notice, by the way, that when I tried to install pandas it said I already have it, which is fine; you might see that message as well if you already have pandas installed. I see a similar message for NumPy: it says I already have it installed on my machine, so I'm good to go. I'm going to close that out and go back to the file that we've been working with, and again, I will say
this at the end of the video, but all the code will be available on my GitHub. I'm just continuing on from part two of this series and working from that code, so you can work along with me or, if you prefer, download the completed code once it's been uploaded with this video. I'm going to go ahead and import both pandas and NumPy, so I'll say import numpy as np and also import pandas as pd. This is just a general convention; it allows us to refer to anything from the NumPy library using the dot operator, so we can say np dot the name of a function that's provided by NumPy. Similarly for pandas, any time we want to access a function from pandas, and I should say as here, as pd, we can just say pd dot the name of a function that comes from the pandas module. That should do it. The next thing I want to do is add a function to this TwitterClient class that allows us to get this Twitter client API directly, so we can interface with the API and extract data from the tweets that we get. So I'm going to create another function in the TwitterClient class; let's call it get_twitter_client_api. It takes self, since it's a member of this class, and really all I'm going to do is return the variable that we defined up here, so I'll say return self.twitter_client, and that's it. That's the only thing I want to add in this class. Let's go down to the bottom of the file, where we have this content from previous videos. I'm going to remove all this stuff within the main portion of the file, and I'm going to create another class which is going to be responsible for analyzing the tweets that we extract from Twitter. Let's call it TweetAnalyzer, and this is just going to be a class.
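A small sketch of the accessor and the empty analyzer class just described. In the video the constructor builds an authenticated Tweepy API object itself; here, as an assumption to keep the example runnable without credentials, the constructor simply accepts any object (placeholder_api is a hypothetical stand-in).

```python
class TwitterClient:
    def __init__(self, api):
        # Assumption: the real class creates an authenticated API object here;
        # any object works for showing the accessor.
        self.twitter_client = api

    def get_twitter_client_api(self):
        # Hand back the underlying API object so callers can use it directly.
        return self.twitter_client


class TweetAnalyzer:
    """Functionality for analyzing and categorizing content from tweets."""


placeholder_api = object()  # hypothetical stand-in for the real API object
twitter_client = TwitterClient(placeholder_api)
api = twitter_client.get_twitter_client_api()
print(api is placeholder_api)
```

Exposing the API object this way lets the main portion of the file call its methods directly, without adding a wrapper method to TwitterClient for each one.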
We'll go ahead and put in a comment here to briefly describe what this class is supposed to be doing; let's say: functionality for analyzing and categorizing content from tweets, something like that. Okay, so that's a brief description of that class, and we'll eventually go and fill it in. Let me go down to the main portion of the file, and let's create a Twitter client and then use the function that we created before to get the API, so we can interface with it, because that's what we're going to use to get the tweets before we analyze them. So I'm going to say twitter_client, which is a variable I'm defining, equal to TwitterClient(), and then I'm going to call the function that we just created: api is equal to twitter_client.get_twitter_client_api(), which is what we called it. So now api is a variable that holds the API object that we created in that class. Now I also want to start streaming some tweets, so let's define a variable that we'll call tweets, and this is something we can obtain from the client we just created, so I'm going to say api.user_timeline. user_timeline is a function that is provided by the Twitter client API; it's not a function that we've written, just to be clear on that. This function allows us to specify a number of things: one thing we can specify is the user that we want to extract tweets from, and another is how many tweets we want to extract from that user. Just for the sake of example, let's say that screen_name, which is how we specify the user, is equal to some Twitter user. In this case I'm going to use Donald Trump as an example, because I guess he's, you know, someone
that's well known on Twitter. realDonaldTrump is his Twitter handle, so I'm going to use that as the value of screen_name that we pass into this function, which again is given to us by the Twitter client API. Then we can also specify how many tweets we want to extract from his page by saying count. Again, you can consult the documentation for what the parameters are called in these functions; you can't just call screen_name anything, and you can't just call count anything. These are parameter names specified in the documentation, which we can give hard-coded values to, and that's what we're doing now as we call this function. So count specifies how many tweets we want to grab from, in this case, realDonaldTrump; let's just say 20. Then let's make sure this works and say print(tweets). With any luck, this is going to go to realDonaldTrump, get the most recent 20 tweets, and print that content out on the screen. I'm going to clear the terminal and say python, and this is called analyzing Twitter data, which is the file that we're in, and I'm going to run that. So I actually had to redo that, because I believe I was running a file from another directory with the same name, so that was my fault. Again, going back to clearing the terminal and running this file, which is in the directory that we're in, we get this content here, which we can see is coming from Donald Trump's Twitter page. You can see a few key indicators here just from the text, like Donald J. Trump. Basically, this is just one big tweet, which has a number of features that are specified in JSON format, and we can extract parts of that if we want to analyze whatever part we might be interested in. So in this video what we're going to be doing is just kind of honing
in on a few key aspects of what content we can extract from a single tweet and then putting that into a data frame. So we've got this tweet, we know that it works, we're getting tweets from Donald Trump's Twitter page, and now we can create a data frame which is going to store that content, allow us to neatly organize it, and let us process it for further data analysis later. We're going to make use of the TweetAnalyzer class which we've created up here, and also the NumPy and pandas modules that we imported. The first thing I want to do is create a function in the TweetAnalyzer class; let's call it tweets_to_data_frame. It's going to take the tweets that we've gotten here, which is just this big JSON blob, and convert it to a data frame. It'll take self, since it's a member of the class, and it will also take the tweets that we want to convert to a data frame. What I'm going to do here is create the data frame object, so I'm going to say df, which is going to be the data frame, is equal to pd.DataFrame. This is provided to us by pandas, and it allows us to create a data frame based on some content that we feed it. Then we can specify the data that we want to make the data frame out of, so we say data is equal to, and what we want to give it is a list, created from the tweets that we feed into this function. I'm going to say this is equal to the text of each tweet, which we can extract by saying tweet.text. I'm just going to write out the loop and then explain precisely what it does: for tweet in tweets. So, just to unpack what's going on here, we've specified this data argument that we're feeding into the DataFrame function, and we're creating a list by looping through every single tweet in this tweets
thing that we're feeding in here, and what we want is to extract the text from each of those tweets. So we're creating a list where each element is the text of one of the tweets; that's what this data list corresponds to. Then we specify the column, just to say where these are going to live in the data frame, so I'm going to give the column a name: columns is equal to, let's say, Tweets, just to indicate what we're storing in this column. And then I'm going to return the data frame that we've created. Down here we have our tweets, and what we're going to do is say df is equal to, well, I guess before we do that we need to create a TweetAnalyzer object from the class we've created, so let's call it tweet_analyzer and set it equal to TweetAnalyzer(), so we're creating an object of this class. Now that we've done that, we can say df is equal to tweet_analyzer.tweets_to_data_frame, and we feed in the tweets that we've got from this line up here. So we've gotten our tweets using the client API, and here we're taking those tweets, converting them into a data frame, which is returned from this function, and storing that data frame in df. Just to see what we've got, we can say print(df.head(10)), and this is going to print out the first few rows, in this case the first 10 rows, of the data frame that we've created. I'm going to write that, clear the terminal, and say python analyzing Twitter data. If we do this, we get very nicely formatted information here: these are the first 10 rows of this data frame that we've
created, and you can see the tweet text of each of these is stored under this heading, Tweets, so that's kind of cool. We can also extract other pieces of information from each of the tweets, so I'm going to comment this out for now, and we can figure out what other attributes of these tweets we can extract data from. If I say print(dir(tweets[0])), what I'm doing is printing out the information that we can extract from just the first tweet. This shows us what pieces of information we can extract from every tweet; I'm just confining myself to the first tweet and seeing what attributes we can extract from it. So I'm going to write that and run this thing. If we run this, we see what is essentially a list of all of the possible things that we can ask of this tweet: we can ask for the user, we can ask for the text, which we saw, we can ask how many retweets there are, we can ask how many likes there are, which is favorite_count, we can ask where this tweet actually came from, the ID, a whole bunch of things. What we want to do is figure out how to extract this information from each individual tweet, put it into our data frame, and build on this, so that not only do we have the text of each of the tweets, but we've also got maybe the ID, the like count, things like that. So I'm going to comment this out; that was just supposed to illustrate what things we can extract. Now we can also be a little bit more granular. For the sake of another example, let me print out tweets[0].id. The way that we access, let's say, the ID of tweet 0 in this list we've extracted is to just use the .id attribute here, which gives us the ID of that particular tweet. So I'm going to write
that and run this, and we see that this is the ID that corresponds to the tweet that we've gotten there. We can also say, let's just say, retweet_count, which is another one of the things that we can ask about this tweet. We run that and we get how many, let's see, maybe it's not spelled correctly here; let me just make sure that I spelled it right and try that again. So this gives us how many times this particular tweet, tweet zero in that list, was retweeted: a total of 53,294 times. So that's just some information about how we can extract this content, and there are of course many different directions that you can go with this information. Let's go back up to our function here and build on this data frame, adding to it so that we can get all of this information in a more helpful form for processing. Let's say that, in addition to storing the text of the tweet, we also want to store the ID. We can say df['id'], so we're creating a column in our data frame with the heading id, and we can set this equal to something. We're going to feed it a NumPy array, which is essentially a list of the content that we're after for each of the tweets. In this case we're interested in the ID of each of the tweets, so we want to loop through every single one of the tweets that we were given and extract the ID from it. So I'm going to say np.array; this is just a NumPy array object that we're creating here, and we're creating a data frame column based on this array. What we're converting to an array is a list, and the list that we feed into this is similar to what we saw above, so we can say tweet.id. That's again how we extracted the ID for a specific tweet, as we saw down below: when we wanted to extract the retweet
count or the ID, we did tweets[0], or whatever index that corresponds to in our list of tweets, followed by .id, .retweet_count, or whatever we happen to be interested in. We're doing the ID here, so we want to extract that information in a similar format to what we saw up here: for tweet in tweets. Again, we take the tweets that were passed into this function, we loop through every single one of them in that list, and we say: give me the ID of that particular tweet, store it in a list, convert that list to a NumPy array, and then create a column in this data frame with the heading id based on all of those. Right, so there's a lot going on in that one line. Now that we have that, let's make sure that the data frame updates appropriately. We're going to uncomment this line down here that creates the data frame using the function in our class, and then let's write the thing that we had before, which was print(df.head(10)), and see what the first rows look like. We'll write that and give this a run. If we run this, we see that not only do we have the tweets for each of the rows here, but we also have the ID: the corresponding ID for this tweet is stored right here, the corresponding ID for this tweet is right there, and so on and so forth. Basically, we're just going to keep extracting information that might be useful, and it's going to follow a very similar pattern to this line, so I'm going to copy this line a couple of times and we'll see what other information we can extract. Again, if you want to see any of the other pieces of information that you could extract from a given tweet, you can just uncomment this to see what those attributes are. So we're going to go through this and show some other examples. Let's say I want to
figure out what the length of a tweet is, so I want to figure out how long a given tweet is, for one reason or another. It's the same format: I want to loop through all the tweets that we've been given in this function, but instead of the ID I want to look at the text, and specifically the length of that text, so len(tweet.text). I'm going through each of the tweets and saying, figure out what the length of each tweet is. Sorry, I keep on doing something here; we just put this here, right, okay. So I'm extracting the length of each of the tweets as we loop through them, storing that in this list, converting it to a NumPy array, and then creating a column called len, which is going to store that information. Maybe another attribute that we care about is the date that the tweet was posted, so it's the same sort of concept here. We can figure out how to extract that information by uncommenting this line and seeing what the attributes are. You can also consult the documentation, by the way, if it's not clear from the name what those things correspond to; there you can see a more elaborate definition of each of those attributes, so if you're not sure, you can consult that. For us to get the date, we can say created_at, so this is going to give us the time that that particular tweet was created; same process there. Let's say that we also want the source. This allows us to determine where the tweet is coming from: was it from an iPhone, an Android, a desktop PC? The way we get that information is by saying .source. Let's say we also want the number of likes; in that case we use the favorite count, favorite_count. And then let's do one more: let's say we also want to figure out how many retweets there were for a given tweet. What we can do there is, instead of
favorite_count, as we saw down below, we can just use retweet_count. So now we've got all of these new columns, and they're being added to the data frame that we're returning here, so when we create the data frame object from the tweets we're given, we should be able to see a whole bunch of new columns created by this function. Let's write that, clear the terminal, and make sure that we see all of that information. I'll run this here, and it looks like that worked. We have the text of each of the tweets, the corresponding IDs, the length, so the number of characters in each of the tweets, the date on which the tweets were posted, where they're coming from, so Donald Trump likes to post from an iPhone, how many likes each of the tweets got, so this one in particular got the most likes out of any of these most recent tweets, and then how many retweets each of these got as well. That's just a bird's-eye view of the type of information that you can extract from each of these tweets. Of course, the possibilities are somewhat limitless in terms of what insights you can derive from that, so this is just setting up the scaffolding, and we'll continue to work with it, tweak it, and see what we can do. So we created a data frame with the ID, the length of a tweet, the date that the tweet was created, where the tweet came from, like an iPhone, how many likes it got, retweets, and so on. We've now got this data frame of every tweet that we can get from some user, for some number of tweets, and we can take this data and try to visualize it in some way. We're going to be doing some very simple time series graphs based on the data that we've already stored in this data frame. So one thing I guess I first want to do,
just for consistency's sake, is change this to a lowercase t. It doesn't really make any difference, and you can keep it uppercase, but I just want to make sure that everything is consistent. Okay, so now that we've got all of this, let's go down here. I'm going to get rid of these commented-out lines; we've got our data frame being created, and I'm also going to get rid of the line showing the head of the data frame, since we already know what that looks like. Before we actually get to plotting, let's see what insights we can derive from the data frame that we have so far without actually graphing anything. There are countless things that you can analyze here; we'll go over a couple of them, and from there we'll look at how we can plot some things, which again is relatively endless as well. So this is by no means exhaustive; it's just dipping your toes into the water of what you can extract from this type of data. One thing we can get from the data frame we've already got: let's say that we want to figure out the average length of all the tweets in our data frame. Here we collect 20 tweets, and we want to figure out the average length of those 20 tweets, and of course we can change this count from 20 to 200 or 400 or whatever we care about. Let's write in a comment here, which is: get average length over all tweets. Now that we've stored everything in this data frame, it's really easy for us to manipulate the data that's already present and get this answer. The way that we can do that is by using the mean function in NumPy, which we're going to run on a column of the data frame, so let me show you what I mean
by that. I'm going to print out the result that comes from running the mean function, so I'm going to say np.mean; there's a little bit of a lag here, since sometimes the autocomplete can be annoying. Anyway, I'm calling the mean function, and what am I calling it on? I want to figure out the mean of the length of the tweets, so I'm saying df['len']. Basically, df['len'] is going to return the column holding the lengths of all 20 tweets in this case, and then we run the mean function, which is provided to us by NumPy, on that column, and we get a single number, which we print out to the screen. That's what is going on in this line. I'm going to write that, clear the terminal, and then say python, and this is called visualizing Twitter data dot py, and we get that the average length is 122.2 characters per tweet in this sample. We can of course increase that to, let's say, 200 tweets, and see how that changes when we get a larger sample of data. It actually doesn't change terribly much; it's only gone up a little bit, so that's kind of interesting. I'm going to bring that back down to 20. Another thing we can ask is: what is the tweet that received the most likes in the sample that we've gotten? So we can say: get the number of likes for the most liked tweet, or specifically, how many likes did the most liked tweet get? We can do something pretty similar here. I'm going to copy this line, put it here, and instead of taking the mean over a list, I want to take the max, because we want to figure out the maximum number of likes from the likes column in our data frame. The likes column stores the number of likes for every given tweet, so I'm taking the
max of that list using the numpy max function, and then I'm printing that out to the screen. So again we can write that, save it, run it, and see what we get. So the maximum number of likes over those 20 tweets was something like 121,862. Again, I want to see how this scales if we increase the number of tweets from 20 to 200, and it looks like it's gone up by more than a factor of two, from about 121 thousand to 332 thousand, so that's interesting.

One more. Let's say that we want to do a very similar thing, but instead of getting the number of likes we want to get the number of retweets. So: get the number of retweets for the most retweeted tweet. That would be the same exact syntax. The only thing that I'm changing here is the column on which I'm performing the max function, so instead of performing it on likes, I'm going to be performing it on the retweets column of the data frame, and then printing that out to the screen. So let's go ahead and write that and then run it. So the maximum number of retweets is 107 thousand, so that's interesting as well. Of course these are just printing out numbers to the screen. It's very minimal in terms of what we're actually doing with the data, and there are endless possibilities for what you can extract from this data.

All right, so now let's move on to plotting some time series data, and I'm just going to put in a comment here to distinguish what we're going to do from what we previously did: time series. Let's say hypothetically that we want to create a time series plot that's going to show us the number of likes that Donald Trump received on any given day over the course of some days, which we can extract from some given count here. So we extract, let's say, two hundred tweets, maybe that's ranged over some number of days, and
then for every one of those days Donald Trump got a certain number of likes. We want to take the number of likes that he got on a given day and plot that for every day over this time series of dates. Okay, so that's the general idea of what we want to show. So let's create a variable which we'll call time_likes. This is going to be equal to a pandas Series object, so we're essentially creating a Series object that we can eventually plot as a time series. So I'm going to say this is equal to pd.Series, and this takes two things. First the data, which is going to be df['likes'].values, with likes in quotes, so we're extracting the values of the likes column. Every day there's going to be a certain number of likes that are given, and we're extracting the values from that. And then what we also want to give this Series is the index, and the index is essentially the x-axis of what we're plotting. What we want to do is show the number of likes for each day, so the number of likes is the y-axis and the date is the x-axis. So we're going to set the index equal to the date column of the data frame, which is something that we've already extracted before. So this is our time series object that we've created from pandas, and now it's ready for us to go ahead and plot. So what we can do is say time_likes.plot, and we can feed this a few arguments. I'm going to be feeding this plot function two arguments: the size of the figure, which specifies how big this graph is going to be, and then also the color, which is going to be the color of the line that's plotted across the days. So I'm going to say figsize is equal to, I'm
just going to say, (16, 4), that's just the width and height of the image that we're going to see, and then also the color, which we can set to red. So the function takes in a string, in this case just a single character that corresponds to a given color, and you can consult the documentation for what other parameters this can take and what valid arguments those parameters can take as well. I'm just going to hard-code those two. You can leave those out if you want to, it's not necessary to put them in there, but sometimes you might want to tweak the graphs in some specific way. So once we've created the data that we're going to plot, and once we've created the plot that we're going to show, we actually need to show the plot, so we're going to say plt.show. Now this plt is something that we need to import from matplotlib, which is going to allow us to show the plots that we've created. So I'm going to go up to the top of my file and make sure that I have matplotlib imported. I'm going to say import matplotlib.pyplot as plt. Sometimes the autocomplete in Vim can take a little while and lag behind a little bit. There we go. So anyway, I'm importing the matplotlib library, specifically pyplot from it, and then I'm going to refer to that as plt as a shorthand, similar to what we did for numpy and for pandas. I should also say that you should have matplotlib installed, and if you don't have it installed you can just open up a new terminal, I'll make this a little bit bigger, and run pip install matplotlib, and this should install everything that you need. I believe this comes along with numpy, so you might already have it installed if you have numpy installed. If you want to be sure, you can run the command that I just ran, which is installing matplotlib. You'll see that I already have this requirement satisfied, so I'm good to go. You might see that as well. If you don't, then it will install on your machine and you should be good to go.

So I'm just going to close that, go back to our code here, go back to the bottom of the file, which is where we were previously writing code, and go from there. So again, just to review: we've created a time series object using pandas, we've created a plot with some specifications, and then we're showing that plot using matplotlib, the Python module. So let's write this, I'll clear the terminal, and then I'm just going to say python visualizing_twitter_data.py. That should pop up a window. It doesn't know what data is. I think the reason for that is this is not correct: it's not data, it's date, right? We want the x-axis to be the dates, so that was my mistake there. Let's try it again. So we're going to write this, and now what we have here is a sequence of dates along the x-axis and the number of likes on the y-axis, and we can see how the number of likes changes over the course of a given set of dates, from however long ago we were able to extract 200 tweets all the way to the present day of this recording. And then there are maybe a few interesting things that you can see from this graph. There's a big spike right here in the number of likes, so it might be interesting to drill down as to why that is. Whatever tweet this was stands head and shoulders above every other tweet here, so that might be interesting to examine further. So I'm just going to close that and go back to our code. So that's the time series for the number of likes, but we can also do time series for other things. Really, we can just modify this code very slightly to do, let's say, the time series for retweets.
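As a rough recap of the steps so far, here is a minimal sketch of the summary statistics and the likes time series. The numbers and dates are made-up stand-ins for the tweets pulled from Twitter, the column names (len, likes, retweets, date) are assumed to match the data frame built earlier, and show_time_series is a hypothetical helper name used only for this illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample values standing in for the tweets pulled from the API.
# Column names are assumed to match the data frame built earlier.
df = pd.DataFrame({
    "len": [100, 140, 120],
    "likes": [120, 450, 300],
    "retweets": [30, 110, 60],
    "date": pd.to_datetime(["2018-10-01", "2018-10-02", "2018-10-03"]),
})

# Get average length over all tweets.
average_length = np.mean(df["len"])

# Get the number of likes for the most liked tweet.
most_likes = np.max(df["likes"])

# Get the number of retweets for the most retweeted tweet.
most_retweets = np.max(df["retweets"])

# Time series: likes are the values (y-axis), dates are the index (x-axis).
time_likes = pd.Series(data=df["likes"].values, index=df["date"])


def show_time_series(series):
    """Plot a Series as a red line on a wide figure (hypothetical helper)."""
    # Imported here so the statistics above run even without matplotlib.
    import matplotlib.pyplot as plt

    series.plot(figsize=(16, 4), color="r")
    plt.show()


if __name__ == "__main__":
    print(average_length, most_likes, most_retweets)
    # show_time_series(time_likes)  # uncomment to open the plot window
```

Using the date column as the index is what puts the dates on the x-axis. Passing a column name that does not exist (for example data instead of date) raises a KeyError, which is the mistake that comes up in the video.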
So maybe I'll take this, I'll paste this down here, and then instead of time_likes let's call this time_retweets. Let's also rename that, so that's also time_retweets. And then the Series object that we're going to create, we want to create not with likes but with retweets. The date is fine, because we still want to see how the number of retweets changes over the course of a given set of days. This is fine, we're just creating the same plot, and then we just show it. So let's go ahead and see. I'm just going to comment out this initial time series here, the one that we did for likes, so we don't get too many plots. Save that, and then I run it. So we get a very similar looking plot, and this spike, you'll notice, probably corresponds to the same spike as before, where that tweet that was liked a lot was also retweeted quite a bit too. You can see the number of retweets here on the left side of the screen, on the y-axis, is much less than the number of likes. However, the number of retweets is in some way correlated to the number of likes, so that's kind of interesting too. So whatever Donald Trump tweeted at this time seems to have garnered a lot of likes and retweets, and it seems like the graph is somewhat consistent with the one for the number of likes. Perhaps what we can do, not to verify this hypothesis, but to give a little bit more evidence that this hypothesis is probably true, is bunch the time series together onto one plot. So instead of plotting two separate time series, where we have one for the number of likes and one for the number of retweets, we can just put them on the same plot and see how they correlate. So let's just go ahead and do that. I'm just going to comment this out, and then I'm going to copy this, so I'm creating the time series for likes just like we did before. I'm
creating the plot, and I'm going to change the plot. I'll leave the figsize, but I will get rid of the color. I'm going to add in a label, because we're going to have two lines on this time series: one is going to be labeled for the number of likes and one is going to be labeled for the number of retweets. So for this one, I'm going to label this line as likes, and then I'm also going to put in legend is equal to True. This basically will put a little box in the time series chart which will show us which line corresponds to which label. That's going to be helpful, because what's nice about pandas and matplotlib is that if we plot multiple lines here, it will be smart enough to distinguish them by assigning them different colors. So what we're doing here is essentially just labeling each of those lines with whatever it represents, and then we're putting in a legend which is essentially going to say this blue line is likes, this orange line is retweets, and that's going to make it easier for us to visualize what's going on. That will be a little bit more clear when you actually see the graph. I think describing it without seeing it is a little bit difficult. So anyway, I'll copy this right here, which corresponds to the retweets. So doing the same thing for the retweets, I'm going to copy that in here and get rid of that. All I'm doing here is making sure that the plot is formatted the same exact way as it is for the likes, and I changed the label here to retweets. So just to review what we've got: we've got our time_likes, we've got our time_retweets, those are both pandas Series objects that correspond to the time series data that we want to plot. We create a plot for the likes, we create a plot for the retweets
and then what we're going to do is have one single plt.show call, and that is going to put the single plot up on the screen, and we'll see both those lines on there together. So go ahead and write that, and then let's just run this to see what we get. So we get the two plots up against each other. We've got the dates again, just like before, and the number of either retweets or likes on the y-axis. You can see that because we put legend is equal to True, that's giving us this little box here in the upper right corner of the plot. So likes is denoted by the blue line, and retweets is denoted by the orange line. What we can see is that there is indeed some correlation between the likes and the retweets. First of all, there are a lot more likes than retweets, but you can see that certain spikes that arise in the likes also correspond to spikes that arise in the retweets as well. That's probably something you could have arrived at without doing any sort of plot analysis, but it's kind of interesting to see the data being visualized like this. This is more just an exercise in what you can do using this type of data.

... analyze the sentiment of a given tweet. So we're going to determine whether the sentiment of a given tweet is overall positive, negative, or neutral, and for this purpose we're going to make use of a module called TextBlob. This is something that we'll have to install if you don't already have it installed. This particular module has a built-in sentiment analyzer that's already trained on data, so we can just apply the analyzer itself to our tweets to determine whether they're positive or negative based on this TextBlob analyzer. So first and foremost, let's just open up another terminal, and I'll make this a little bit bigger. What we'll need is this TextBlob thing, so let's just go ahead and make sure that we have
that installed: pip install textblob. If you already have it installed, you'll see this requirement already satisfied. If you don't, then you'll see it install on your machine. Once you've got that, you should be good to go. So now that we have that installed, let me just close this, and we will go ahead and import it over here. I'll say from textblob import TextBlob, the class. Another thing that we're also going to make use of in this video is a regular expression, and this will be used to clean the tweet, essentially to remove any extra characters or hyperlinks or things that are not necessarily indicative of the actual text content of the tweet. We want to remove all that, because it's not necessarily going to help us in figuring out the sentiment of a given tweet. So let's go ahead and just import re, which is the regular expression module in Python, and we've got everything ready to go. re should already be installed on your machine if you have Python, so you don't need to install that separately, you don't need to do a pip install or anything like that. So let's go to the bottom of the file, and I'm going to clean up some of these things. I'm going to get rid of these plotting lines here. All of these were from the previous video, so I'm just going to go ahead and delete them. I'm going to keep the data frame creation, because we'll be making use of that. Specifically, we'll be adding another column which will have the sentiment analysis for each tweet. So what I'm going to do is add some functions here in the TweetAnalyzer class. The first function, which I'm just going to call clean_tweet, will make use of the regular expression library to clean the tweet and remove any hyperlinks or extra characters. This is a member of the class, so it's going to take self and then also the tweet to be cleaned. Since the regular expression is a little bit cumbersome and verbose
and annoying to write out, I'm just going to paste it right in there, and this is from a file that I've already saved off to the side. Of course, like I always do, I'll have the code available on my GitHub, so instead of pausing the video and writing that whole terrible expression out, you can just download the code and copy from there. So again, this looks a bit complicated, but basically all that's really going on here is that it's removing special characters from the string, from the tweet specifically, and then removing any hyperlinks, and then returning the result, the cleaned tweet. So we have a function that's responsible for that. Now we want another function that's going to be responsible for calling TextBlob, using the sentiment analyzer provided by TextBlob, and then returning the sentiment. So let's call this function analyze_sentiment. This will be taking self and then also the tweet that we want to analyze the sentiment of. So we'll go ahead and create an object that will be returned to us from TextBlob. We'll call this object analysis, and we'll set this equal to TextBlob, and then what we're going to feed into this is essentially what we want to analyze the sentiment of, which in this case is the cleaned tweet. So we're going to say self.clean_tweet, and then we're going to feed in that tweet that we get into this function. We make sure that it's clean, passing the cleaned tweet into this TextBlob class, and then this will allow us to leverage the sentiment analysis tools that TextBlob provides to us. So now what we're going to do is just that. We're going to say if analysis.sentiment.polarity. What we're doing here is, analysis is the object created from TextBlob, there's an attribute of that called sentiment which makes use of the sentiment analysis engine, and then there's a further attribute called polarity, which is a property of that analysis which basically
tells us whether the tweet in this case is positive or negative. So the polarity is a metric of whether that tweet is positive or negative in nature. If this property is greater than zero, we're going to return one. This is to indicate that the polarity is positive, so it's a positively interpreted tweet, and we return one in that case. Else, if analysis.sentiment.polarity is equal to zero, then we essentially don't know whether it's positive or negative, so it's just going to be neutral. If it's just a neutrally analyzed tweet, we're going to return zero to denote that. And then otherwise, the case would be that the polarity is negative, and in that case the sentiment analysis engine determined that this tweet is actually negative, so what we're going to do to denote that is return minus one. That's the way this function lets the user know that the tweet was analyzed to be negative. So we've gone ahead and created the clean_tweet and analyze_sentiment functions, and now we're going to go ahead and make use of them. Let's go down to the main part of this file here. I'm just going to save this. What we're going to do is build on the data frame that we've created from the TweetAnalyzer class. We're going to add another column, which is going to be the sentiment analysis for each of the tweets that we have in this data frame. I'm actually just going to copy one of these lines up here and then add it after this one. So really what we're doing is, we've created this data frame, which again is returned to us from this function as part of the TweetAnalyzer class, and then I'm adding another column on to that
data frame, which I'm going to call sentiment, and this is going to have the 1, 0, or minus 1 depending on the sentiment analysis of that tweet. We're going to change one thing in here. Instead of the retweets, we obviously don't want that. Let's see, I think I deleted too much there. What we want in this case is to actually call our function, so we want to say tweet_analyzer.analyze_sentiment, and then we're going to pass in the tweet. And then we're going to be looping through each of the tweets in the tweets list that we have here. Actually, it doesn't know what tweets is, so we need to specify what we're looping through. Tweets is defined in the function up here, but it's not defined down here in the main. So we want to specify that we're looping through the column of the data frame that has the heading tweets, because again, that is where we're storing the text that corresponds to each of the tweets. So just to unpack what's going on here: we're looping through each tweet in the data frame column corresponding to the heading tweets, which is created up here and returned in this line here. And then we're doing a very similar thing, which will look familiar if you saw video 3, where we created new columns corresponding to the id, length, date, source, things like that. We're doing the same thing, only now the value that we're storing in that column for a given tweet is the sentiment analysis of that tweet. The way that we're doing that is by making use of that function that we created, analyze_sentiment, which returns 0, 1, or minus 1. We're feeding in that tweet, it's going to get cleaned, it's going to get analyzed, and then we're
going to have one of those 3 numbers. So we have our new column here, which is the sentiment analysis. Let's just go ahead and make sure that this works as expected. What I'm going to do here is print out the first 10 entries in the data frame, so I'm going to say print df.head, and then I'm just going to pass in 10, which is letting pandas know that we only want to see the first 10 entries in this data frame. So I'm going to write that, clear the terminal, and then say python, and the name of this file is sentiment_analysis_twitter_data.py. If we do that, it'll get the tweets, and then we see we have our data frame here. We've got our familiar columns, and then we also have the sentiment. So we have minus 1 for this first tweet, and it looks like that definitely could be phrased as controversial. The next one is minus one, and it looks like there's not enough of that tweet showing for me to determine whether it's really controversial. That third one is neutral, "congratulations", you could argue maybe that's positive. The next congratulations tweet here is positive, so that makes sense. "Thank you", thank you is interpreted to be positive. The next one is neutral, and the remaining tweets are positive. So from this very cursory glance, Donald Trump seems to be a very positive guy. So yeah, there's that. Anyway, that's pretty much it for this video. If you have any questions or comments or anything like that, then don't hesitate to leave them in the comments section below. As I mentioned before, all the code for this will be available on the GitHub, and I'll have a link to that in the description, so you can just download it there. Thanks again for watching, and have a great day.
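The two sentiment helpers described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the video: the regular expression is never read out in the captions, so the pattern below is simply a commonly used one that strips @mentions, special characters, and hyperlinks. The functions are written as plain functions rather than class methods, and polarity_to_label is a hypothetical helper name added so the 1 / 0 / -1 mapping can be checked without TextBlob installed.

```python
import re


def clean_tweet(tweet):
    """Remove @mentions, special characters, and hyperlinks from a tweet."""
    # Assumed pattern: the exact expression is not shown in the captions.
    pattern = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)"
    # Replace every match with a space, then collapse repeated whitespace.
    return " ".join(re.sub(pattern, " ", tweet).split())


def polarity_to_label(polarity):
    """Map a polarity score to 1 (positive), 0 (neutral), or -1 (negative)."""
    if polarity > 0:
        return 1
    elif polarity == 0:
        return 0
    else:
        return -1


def analyze_sentiment(tweet):
    """Clean the tweet, run it through TextBlob, and return 1, 0, or -1."""
    # Imported here so the helpers above work without textblob installed.
    from textblob import TextBlob

    analysis = TextBlob(clean_tweet(tweet))
    return polarity_to_label(analysis.sentiment.polarity)


if __name__ == "__main__":
    print(clean_tweet("Thank you @user! https://t.co/abc Great day"))
```

With these in place, the new column would be filled by calling analyze_sentiment on each entry of the data frame's tweets column, in the same way the other columns were built.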
Info
Channel: freeCodeCamp.org
Views: 161,487
Rating: 4.9522305 out of 5
Keywords: twitterpythtwitter api python, streaming tweets python, stream tweets python, stream live tweets python, twitter api tutorial, twitter stream python, twitter and python api, tweepy tutorial, tweepy, stream tweets tweepy, python tweepy tutorial, analyzing twitter data in python, analyzing tweet data, analyzing tweet data in pythong, sentiment analysis twitter, python sentiment analysis, python twitter sentiment analysis, python tutorial, python tutorial for beginners
Id: 1gQ6uG5Ujiw
Length: 90min 1sec (5401 seconds)
Published: Thu Oct 18 2018