Does Twitter Believe in Dogecoin? - Sentiment Analysis in Python

Video Statistics and Information

Captions
[Music] What is going on, guys? Welcome back. In today's video we're going to use Python to perform a sentiment analysis on Dogecoin, so let us get right into it.

All right, so the first thing you want to do is go to developer.twitter.com, to the developer portal, and create a new app there. First of all, of course, you create an account, and then you go to Projects and Apps and create a new project. I'm not going to go through the whole process: first of all, it's very simple; second of all, I already have a video on that, Twitter sentiment analysis in Python. Once you have an app or a project, you can get the keys and tokens for it: the consumer keys (API key and secret) and the authentication token. This is what you're going to need for today's video.

Before we get into the code, I want to mention why we actually need sentiment analysis and what you can do with it. I want to say here that this video is not investment advice and not cryptocurrency advice in any way. I'm just showing you one thing that you can add to your investment strategy if you want to. What you can do with sentiment analysis is base your trading or investment decisions on the opinion about a certain thing on Twitter. You can say, okay, I want to know whether the tweets about Dogecoin, for example, or Bitcoin, or any other topic, are more positive or more negative, and then you can base your decisions on the trends you see. So you can use machine learning to see whether the individual tweets are more positive or negative, and then say: if the feeling about a certain topic is more negative, I'm going to sell, and if it's more positive, I'm going to buy, and so on. Again, no investment advice here, but this is what you can do with sentiment analysis.

All right, so once we have all that, we're
going to open up a command line and install a bunch of libraries we're going to need for this project. The first one is called Tweepy, which is a combination of Twitter and Python, and we just say pip install tweepy. This is what we're going to use to access the Twitter API; in my case I already have it. Then we're also going to install TextBlob, so pip install textblob, which is what we're going to use to do the sentiment analysis. As you can see, it's based on NLTK, the Natural Language Toolkit. Then you should also have pandas, NumPy and Matplotlib, so just pip install pandas, pip install numpy and pip install matplotlib. I'm not going to do that here; those are very basic packages.

Then we're going to import all of them. We're going to say import tweepy, import textblob, import pandas as pd, import numpy as np, and import matplotlib.pyplot as plt. And then we're also going to import re, regular expressions. I'm going to talk about this in a couple of minutes, but we're going to use regular expressions to change some things about the tweets, because tweets usually have mentions and hashtags, and we want to remove all those so that we have actual text that we can feed into the sentiment analysis. So those are the libraries we're going to need.

Then we need to set up the things we need to connect to the Twitter API. In my case, I have all the Twitter keys, so the API key, the consumer key and so on, and the access token, in this Twitter keys file, because I don't want to display them while doing the tutorial. If you're not worried about them, you can also put them as clear text in your code, or you can read them from files. In my case I have to read them from a file, so I'm going to say all_keys is going to be open, I'm going to open
the file that I called twitter keys in writing mode, and then I'm going to read the text from it and split the lines so that I have the individual lines. The first line is one key, the second line is another key, the third line is another key and the fourth line is another key. The way I structured it, the API key is going to be all_keys[0], the API key secret is going to be all_keys[1], the access token is going to be all_keys[2], and the access token secret is going to be all_keys[3]. There you go.

What we also need is an authenticator, because we need to authenticate to the Twitter API. So we're going to say authenticator is going to be tweepy.OAuthHandler, and we're going to pass the API key and the API key secret. Then we need to say authenticator.set_access_token, and of course we pass the access token and the access token secret. This is how we connect to the API.

Now we also need to create the API object itself. By the way, let me just make sure I'm not blocking anything here; I'm going to move this and make it a little bit smaller. So we're going to say api equals tweepy.API, with capital letters, and we're going to pass the authenticator, and we're going to say wait_on_rate_limit=True so that we just wait instead of crashing; I think that's what this is for.

So now we have the API, and what we want to know is what the sentiment for a specific topic is. The topic in this case is going to be a cryptocurrency. Now, the title is Dogecoin sentiment analysis, but this is just because I think it's a good title for a video; it's going to be more interesting. We don't have to limit ourselves to this particular cryptocurrency. We can also change this to Bitcoin or Ethereum, or you can even change this to something like Tesla and say you want to know about the sentiment analysis
for this topic. You could actually call this variable topic, but I'm going to call it cryptocurrency because this video is focused on cryptocurrency, and we're going to set it to dogecoin. There you go. So we're going to do a sentiment analysis on this topic, and then we need to create a search term and a cursor.

In the beginning we're just going to do a basic sentiment analysis without too many limitations, but later on we're also going to compare individual time frames, so we're going to see, okay, is Dogecoin more positive today than it was yesterday, for example. We're not going to talk about the start and end date yet; we're going to do that in a second. For now we're just going to say search equals, and we're going to use an f-string here: we're going to look for a hashtag followed by the cryptocurrency, in this case dogecoin, and then we're going to add -filter:retweets. This means that we're going to ignore retweets and just look at the ordinary tweets.

Once we have that, we need to create a cursor. So we're going to say tweet_cursor equals tweepy.Cursor, with a capital C, and then api.search; this is the function. The search term is going to be the search string that we defined, so q=search. The language is going to be English, so we're going to pass en. Of course you can also look for topics in your language, so you can pass de for German, for example, but we're going to use English here. We could also specify two parameters that we're not going to use yet: we're not going to look at since and until, because that is a time frame; we're going to skip that for now and add it later on. And we're going to say tweet
mode equals extended. Then the important thing is that we specify how many items we want, because we don't want all the tweets that there are; we want a certain number of tweets. So we're going to say .items, and the limit is going to be, for example, 100, so we're going to base our sentiment analysis on 100 different tweets.

In order to finally get the tweets, we just need to say tweets equals, and then a list comprehension: tweet.full_text, which is not a function, for tweet in tweet_cursor. So for each tweet that we have in the tweet cursor we get the text, and we save that into a list. Now we have a list of tweets, not all tweets but 100 tweets for this topic, without the retweets.

All right, so now we have a list of tweets and we're going to turn this list into a data frame. To this data frame we're going to add a second column with the polarity, which is what we're going to use TextBlob for, and then based on that polarity we're going to decide whether the tweet is positive or negative. We're not going to allow any neutral results here because we want to polarize. So we're going to say tweets_df, which is the tweets data frame, is pd.DataFrame of tweets, and, just so we have a name, columns is going to be Tweets. There you go.

Before we add anything to that data frame, we're going to clean the data that is already in there. We're going to remove all the mentions, all the links, all the hashtags and the line breaks, because there's a lot of useless stuff in the tweets that we don't want to take into account when we do the sentiment analysis. So, for example, if I tag someone
in a tweet, that doesn't give me a lot of information about the sentiment of the text. So we're going to say for, and the index is not going to be important, so we're just going to use a placeholder for it: for _, row in tweets_df.iterrows(). We're going to iterate over the individual rows and apply regular expressions to find certain patterns and remove them or replace them with something else.

For the tweet in this row, the result is going to be re.sub, and we're going to substitute everything that matches the pattern http followed by something, without the colon because we want to match https as well, so http\S+. This basically means everything that starts with http is going to be replaced by nothing, so we're going to remove it. As input we take the text of this particular row. So we take the tweet, remove the links, and save the result into the same row.

Once we have done that, we do the same thing with another pattern. We say re.sub again, and this time we remove everything that is a hashtag, so #\S+, and we save the result into the row's Tweets entry. Then we pick another pattern, which is every mention, so if I tag someone we're going to remove that: @\S+ is going to be removed as well. And then finally we also have \n, but \n is a bit tricky because \n itself is an escape sequence. If I want to use it, I need to say \\n, because I want to pass the string \n to the regular expression; I don't want to pass a newline character. So this is how we do that.

Now we have the clean tweets, you could say, and what we do then is we map a certain function that is
going to be an anonymous function onto that data frame. So we're going to perform a sentiment analysis for all of the individual tweets and save their polarity score. How do we do that? We're going to say tweets_df, with a new column here, which is going to be Polarity. So tweets_df["Polarity"] is going to be tweets_df["Tweets"], so we take all the tweets, and we map a function onto them. We say .map, and here we define the lambda expression: for the tweet that is passed to this function, we return textblob.TextBlob of this tweet, .sentiment.polarity. I hope I didn't make any mistakes here: textblob.TextBlob, tweet, sentiment, polarity, yeah.

So what we do here is, for each row, we map this function onto the row, which means that in this new column, Polarity, we store the results of this function being applied to the strings in the Tweets column. We get that polarity value, we return it for each tweet in Tweets, and this is the result that we store in the Polarity column.

What we can do as well is say tweets_df["Result"], or you can call it Sentiment, or whatever you want to call it. This is going to be based on the polarity, so tweets_df["Polarity"], and here we map another lambda expression, but not on a tweet: we say lambda pol, for polarity, and if the polarity is larger than zero we say it's positive, otherwise it's negative. So we say plus if pol is larger than zero, else minus. You can also use a word here; I just use the symbols. This is how we calculate, for all the individual tweets, whether they're positive or negative.

All right, so now we can go ahead and visualize
this simple result. We're going to say positive is going to be tweets_df, and we're going to count all the tweets where the result is positive. So we say tweets_df where tweets_df.Result is plus, and we count the tweets like that. For negative we do the same thing, so we say negative, and here we replace plus with minus. Then we just do a simple bar plot. We say plt.bar, with 0 and 1 for the x values, and then we pass positive and negative for the y values. The labels are going to be Positive and Negative, and the color is going to be green for positive and red for negative, like that. We can also add a legend, I don't know if it's necessary, and then we say plt.show().

If we didn't make any mistakes, this should work, so let's see what happens if we run this. It's going to take some time to get the tweets, but it's either going to... yeah, it's crashing. Why? It's Twitter keys. Let me just see what I did here. Oh, of course, because I'm opening the file in writing mode. This is of course not right; we need to open it in reading mode. But besides that, this should work, hopefully. No, we have index out of range. Why do we have that? We're opening the Twitter keys file in reading mode, .read().splitlines(), so why exactly is this out of range in line 10? I'm going to come back to you once I've figured out the solution.

All right, so the solution turned out to be quite trivial: the file was empty. There were no keys in the file; now I've added them, so that should have been the main problem. Besides that, the script should be working. [Music] And this is also not a problem that you should have, because you can just enter the keys themselves here.

As you can see, we have positive and negative. Okay, the legend didn't work out quite well, so maybe we're just not going to use it. But as you can see, we have around 45 positive tweets and around 55 negative tweets. This gives us some information about the state right now, but we don't really have a
time frame. Now, the Twitter API goes back about seven days: if you're not using some premium API, you can look back seven days. But what we have so far doesn't really give us a good picture of the evolution of this sentiment over time. Let's say a topic is kind of negative, but the negativity goes down and the positivity rises, and it's not as negative as it was a couple of days ago: that is still an upward trend that you should notice. So what we can do is focus on individual days in order to get better results.

What do we do here? We go below cryptocurrency and say start equals, and then we define a starting date. It's important to keep in mind that you have a seven-day limit, so you're not going to be able to look at the results from two years ago or something like that. So we're going to say 2021; today is the 6th of April, so we're going to use 2021-04-04 here, and end is going to be 2021-04-05. There you go. So we're going to look at this time period, and in order to do that we add here until=end and since=start. And yes, we can pass those as strings. This should give us some different results, so let's see what we get in this case.

Now, I'm also not sure if we're going to get the same results all the time or if it's kind of random. Here you can see that we definitely have around 60 negative tweets and around 38 or so positive tweets. But let's see what happens if we just run this again with the same settings, because maybe it's kind of random, depending on the tweets you get; that should be fixable by just looking at a larger number. But as you can see, this is the same here. In this case we can also just take a couple more tweets, because the larger the sample size, the more accurate the analysis; remember, the law of large numbers is
important here. So let's see what happens if we use 500 items. I think we should see a similar result, but maybe not. Let's see if we get any result at all, because maybe we have crossed the rate limit already. Seems like this is the case; I'm not sure. No, there you go: still pretty negative. So you can see that in this time period we have quite a negative sentiment for Dogecoin.

Now, if we change this to another day, we might find some different results. So let's say we don't choose this time period but this time period here; maybe this is going to be different. Let's run this, and maybe it's worse, maybe it's better. But what you can do here, all in all, is look at the past seven days and spot a trend. You can automate this with a loop, for example, where you just increase the date, and then you're going to see, okay, over the last seven days the sentiment went from negative to positive, or it went from positive to negative, or it stayed the same. Based on that you could make investment decisions, even though I don't recommend this. But let's just look at one more: let's go for this time period here, and if this doesn't help, we're just going to leave it.

So you can spot trends, and you can also use different search terms. You can also use correlation: you can get the sentiment for five different topics and then compare them and plot them in a heat map to see, okay, if the sentiment for Bitcoin is positive, then the sentiment for Dogecoin is also positive, or something like that. So this is something that you can do.

Now, this looks a little bit better. It's still negative, but it's a little bit less negative, you could say; at least I think so. But we can change the topic now to Bitcoin, for example, and maybe we're going to get a similar result because they're kind of related. But you will notice that if you enter some
search term like happy, for example, you're usually going to get more positive sentiment, whereas if you enter something like, I don't know, death, or anything that's inherently negative, you're going to see a pretty negative plot. But for Bitcoin you can see that we have a slightly positive sentiment. So yeah, Bitcoin is probably a little bit more positive than Dogecoin here.

So that's it for today's video. I hope you enjoyed it and I hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching. See you in the next video, and bye! [Music]
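The connection and search steps narrated in the captions could look roughly like the sketch below. It is an approximation, not the code shown on screen: the helper names build_query, load_keys and fetch_tweets are made up for illustration, the key file path is whatever file holds the four keys one per line, and the calls assume Tweepy 3.x as used in the video (in Tweepy 4 the api.search method was renamed api.search_tweets).

```python
def build_query(topic):
    """Build the search string from the video: hashtag plus retweet filter."""
    return f"#{topic} -filter:retweets"


def load_keys(path):
    """Read the key file in reading mode and return its lines.

    Assumed layout (as described in the video): API key, API key secret,
    access token, access token secret, one per line.
    """
    with open(path, "r") as key_file:
        return key_file.read().splitlines()


def fetch_tweets(topic, keys, limit=100):
    """Return the full text of up to `limit` English tweets about `topic`."""
    # Imported here so the helpers above can be used without Tweepy installed.
    import tweepy  # assumption: Tweepy 3.x

    api_key, api_key_secret, access_token, access_token_secret = keys[:4]

    authenticator = tweepy.OAuthHandler(api_key, api_key_secret)
    authenticator.set_access_token(access_token, access_token_secret)

    # wait_on_rate_limit makes Tweepy sleep instead of raising on rate limits.
    api = tweepy.API(authenticator, wait_on_rate_limit=True)

    cursor = tweepy.Cursor(
        api.search,            # api.search_tweets in Tweepy 4
        q=build_query(topic),
        lang="en",
        tweet_mode="extended",
    ).items(limit)

    return [tweet.full_text for tweet in cursor]
```

Using a with block and explicit reading mode avoids the writing-mode mistake from the video. A call such as fetch_tweets("dogecoin", load_keys(path_to_key_file)) needs valid credentials and network access.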
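The cleaning and labelling steps from the captions can be sketched without a data frame, as plain functions. This is a minimal sketch under stated assumptions: the function names are invented for illustration, the scoring function is passed in so the logic can be tried without TextBlob installed, and line breaks are collapsed into single spaces (the video removes them outright) so that neighbouring words do not fuse.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Strip links, hashtags, mentions and line breaks from a tweet."""
    text = re.sub(r"http\S+", "", text)  # links (http and https)
    text = re.sub(r"#\S+", "", text)     # hashtags
    text = re.sub(r"@\S+", "", text)     # mentions
    text = re.sub(r"\s+", " ", text)     # line breaks and repeated spaces
    return text.strip()


def textblob_polarity(text):
    """Polarity in [-1, 1] from TextBlob, as used in the video."""
    from textblob import TextBlob  # imported lazily; requires textblob
    return TextBlob(text).sentiment.polarity


def label(polarity):
    """Polarize: '+' if polarity is above zero, otherwise '-' (no neutral)."""
    return "+" if polarity > 0 else "-"


def count_labels(tweets, score=textblob_polarity):
    """Count '+' and '-' labels over the cleaned tweets."""
    return Counter(label(score(clean_tweet(tweet))) for tweet in tweets)
```

With the resulting counter, the bar chart from the video needs only counts["+"] and counts["-"] as its two heights. Note that a polarity of exactly zero is counted as negative, which follows the rule stated in the video.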
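The video suggests automating the day-by-day comparison with a loop over the last seven days. One way to generate the start and end strings for such a loop is sketched below; daily_windows is a hypothetical helper, and the date format assumes the YYYY-MM-DD strings used for start and end in the video.

```python
from datetime import date, timedelta


def daily_windows(today, days=7):
    """Return (start, end) date strings, oldest first, one pair per day.

    Each pair covers one calendar day before `today`, formatted as
    YYYY-MM-DD so it can be passed as the since/until values.
    """
    windows = []
    for offset in range(days, 0, -1):
        start = today - timedelta(days=offset)
        end = start + timedelta(days=1)
        windows.append((start.isoformat(), end.isoformat()))
    return windows
```

Each pair could then be fed into one search per day, and the positive and negative counts per day compared to see whether the sentiment is rising or falling. For date(2021, 4, 6) the second to last pair is the 2021-04-04 to 2021-04-05 window used in the video.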
Info
Channel: NeuralNine
Views: 5,728
Rating: 4.9864407 out of 5
Keywords: python, finance, investing, chart, charts, data science, prices, price, stock, crypto, bitcoin, ethereum, crypto currency, cryptocurrency, cryptocurrencies, visualize, candlestick, ripple, litecoin, correlation, heatmap, blockchain, blockchains, cryptography, dogecoin, memecoin, meme, doge, sentiment, analysis, sentiment analysis, predict
Id: dSOUd9Sm1gI
Length: 23min 31sec (1411 seconds)
Published: Wed Apr 07 2021