Building an AI Twitter Bot That Learns From Trending Tweets [Tutorial]

Video Statistics and Information

Reddit Comments

Only skimmed over it but definitely looks like something I'd want to watch thoroughly sometime. Thanks for sharing!

2 upvotes · u/Reading102 · Jul 12 2020

thank you..

2 upvotes · u/ricyoung · Jul 12 2020
Captions
Hello and welcome. My name is Tyler. I'm going to try something new today: I haven't done coding videos before, but coding is what I've been doing. Today we're going to be using Tweepy, which is a Python Twitter API library, and GPT-2, which is a pre-trained text generation model. We're going to use transfer learning to write an AI bot that learns from Twitter subjects and then spits out tweets on trending topics. Let's get into it.

All right, so here's the finished product running live on Twitter. It's tweeting about Mueller, it's tweeting about Donald Trump, it's tweeting about relevant things, obviously. This bot is actually running up on AWS. I've got it in a tmux window so that it doesn't close if my session breaks off. We're going to go through everything that's going on here and get our own up and running. Disconnect.

All right, so I've got an empty project here. Let's go ahead and create some new folders: src for all my code (I don't want that there, I want it here) and data. Cool. In my src folder... so, the first thing we want to do is go to Twitter and get the current trending topics, so we're going to need our Twitter credentials. If you have not already, go ahead and sign up for the Twitter API. To do that, you go to developer.twitter.com, make sure you're signed into the correct account, and then go through the process to create an app. Once you've created an app, these are the pieces of information that you're after. Make sure that when you're prompted, you tell it that you would like to be able to tweet from this app. So now we're going to go ahead and drop into this Twitter credentials file and create an access token, an access token secret, a consumer key, and a consumer secret. I'm going to go ahead and copy these all in. Okay, now we have
our Twitter credentials entered. We're going to close that out, go back to our source directory, and make a new file, and this is going to be our Twitter (caps off) twitter AI bot. We're going to write everything in straight Python and just run it in the terminal; that's how we deployed it on AWS, so we're going to operate with that as our norm. I've already done pip install tweepy; if you haven't done that already, you'll need to. So, from tweepy, import... yeah, let's look at that too. Here it is: the Tweepy documentation at docs.tweepy.org goes through everything you need in order to log in, but I've already done that. We're going to need OAuthHandler, we're going to need Cursor, and we're going to need API. Then we're going to say import twitter_credentials as tc. Cool.

Now we're going to set up our Twitter credentials and make sure that we are successfully talking to the Twitter API. We need to set up this auth object first, and we do that with OAuthHandler. We pass it tc.consumer_key and tc.consumer_secret that we just entered. Then we take auth and call the function set_access_token, and this is where we pass in the other set of keys: tc.access_token and tc.access_token_secret. Then we set up our api with the API object, and we pass it auth. I'm also going to pass in some extra parameters that will come in later: wait_on_rate_limit, which I'm going to set to True, and wait_on_rate_limit_notify, because I want it to print to the terminal if we're currently waiting on our rate
limit; we want to know that. Okay, so we're going to define a function that gets trending tweets, and then we're going to call it and make sure that we are connected. Look at my notes... Okay, we're going to call this get_trending. We're not going to pass any parameters into it. We just say trends equals api.trends_place. There are a couple of different versions of this; we want to go by place, and I've already worked out which ID to pass it, which is this code right here. This should be for all of the US. They're weird codes; it's not like a location, it's funky. Trust me, though, that works.

Okay, trending. I'm going to do an accumulator here. We're going to loop through the dictionary that's returned, add everything to our accumulator, and then return it. So, for trend in trends... If I printed this trends object, you would see that it is a dictionary, and this is where to go in the dictionary, but we're going to skip walking through that; you'll just have to take my word for it. We want to say trending.append, append the trend, and then return.

Okay, so that should be all we need there. We're going to do an if name equals main so this will run for sure, call get_trending, and actually print the results of that. Okay, cool. Change directory into our source and just run it: python3 with our Twitter AI file. Cool, this is good news: it is giving us results and we're not getting any failures. We are successfully getting... oh, we're getting the dictionary. Okay, so what we actually want to do is append the trend name. Let's see if that worked. There you go, there are all our names: Goya, Notre Dame, Big Ten, Home Depot, TikTok, blah blah blah. I know TikTok's down right now, so that's
why everybody's paying attention to that. Okay, so the next step is to import our GPT-2. This is a very powerful text model, and there's a lot of documentation and information on it. They just released GPT-3. GPT-2 actually only works with up to TensorFlow 1.13.1, I believe, so we're going to have to deal with that, and I'm going to get a virtual environment configured as well to help us run it without any errors. But let's just see if it imports: import GPT-2. This is all in the documentation as well. You do need to do a pip install if you haven't done that yet, but we're going to have to do that in the virtual environment anyway, so we'll walk through that when we get there. Let's see if this runs. Okay. Eventually GPT-2 is going to give us an error, but we're just going to keep going, and when we run into that error we will handle it.

Right out of the documentation, we need to download our model, so we're going to copy this right out of the documentation and run it in line after all of our other imports and setup statements. We're going to use the 355M model, which is the medium-size model. Right here, I don't have os imported, so let's take care of that: import os. All right, this is going to take a second to download, but let's just make sure it is going to download. Okay, so our model downloaded. Actually, let's just get rid of this.

So now we need to take the trending topics. Our main function is going to pick just one out of this list that's been generated, and then we need to go to Twitter and search for as many tweets as we want to grab on just that subject. So we need to define a new function here, which is going to be get_topic_tweets: we're going to be getting tweets by topic. So here I'm
going to pass in the topic that we need, and then the max tweets that we want it to get. We'll set that to a default of a hundred, which is not that many. The higher we set that, the more it starts to hit the rate limit, and that's where these parameters come in: when we start asking for a thousand or two thousand or 5,000 or 10,000 tweets, it's going to run into that rate limit, and it's automatically going to go back around the loop until it gets all the tweets we've requested.

What we need to do next, inside our function, is use our Cursor to do our search. We're going to catch all this in a variable. We pass Cursor api.search and then our query. For the query we pass the topic, and I've gone one step beyond that. If you go to the Twitter documentation, there are quite a few options for filtering your tweets, such as positive sentiment, negative sentiment, or excluding certain words. In this case we're getting rid of retweets, because we only want original text from people, so we add minus filter:retweets, and that should get rid of the retweets for us. Then, just for redundancy, we make sure that the language is consistent, that it's all English, and we say that tweet_mode is extended. This way it won't truncate any long tweets; it'll give us the full text on all of them. Then (oh, not index, sorry) items, and this is where we pass in the max tweets. Now, I'm going to put this inside of a list, just because that's how I did it; I think I got it out of the documentation that way, so I'm going to be consistent with my notes. That should then pass a list into that searched_tweets object, and we are then going to do the same thing, where we are going to
accumulate them, and we are going to use a try/except statement to validate that we actually have a full text, because every now and then we will get a tweet that does not have the full text on it for some reason. Rather than debugging Tweepy, I'd rather just write code that works. So: for tweet in searched_tweets, we try found_tweets.append(tweet.full_text). That drills in to get the full_text attribute, which is what we asked for when we said extended, and if it doesn't exist we just pass; we don't care. Once we're done, return found_tweets.

All right, now we're ready to start writing the main function that we're going to loop through to generate a single trending tweet, so that our bot can tweet just one generated tweet. We're going to call that def trending tweet. We don't need to pass any arguments, because we're going to call this over and over again. First of all, let's write some comments in here to remind us what we're doing and in what order. First, pick a topic. Then fetch tweets on the topic. Then train a model on the new tweets. Then generate text with the new model. Then filter and return one valid tweet from the generated text. Okay, so we have a lot to get done in this function. Let's get started.

The first thing is picking a topic. We call that get_trending function and return that list to a variable: trending equals get_trending, and that gives us a list of trending topics. Then we say topic equals choice from trending, which picks one out of that list. I don't know if I said it before, but go ahead and import choice from random in order
to be able to do that. Then our topic... this we're going to print, because we want to know what topic we picked: "Generating" (I can't spell right now) "generating tweets on topic" and the topic.

Okay, so the next thing we need to do is fetch those tweets using our other function and then write them to a file that we can feed into our model for it to train on. First we define a file name for where we want to save those tweets. We call it file_name, and it's the path to the file: from where we are, we go up and over to data. We could have a topic tweets folder... actually, we don't want to do that; we'll just do it right there in our data folder. Then we add our topic and the .txt extension. That is where we store the file that gathers all of our tweets.

The topical tweets that our function returns will be caught here. We call get_topic_tweets and pass it our topic, and for learning purposes we're just going to put 10 in here, because we don't want to get caught up on that rate limit. Normally I would put something more like 2,000 for the running bot, but just for demonstration purposes we're going to have that there. Then the string that we're going to save to file, we'll call that tweet_string. You remember that from that function we get back a list of tweets, so we want to join them into a single string that we can write to our text file. But we also want the AI to learn how to break the tweets up, so we're going to add some filler text here, which is going to be this. Later on we'll split on that text, and that will separate our tweets in code. So
we'll join the list here from our topical tweets, and that will create the string that we need to write to file. So let's write that to file: with open(file_name), which is why we defined that file name, and here we say we want to write, "w". This way it will always overwrite the file: if the file already exists it'll overwrite it, and if it doesn't exist it will create it. So, as f, and then we say f.write and pass it the tweet string. Now we should have our file created. We could test that, but I know it works, so I'm not going to bother; I'm just going to keep charging forward.

The next thing we need to do is start playing with our GPT-2. We're going to train a new model, and the first thing we need to do is establish a TensorFlow session. This is right out of the documentation, which I don't remember... there it is. First we establish a session, then we can do our fine-tune, and then we can do our generate. So we say sess equals gpt2.start_tf_sess. Cool.

Next: if we already have a checkpoint for this topic, we want to continue to train the same model from that checkpoint; if we do not, we want to start a fresh model for that topic. So, if not os.path.exists, and in here we say checkpoint, because this is where it's going to save it, right here in this checkpoint folder: checkpoint slash topic. This is saying: if we have not trained a model on this topic, start a new model; if we have, go ahead and update the existing model. So, gpt2.finetune: this is where we start doing our fine-tuning. We have a bunch of stuff to pass in here. First you pass it the session, then your dataset, which is the file that we just created, and then the model name, which is
the model name that we downloaded above. You remember the 355M model that we downloaded; that is the model we want to use. Then we define steps, and I'm going to set this to 2 for now. On the deployed one I would use a larger number, but just for demonstration purposes, so that it'll run quickly, we're going to set that to 2. Then restore_from, and we tell it to restore from fresh for this one. Then run_name, which is just topic. This is important because this is how it saves the checkpoint: it's going to save the checkpoint under that topic name, which makes sure that our check will work and we will find an existing one. Last thing to enter here: print_every, and I want to see every one, because I want to know that it's running. Cool.

The next thing to do is copy this down, and the only change we're making in the second one, because it's going to be loading from the old one, is that we tell it to restore from latest. This allows it to load the existing model and train it further. I would do fewer steps on this one, because we've already trained it. The live one does 100... I think 200 to start and then 100 to retrain the existing model. So that should train up our model.

The next step is to generate tweets. The way we're going to do this is with a gpt2 function called generate_to_file. Of course, with gpt2 we have to pass it the session that we're using, and we give it a length; this tells it how many characters we want it to write: 400 characters. We have to give it a destination path, so this is going to be a file that we continually overwrite and read out of in order to pick our tweets. We're going to write this into our data folder, and we'll call it generated_tweets.txt. So what that's going to do is
every time it generates a new batch of tweets for us to select from, they will always be in that file. And nsamples, we'll just say five, because we don't want it to run for too long. Then run_name is the topic, which lets it know to use the checkpoint that we just created. One last bit of trickiness: prefix, and we also give that the topic. What that does is seed it with the text of the topic, so that the topic will be in at least the first tweet for sure. We want it in as many of the generated tweets as possible, because we want to use that topic in the tweet, and this is a way to help ensure that it's using that as seed text.

The last thing to do here, because we're going to be looping around on this more than once, is to call a function in gpt2 to reset our session. We've used it to train new models and generate our tweets, but now we want to reset everything so that when we come back around the whole thing doesn't break on us. So call reset_session and pass the session in, and that should reset everything so that we won't get any errors when it comes back the next time around, tries to start a new session, and goes and trains again.

Okay, so at this point in our function we should have a block of text sitting in that generated tweets file. We're going to read that in, do some filtering on it, randomly pick one tweet out of it, and return that tweet from this function, and that's going to be the tweet that we tweet. So we do our filtering right here. First things first: with open, we need to open that file. I guess I could have saved the path to a variable and just passed it, but this is fine: generated_tweets.txt. Just as we wrote right before, now we want to read, so that's "r", as file, and we say that texts is file.read, which reads everything out. This is a little bit tricky, because I don't have any of the text to show
you; I haven't generated any text yet. But because we asked for multiple nsamples, it actually breaks up the samples in the text, and the string that it breaks the samples up by is that many equal signs. So splitting on that gives us an array of each of the 400-character samples that we generated with generate_to_file, and it cuts that delimiter out, so we won't have to deal with those equal signs.

Now we're going to loop through that list of texts, pick out valid tweets, and add them to an array. So let's create an accumulator here and say for text in texts. That's our first loop; there are going to be multiple loops in here. What we want to do first: you remember we broke everything up on this string, so we want to do that same thing for each sample, and that will give us a list of the tweets in that particular sample. Just so I don't overwrite any other variables, we're going to call that tweeters, because it's nice and fun, and that's going to be this block of text, dot split, and we split on that character string that we created. Cool. That now gives us a list of our tweets in tweeters.

So we say for tweet in tweeters, so we're cycling through the list of tweets for this sample. The first thing we want to validate is: is the topic in the tweet? If the topic is in the tweet, then we can keep going; otherwise, go to the next tweet. We don't want to tweet something out that doesn't contain the subject that we went so laboriously to grab off of Twitter. The next thing we want to do is remove links from the tweet, and then validate that it is not just the subject itself. If we got a hashtag, we don't want to just tweet out that
hashtag back out as just that. And if we have links in there, those links aren't going to be valid, because the text generator is going to put http colon slash slash and then random stuff.

Okay, so the first thing we do is take our tweet and say that it is going to be our tweet split on spaces. This breaks the tweet up into a list of words that we can then deal with in order to filter out the links. We're going to create a little helper function to help us out, called filter_links. We pass each word into this, and we say if word.find http, so if it has a link in it, return true; otherwise return false. We use that little helper function to go through the words in our tweet, and if a word has a link, we just get rid of that word; we don't want it.

The way we write that is to keep writing over this tweet variable. We join the string back together, and inside the join we write a little inline loop: for word in tweet, so we're going through that list, and then if not filter_links(word). If the word does not have a link in it, it'll go ahead and add that, and then join it all together into one string with spaces. So we're busting it apart, pulling the links out, and then putting it all back together.

Okay, the last thing to do is make sure that the length of the tweet is greater than the length of the topic, and we'll give it a little bit of padding here: the length of the topic plus, we'll say, four. That makes sure that it's not just the topic and a couple of
spaces or something like that, and that it actually has some other words in there. Even if it's something silly like "go" and then the topic, or "no" and then the topic, it hopefully has a few more characters, so that's why we picked four. If it does, we've now validated our tweet and removed all of our links, and we can say tweets.append(tweet). Cool.

So we've done all our filtering and added them to our list, and we are ready to choose a tweet. We can reuse that tweet variable and say choice again, and we just pick one from our tweets. That gives us the one tweet that it's chosen out of our validated list. The last thing to do is make sure that the length isn't over the 280-character limit. We say if the length of the tweet that we just chose is greater than 280, then we just slice it down: tweet equals tweet sliced down to 280 characters. We don't care if it's a little incoherent. I could improve on that, I could improve on a lot of this, but that does indeed work and gives us what we want. Return tweet. Cool, no errors, everything looks good.

All right, now we are ready to do one last little bit of silliness, which is to write a very small function that we call from our name-equals-main block. It's going to be a perpetual loop that does the repetitive calling of the function we just built. We call that def run_bot. We don't need to pass anything into it. The only thing we put in here is while True, which will always evaluate to true. The tweet that we want to tweet equals generate trending tweet, which returns the one tweet. We print it to the console, because we want to see what we're tweeting: "I am tweeting", and the tweet. Then the last thing to do is send this to the Tweepy API. The way we do that is by calling that api object that we established up top
and saying (there it is) update_status, and pass it the tweet. Okay, call run_bot so that it will run on start. Let's go ahead and run it and see if we get our result.

So we trained for our two cycles. I had to do some resets there, because apparently 100 tweets was not enough, so I upped it to a thousand, and that worked. If we look in our checkpoint folder, we have our Robert Mueller folder. Here are all our raw tweets, separated by that tweet string that we specified, and here's our block of generated tweets, some of which are repetitive nonsense. Here they are, broken up by that string I told you they'd be broken up by. And then this is what it chose to tweet out. Let's take a look on Twitter and see if it tweeted it. There it is: there's our tweet, up on Twitter.

All right, so that's pretty much it for this project. There are a lot of improvements that I could make. I could do better on the validation of the text, or I could have it tweet only on a certain set of subjects instead of fetching them off of Twitter. There are a lot of different directions I could go with this, but I thought it was an interesting project, and I thought some other people might get a kick out of the notion of using GPT-2 to generate tweets on trending subjects without having to read a single tweet, doing it all in code. Hope you guys enjoyed it. If you have any questions, drop them in the comments.
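Editor's sketch: the captions describe the text-handling half of the bot (joining fetched tweets with a filler string, splitting the generated samples, dropping links, checking the topic and length, and trimming to 280 characters) without showing the code on screen. The following is a minimal sketch of those steps in plain Python, not the author's actual file. The separator text, the 20 equal signs used as the sample delimiter, and every function and constant name here are assumptions chosen for illustration. It also tests for links with `"http" in word` rather than `word.find`, because `str.find` returns -1 (a truthy value) when nothing is found.

```python
import random

# Assumptions (not shown in the captions): the real filler text and the
# exact run of equal signs are unknown, so these are placeholders.
TWEET_SEPARATOR = "<|tweet|>"   # hypothetical filler text between tweets
SAMPLE_DELIMITER = "=" * 20     # assumed delimiter between generated samples
MAX_TWEET_LENGTH = 280          # character limit mentioned in the video
TOPIC_PADDING = 4               # extra characters required beyond the topic


def build_training_text(tweets):
    """Join fetched tweets into one string with the separator between them."""
    return f"\n{TWEET_SEPARATOR}\n".join(tweets)


def has_link(word):
    """Return True if the word looks like (part of) a URL."""
    return "http" in word


def strip_links(tweet):
    """Drop every whitespace-separated word that contains a link."""
    return " ".join(word for word in tweet.split() if not has_link(word))


def valid_tweets(generated_text, topic):
    """Collect candidates that mention the topic and say something more."""
    tweets = []
    for sample in generated_text.split(SAMPLE_DELIMITER):
        for candidate in sample.split(TWEET_SEPARATOR):
            if topic not in candidate:
                continue  # skip text that never mentions the topic
            cleaned = strip_links(candidate)
            if len(cleaned) > len(topic) + TOPIC_PADDING:
                tweets.append(cleaned)
    return tweets


def pick_tweet(tweets, rng=random):
    """Choose one validated tweet and cut it to the character limit."""
    return rng.choice(tweets)[:MAX_TWEET_LENGTH]
```

In the bot described above, the output of `build_training_text` would be written to the per-topic text file before fine-tuning, and `valid_tweets` would be run on the contents of the generated tweets file; `pick_tweet` raises an IndexError if no candidate survives the filter, so a real loop would want to handle an empty list.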
Info
Channel: Tyler Morris
Views: 1,371
Rating: 5 out of 5
Id: tSaVryuzFTM
Length: 36min 28sec (2188 seconds)
Published: Sat Jul 11 2020