How-to Use The Reddit API in Python

Captions
Hi, and welcome to this video on how to use the Reddit API in Python. I'm going to keep this really short and get straight into the code in just a moment, but first I want to describe what we're actually going to cover. The first thing we need to do is obviously get access to the API, so I'll take you through how to do that, and then I'll explain how we authenticate ourselves when accessing the API. After that, I'll take you through some of the most common uses of the API, the ones I think most of you are probably going to be most interested in: getting the most popular threads from a subreddit, or getting a steady stream of all the threads being posted to a subreddit. So let's get straight into it and start putting together our API access.

Okay, so the first thing we need to do is head over to reddit.com/prefs/apps. Scroll down, find the "create another app" (or "create an app") button and click on it. Then give the app a name. It doesn't really matter what you call it, just something that you recognize. We are using this as a script for personal use; if you are using the API for something else, tick whichever of the other options is relevant. You can give it a quick description, and then you need to give it a redirect URI. I'm just going to enter my Twitter address, because you can basically put anything you want in here: it's so that when people want to find out something about your API, they are directed to whatever you put in this box. So if someone wants to find out about my API, they'll come here and know that they can ask me about it.

Okay, and this here is our secret key, which we are going to need later, so make sure you keep note of it, and of this personal use script ID as well. I'm just going to copy those across into JupyterLab. I'll call the first one client_id, the identifier, and this is the public key. Here we have our secret key as well, and this one you need to keep secret. Obviously I'm showing it to you, but this app won't exist by the time I upload the video. We just enter those.

Now that we have those, the next step is to request a temporary OAuth token from Reddit. The first thing we need to do is import the requests library. Then we create our auth object, and in it we enter our client ID and secret key. Once we've done that, we need to actually log in. To do that, we first initialize a dictionary where we specify that we are going to log in with a password (the grant type), and then we pass in our username and password as well. For my password, I'm just going to read it in from a text file. If this is just a simple script, you can enter your password directly; it's not recommended, and reading it from elsewhere is better, but it's completely up to you how you deal with this. This is how you can read it in from a text file; just make sure you put "r" (for read) there instead of "w". So that is the dictionary that we will need to pass along to Reddit in just a moment.

We also need to identify our app and its version in the headers (the User-Agent). For this you can literally put anything you want, but we'll put something that is at least slightly descriptive: we'll call it MyAPI and add a version number. Now all we need to do is send a request for our OAuth token. We send this request to this address: we are accessing version 1 of the API and the access_token endpoint. In there we also need to include the auth we created earlier, our login data and the headers, and this will hopefully return everything that we need. Here we can see our access token, so we access that and store it in a variable.

This token is something that we will need to add to our headers whenever we're using the API. To do that, we add it under Authorization, and the token itself needs to be formatted as a string that contains the word "bearer", a space, and then the token itself. If we print out headers, this is what we get. Now we can access every endpoint within the Reddit API. Beforehand, if we had tried to access the endpoint oauth.reddit.com/api/v1/me, we would not have been allowed. Let's say we send only the headers containing the User-Agent that we had before: we get a 401 response. So let's copy this and try again, but this time use the headers that include our authorization bearer token. This time we get a 200, which means everything is okay, and then we can add .json() onto the end and we get all of this information. That's great: we now have access to everything, and we can start accessing what I think is probably the more relevant information.

The first thing I want to focus on is retrieving the most popular posts on a subreddit. If I head over to the Reddit API documentation, we can see this GET [/r/subreddit]/hot endpoint, which returns all of the hot posts on that subreddit. In our case, let's go with the hot threads in the Python subreddit. To do that we send a GET request to /r/subreddit/hot, as you can see here. We copy that across, start the request with oauth.reddit.com, then add /r/subreddit/hot, getting rid of the square brackets, and of course the subreddit that we want to look at is python. Then we add our headers, store the response, and see what is in there using the .json() method. Here we get all of this data, which is obviously not very clean at the moment, so let's clean it up and put it into a pandas DataFrame so it's a bit more readable.

First, let's figure out how to access each post within the response. Within this JSON, all of our posts are contained within the data key, so let's add ["data"]. Once we get into data, we have a few different options. We have modhash, which is nothing we need to care about; we have dist, which is just 27, and that's not the post data that we want; and then we have children. You'll see that children is a list, and within this list we have all the information about all of the hot posts in the Python subreddit, so that is where we want to extract data from. Let's loop through it and print each post. Now we are getting somewhere, and you can see there's quite a lot of data in each one of these, so it's worth cleaning this up a little more. Each entry in this list has its own data key, which gives us another dictionary containing all the relevant information we want, and it is from here that we are going to extract different pieces of information into our DataFrame. Just as an example, we can take the title, and here we can see the titles of the numerous popular threads in the subreddit. This is essentially the syntax we're going to use to populate our DataFrame.

So first let's import pandas (and install it, if needed). Then we initialize our pandas DataFrame like so, which just gives us an empty DataFrame. We're going to use a for loop like we did before to loop through each one of the posts and add each one as a row in this DataFrame, so we write df = df.append(...). Within that we create a dictionary containing everything we would like to include, and at the end we also need to remember ignore_index=True; otherwise we'll get an error, and we want to avoid that. First let's include the subreddit, just so we know where this data is actually coming from: just like before, we go into the post's data and access subreddit. Let's have a look at what we have there. As expected, we're getting all of these entries through. That's great, but obviously we're probably going to want a bit more than just the subreddit, so let's add a few more items. We have the title, like we did before, and another important one, in my opinion, is selftext, which contains the actual text content of the thread. That one is pretty important if you want to extract information about, well, anything from Reddit. This is starting to look a bit better; let's see what we have. It looks good.

Maybe we also want to include a few other items, such as the number of upvotes, the downvotes and the score of the posts. We have upvote_ratio, which is the ratio of upvotes to total votes on the post. We might also like to include the actual numbers of upvotes and downvotes, and again it's pretty straightforward: we include ups, and we can include downs like so. Finally, we can also include the score of the post. That gives us quite a lot of information to work with. If there are other things you're interested in adding, you can see which keys are available by accessing a post's data and calling .keys(), which returns a list of everything in there.

This is pretty useful for finding the most popular posts, but a lot of the time what you might want to do is stream the newest posts, so that you essentially get a real-time update of what is going on, and I would say this is probably what most people are going to want to use the API for. So let's take a quick look at that as well. We can find it just over here: /r/subreddit/new. Essentially, all we need to do is adjust our old call so that instead of reaching out to the hot endpoint, we reach out to the new endpoint. So up here where we have hot, we change that to new. It seems to work, and then we just rerun the same code, and we get all of the latest posts on our subreddit, which of course is pretty useful.

The hot request returned 27 posts and this one returns 25 at once, and you're probably going to want a few more than that. So what we can do is add a limit parameter: we pass params, and in there we add limit, which can go up to 100 items. Let's take a look at what we had before: in the JSON, dist equalled 25, which means 25 items were returned. Now if we run it, we will see 100, so we're returning 100 items, and of course that's pretty useful: we're getting more data back, and we can keep running this again and again, extracting as much data as we would like. If we rerun this, you can see the index goes up to 24 here; rerun that and it goes up to 99.

There's also one more thing that is pretty important to understand, and that is how we can extract the IDs of posts from the Reddit API. If we go into a post, we have two items. One is kind, which is up here: t3. Reddit objects have different types, or kinds, and the kind is essentially a code that says what type of object it is (for example, t1 is a comment and t3 is a link, i.e. a thread). Generally you are going to be working with t3, which are threads, but if you are working with something else that may change. As well as that, we have the id, which is here, and we can put both of these together, with an underscore in the middle, to create the Reddit post ID. That ID is unique for every post, and in the API documentation you will see it referred to as the fullname.

What we can do with this is essentially walk back in time with the API: we can request only threads that are further back in time than a post with a specific fullname, which would be this t3_ plus a mix of letters. So let's take the final one we have here and add it to our params as after. This will only take the next 100 threads that come after this post in the listing, i.e. older posts. Then, rather than initializing a new DataFrame, we can loop through and add all of these new posts to our existing DataFrame, and we end up with even more data. So that's how we can walk through and keep extracting more and more data from the Reddit API. At some point it will stop allowing you to do this; you can only go so far back in time, which depends on the volume of requests that you're making and the volume of threads on a specific subreddit. But that is essentially all you need to do.

Like I said, the Reddit API is incredibly powerful, and unlike most other APIs on social networks, it's free to use, so it's definitely something to take advantage of and see how you can implement it in your own projects. I hope you've enjoyed the video, and thank you for watching. See you next time, bye.
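The authentication steps described in the transcript (basic auth with the client ID and secret key, a "password" grant, a User-Agent, then a "bearer" Authorization header) can be sketched roughly as follows. This is a minimal sketch assuming a "script"-type app; the function names and the MyAPI/0.0.1 user agent are illustrative choices, not code shown in the video. `requests` is imported inside the functions that need it, so the pure header helper works even where it is not installed.

```python
# Sketch of the OAuth "password" flow described in the video.
# Assumptions: a "script"-type app; request_token, authorized_headers,
# read_password, me_status and USER_AGENT are hypothetical names.

TOKEN_URL = "https://www.reddit.com/api/v1/access_token"
USER_AGENT = "MyAPI/0.0.1"  # any descriptive name/version string


def read_password(path: str) -> str:
    """Read the account password from a text file ("r" = read mode)."""
    with open(path, "r") as f:
        return f.read().strip()


def request_token(client_id: str, secret_key: str,
                  username: str, password: str) -> str:
    """POST to the access_token endpoint and return the temporary token."""
    import requests  # third-party; imported here so authorized_headers works without it
    from requests.auth import HTTPBasicAuth

    auth = HTTPBasicAuth(client_id, secret_key)  # public ID + secret key
    data = {"grant_type": "password", "username": username, "password": password}
    headers = {"User-Agent": USER_AGENT}
    res = requests.post(TOKEN_URL, auth=auth, data=data,
                        headers=headers, timeout=10)
    res.raise_for_status()
    return res.json()["access_token"]


def authorized_headers(token: str) -> dict:
    """Headers for oauth.reddit.com calls: User-Agent plus the bearer token."""
    return {"User-Agent": USER_AGENT, "Authorization": f"bearer {token}"}


def me_status(headers: dict) -> int:
    """Status code from /api/v1/me (the video sees 401 without the token, 200 with it)."""
    import requests
    res = requests.get("https://oauth.reddit.com/api/v1/me",
                       headers=headers, timeout=10)
    return res.status_code
```

In use, something like `headers = authorized_headers(request_token(client_id, secret_key, "your_username", read_password("password.txt")))` would produce the headers that every later request to oauth.reddit.com needs; the file name and username there are placeholders.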
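Fetching a listing and flattening the posts, as walked through in the transcript (posts under `data` then `children`, each with its own `data` dict, and the fields subreddit, title, selftext, upvote_ratio, ups, downs and score), might look like this. It is a sketch under stated assumptions: `headers` is assumed to come from the token step, and the helper names are hypothetical. One deliberate change from the video: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so this version collects plain dicts in a list and builds the DataFrame once.

```python
# Sketch: fetch a subreddit listing (hot or new) and flatten posts into rows.
# Assumptions: `headers` comes from the token step; fetch_listing,
# listing_to_rows, to_dataframe and FIELDS are hypothetical names.
from __future__ import annotations

LISTING_URL = "https://oauth.reddit.com/r/{subreddit}/{sort}"

# Keys pulled from each post's "data" dict, as chosen in the video.
FIELDS = ("subreddit", "title", "selftext", "upvote_ratio", "ups", "downs", "score")


def fetch_listing(headers: dict, subreddit: str = "python", sort: str = "hot",
                  limit: int = 100, after: str | None = None) -> dict:
    """GET /r/<subreddit>/<sort> ("hot" or "new"); limit can go up to 100."""
    import requests  # third-party; imported lazily
    params = {"limit": limit}
    if after is not None:
        params["after"] = after  # fullname of the last post already seen
    url = LISTING_URL.format(subreddit=subreddit, sort=sort)
    res = requests.get(url, headers=headers, params=params, timeout=10)
    res.raise_for_status()
    return res.json()


def listing_to_rows(listing: dict) -> list[dict]:
    """One dict per post; the posts live in listing["data"]["children"]."""
    rows = []
    for post in listing["data"]["children"]:
        data = post["data"]
        rows.append({field: data.get(field) for field in FIELDS})
    return rows


def to_dataframe(listing: dict):
    """Build the DataFrame in one go instead of calling df.append per row."""
    import pandas as pd  # third-party; imported lazily
    return pd.DataFrame(listing_to_rows(listing), columns=list(FIELDS))
```

Switching between the hot and new listings is just the `sort` argument, and `listing["data"]["dist"]` still reports how many posts came back. To see which other fields are available, `listing["data"]["children"][0]["data"].keys()` lists them.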
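The fullname and "walk back in time" technique from the end of the transcript (kind and id joined with an underscore, then passed as `after` on the next request) can be sketched independently of the network code. Here `fetch_page` is a hypothetical callable that takes the current `after` value (or `None` for the first page) and returns one listing JSON dict; in practice it might wrap a `requests.get` on /r/python/new with `params={"limit": 100, "after": after}`. The loop stops when a page comes back empty, which matches the transcript's observation that Reddit eventually stops returning older posts.

```python
# Sketch: build a post's fullname and page backwards through a listing with `after`.
# Assumptions: fullname, collect_pages and fetch_page are hypothetical names;
# fetch_page(after) returns one listing JSON dict.
from __future__ import annotations

from typing import Callable


def fullname(post: dict) -> str:
    """kind + "_" + id, e.g. "t3_abc123" for a thread (t3 = link/post)."""
    return f"{post['kind']}_{post['data']['id']}"


def collect_pages(fetch_page: Callable[[str | None], dict],
                  max_pages: int) -> list[dict]:
    """Request up to max_pages listings, each starting after the last post seen."""
    posts: list[dict] = []
    after = None  # the first request has no `after`
    for _ in range(max_pages):
        children = fetch_page(after)["data"]["children"]
        if not children:  # nothing older is being returned any more
            break
        posts.extend(children)
        after = fullname(children[-1])  # next page starts after this post
    return posts
```

Reddit listings also include an `after` value in their `data` dict, which normally holds this same fullname; the sketch rebuilds it from `kind` and `id`, as the video does, so the mechanics stay visible.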
Info
Channel: James Briggs
Views: 10,380
Rating: 5 out of 5
Keywords: reddit, python, api, data science, data analytics, tutorial, learn to code, programming, reddit api, pandas, requests, pandas dataframes, dataframe
Id: FdjVoOf9HN4
Length: 23min 20sec (1400 seconds)
Published: Fri Feb 12 2021