Python coding - How to access the Reddit API tutorial

Hey guys, today's video is all about using the Reddit API. If you're not familiar with Reddit, it is reddit.com, a popular website where people talk about things. Today we're going to go through the learnpython subreddit: we'll use Python to determine who the moderators are, and then go off and collect a whole bunch of their activity so we can work out when they actually go to sleep, or something along those lines. At the very least, you're going to learn how to request information from an API, how to package that information up into functions, how to call those functions, and how to generate data frames in pandas. We're even going to output that to Excel. Stick around to the very end, because all of the code is going to be there for you to use, extend and share. Let's get into it.

Okay, let's get down to business. In front of me I have a Jupyter notebook. If you haven't got a Jupyter notebook available, download Anaconda, which is my preferred method, or head over to pythonanywhere.com, a really good cloud-based Python environment. What we're doing today is navigating to Reddit, and the final outcome is to determine when a moderator is most active commenting and posting. So how can we do that automatically in Python?

First of all we need to navigate to Reddit. I'm here on the learnpython subreddit, and I want to figure out who the moderators are. To do that I scroll down, have a look at moderators, and view all moderators. Now there's a really cool trick you can do in Reddit, which I didn't realize was a thing until recently: you can just add .json to the end of the URL, and what gets returned is JSON. As you can see here, this looks very similar to how we work with dictionaries, and we can use Python to request this directly.

I've done some testing before hitting the record button, and I do know that Reddit is pretty fussy with its headers, so you do have to define headers to be able to request this page. The easiest and quickest way to get the headers is to hit F12, select All and refresh the page. Down here you'll see the actual request. If you're familiar with my previous videos, you know I just hit Copy as cURL, then head over to my favorite little website, this curl converter (link in the description), and paste in the curl command. What you get in return is this very nice Python command. So what we have here is a request; it is using a GET, and it's going to reddit.com, learnpython, about, moderators.json. Shift+Enter on that.

Okay, that's run. We can see it has run because we can Shift+Enter on the response and get a 200. If we ask for the text of that response, what we get back is a text object, and we know it's text because we can ask for its type, and the type is a string. That's using the text attribute on the response object. But if we ask for the JSON instead, Shift+Enter on that, it already starts to look a bit different, and if we ask for its type we now have a dictionary. We can be a bit smarter about this: we've got response, so we'll use something called json_response, which is equal to response.json().

Now we have a list of moderators. Let's go through this data together. We have a kind, a user list, and data; data has a key called children, which appears to hold a list, and each entry in that list has a name (who the moderator is), author flair, mod permissions and so on. At the moment I'm really just after a list of moderators, so let's go ahead and pull that out.

How do we do that? First of all, let's ask the JSON response for its keys. So .keys() tells us there's kind and there's data. Obtaining data is as simple as square brackets and 'data', and what data returns is a single key called children, so let's ask for children. What we're left with is a list, and we know it's a list because of the square brackets. That list contains a series of dictionaries, and each of those dictionaries has a series of keys. One way we can get the name of the moderator is a simple for loop: for item in the list, print the item's name. That gives us a nice list of our moderators.

Now that's kind of the long way; if we then append that to another list, it's going to get a bit confusing. Probably one of the easier ways to do this is to use something called a list comprehension. The first time you use it, it is a bit confusing, but it is as easy as square bracket, item for item in the list. It's very similar to what we just wrote. The best way to read this is: 'for item in' here is the for part, and this part at the front is what we saw before inside the loop. If I run that, nothing actually changes; we're just asking for the items within it. But similar to a second ago, we can very easily put the word name in there, and now we've got all the moderators in a list, good to go.

Notice there's one called AutoModerator. We don't want that, so we can also add an if: if the item name is not equal to AutoModerator. Shift+Enter on that, and beautiful, we now have a list of moderators. So we can say mods is equal to that, and now we have a variable that holds that list of moderators.
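As a rough sketch of the steps just described, here is the for loop next to the equivalent list comprehension. The json_response dictionary below is a made-up stand-in shaped like the moderators response walked through above (kind, data, children, name); the moderator names and the extra key are assumptions for illustration, not real data.

```python
# Hypothetical sample shaped like the moderators.json response described above.
# The names and the mod_permissions values are invented for illustration.
json_response = {
    "kind": "UserList",
    "data": {
        "children": [
            {"name": "example_mod_a", "mod_permissions": ["all"]},
            {"name": "AutoModerator", "mod_permissions": ["all"]},
            {"name": "example_mod_b", "mod_permissions": ["posts"]},
        ]
    },
}

# The long way: loop over the children and append each name to a list.
mods_loop = []
for item in json_response["data"]["children"]:
    if item["name"] != "AutoModerator":
        mods_loop.append(item["name"])

# The same thing as a list comprehension: expression, for part, then the filter.
mods = [
    item["name"]
    for item in json_response["data"]["children"]
    if item["name"] != "AutoModerator"
]

print(mods)
```

Both versions build the same list; the comprehension just folds the loop, the filter and the append into one expression.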
So the cool thing about that is, using that .json part, we've very quickly been able to obtain the list of moderators. What we might do from here is bring all this together and package it up into a nice little function. You'll notice there are some cookies set; we probably don't even want them, so we'll get rid of those. Let's have a look: we've got the headers, the response, the JSON response and the mods. Now these headers: I'm not a huge fan of global variables, and if we put this into a class we could obviously make them an attribute, but for now we're just going to leave the headers outside of the function, because we can keep using them inside our functions.

What we're going to say is def get_mods, and this is the function to get the mods. What you'll probably pick up on is that right now we are only looking at learnpython, so we probably want to make this a little more clever, in the sense that we want to be able to pass in the subreddit. Let's do that now. One way is to add subreddit as a parameter, which is now a local variable in our function. Then we'll put a little f-string in here: we start the URL string off with an f and, rather than learnpython, we put in some curly braces, saying whatever I pass in as the subreddit at the top here, return those users.

As I'm saying that, I've just realized I probably don't want to return a list of mods. I probably want to return a list of dictionaries with two keys: one key tells me the subreddit and the second key tells me the mod name. The reason is that I might choose to iterate through 10 subreddits and have each of them return that nice little list of dictionaries. So let's modify this query just a little bit.

To do that, all we need to do is look at how we are returning the item name. We'll pull this out of the function for now so we're not constantly re-running the query. Right now this brings back a nice list of the names, but what we actually want to bring back is a dictionary each time. So we'll start off by putting that into a dictionary and giving it a key; the key there is just mod. Shift+Enter on that, and now we're bringing back a list of dictionaries. If this is getting a bit confusing, it can be the first time you do it, so that's okay. We also want to know the subreddit, so, making sure we're inside the dictionary, we'll add subreddit, and right now we'll just put the word test as its value; we'll replace that with an actual subreddit in a moment. Shift+Enter on that, and now we have subreddit as test, and this is the mod. The reason I put the word test there is that we want to pass in the subreddit variable that is local to the function, so we'll copy and paste that and replace it there.

So we now have a function that returns the mods; let's make sure we put the word return in there, so return mods. To recap: def get_mods asks for a subreddit; it has a response object, which is a requests.get that looks at this URL and passes in a subreddit name, asking for the moderators; then the JSON response; and then we have this very Pythonic way to use a list comprehension to cut down on all those tricky for loops. We grab the name, but we also include the subreddit. To make sure that's working, let's Shift+Enter on that.
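A minimal sketch of how the function described above might look. The split into build_mod_list and get_mods, the HEADERS placeholder and the exact URL string are assumptions for illustration: the video pastes in headers copied from the browser and keeps everything in one function, and the endpoint may not respond the same way for every client.

```python
# Placeholder header (an assumption); the video uses headers copied from the browser.
HEADERS = {"User-Agent": "mod-activity-tutorial/0.1 (hypothetical)"}


def build_mod_list(json_response, subreddit):
    """Turn a moderators response into a list of {'subreddit', 'mod'} dictionaries."""
    return [
        {"subreddit": subreddit, "mod": item["name"]}
        for item in json_response["data"]["children"]
        if item["name"] != "AutoModerator"
    ]


def get_mods(subreddit):
    """Request the moderator list for one subreddit and return it as dictionaries."""
    # requests is third-party; it is imported here so that build_mod_list
    # can be tried offline without it installed.
    import requests

    # The f-string drops the subreddit argument into the URL.
    url = f"https://www.reddit.com/r/{subreddit}/about/moderators.json"
    response = requests.get(url, headers=HEADERS)
    return build_mod_list(response.json(), subreddit)


# Offline check with a made-up payload shaped like the response described above.
sample = {"kind": "UserList", "data": {"children": [
    {"name": "example_mod_a"}, {"name": "AutoModerator"}]}}
print(build_mod_list(sample, "learnpython"))
```

Keeping the parsing in its own small function is a design choice of this sketch: it lets you check the dictionary-building logic against saved JSON without hitting Reddit on every run.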
Let's test it out. One way we can test it is to call get_mods and pass in a subreddit. Let's have a think about who we want to get mods for. Let's do the same one, so learnpython. There we are, there are all the mods for learnpython. If we wanted to get a bit more tricky, we can say learnprogramming. Shift+Enter on that, and there are all the mods for learnprogramming. Great, so now we have a standalone function that we can ask a question, and it says to us: here are all the mods and the subreddit.

Now that we've got this standalone function, which is really cool, let's scroll down, navigate to one of those mods and figure out how to get some of their activity. We'll go with the first one here, and let's see if the same trick works again. If I put .json at the end, what do we get? Beautiful, it spits out a number of things. We've got a kind of t1, and the word kind appears quite a bit in here: kind t1, t1, t1. They're all t1s in what I've selected, which is fine. I'm just looking for activity; I'm not too fussed whether it's a comment or a post or whatever. But to make sure I'm getting the right things, let's check the Reddit documentation. I'll just google 'reddit api t1' quickly. So t1 is a comment, and then you've got an account, a link, a message, a subreddit, an award. Perfect, so we're just looking at comments at this stage, and that's fine. Let's have a look at when people are leaving comments.

So we now have a way to navigate to a user and get their activity. Bringing that into Python is a very similar exercise to a moment ago. We take that URL and, pretty much the same as before, create a response object. It doesn't matter that it's got the same name as the previous one; the earlier response is local to that function. Where we have the username in the URL, similar to the subreddit, we can pass in a variable, say mod. For testing, we'll just use a temporary variable called mod; that way it's all set. Make sure we put single quotes around the name. Fun fact about Python: you can use single or double quotes, it doesn't actually matter.

Now that we have mod as a temporary variable, we can run the same sort of process as before. Shift+Enter on this one, and what do we get back? We check our response and we've got 200, which is A-okay, and similar to before we can say show us the JSON. It's structured a little differently, but from what I can tell, once we get down to children we've got the 25 responses. Let's look at how we might navigate through these. Similar to last time, we can ask for the keys. Again we're looking at data, and in the keys on data there's children. I'm also noticing there's something here called after, which could be a way for us to navigate and collect more information beyond what this returns. Children, similar to before, appears to have a series of dictionaries. If I ask for the length, as expected we're getting 25 results back, which is good to see.

So what information here is important to me? I don't really care about the type of activity. All I really care about is the author, which we kind of have because we've asked for that particular person's information, and then the created_utc. So at this stage it's ultimately just the created_utc. I'm not here to pick out any of the details of their comments and what they're getting up to; obviously you can if you choose to, but for this exercise I'm not doing that.

So how do I get the created_utc? Similar to before, we could do a bit of list comprehension: item for item in, and you'll get used to that. Again, this word item is literally just a local variable for the little loop within the list comprehension. I could quite easily say blah for blah; it honestly doesn't matter as long as they're the same, because you're saying: for blah in this, do something with the blah. It iterates, so it does it 25 times, and each time you want it to do something. Inside here we've got something called created_utc, so we'll see if we can pluck that out, and we'll leave this as blah as an example. Go ahead and ask for the created_utc. Now that hasn't worked. What have I done wrong? Let's find out. We've got kind, and okay, here's the problem: kind, and then it goes to data. Once we're in data, then we can get the created_utc. A bit of an oversight on my part, but I'm glad you saw that, because it just proves I'm not perfect.

So now we've got approved_utc, and obviously we want created_utc, which is that one there. It's coming back as what appears to be a float, so it's got a decimal place. We are probably going to want this as an integer at some point, so we may as well make it an integer now and avoid floating-point arithmetic. Let's wrap it in int(), which makes sure it's an integer, and what you'll see is that it drops anything after the decimal place. Then we're going to do something similar to what we did before. If you recall, we had the subreddit and then the mod; this time I want the mod and then the date of the post.
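To make the nesting concrete, here is a small sketch of pulling created_utc out of a user listing. The sample_response dictionary is invented and only mimics the shape described above (each child has a kind and a data dictionary, and created_utc lives inside data); the author name and timestamps are placeholders.

```python
# Made-up sample shaped like the user listing described above; values are placeholders.
sample_response = {
    "kind": "Listing",
    "data": {
        "after": "t1_example",
        "children": [
            {"kind": "t1", "data": {"author": "example_mod", "created_utc": 1595571000.0}},
            {"kind": "t1", "data": {"author": "example_mod", "created_utc": 1595484600.0}},
        ],
    },
}

# blah["created_utc"] would raise a KeyError, because each child only has
# the keys "kind" and "data"; the timestamp sits one level down, inside "data".
created = [
    int(blah["data"]["created_utc"])  # int() drops everything after the decimal point
    for blah in sample_response["data"]["children"]
]

print(created)
```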
So what I might do is again create a dictionary, and we'll call the key activity utc. Shift+Enter on that, and what you'll notice straight away is that I've got an error. It's weird; let's figure out why. Okay, I can see why: my bracket's in the wrong place. Shift+Enter on that, and by putting the curly braces around it and that colon there, we've made it into a dictionary, a key-value pair. But obviously I want a couple of things in there; I also want to include the mod's name. So similar to before, we've got 'mod' and we've got the mod variable. They are different: you can see one has single quotes around it, so it's just a string, and the other does not, so it's referring to the variable. Shift+Enter on that, and now we've got the mod and the activity utc. We're getting places.

Now I'm not sure if 25 is going to be enough of a sample to see interesting patterns in this data, so I might see if I can crank that up a little. One way to do that: let's go back to our JSON response, and give ourselves some space. I do recall seeing something called after. If I ask for the keys again, I've got data, and in data I have something called after. Now I've done some pre-reading, I'm not going to tell you fibs, and I think I can do something like this: after has a value, and if we pass after into our request, this one here, similar to how we added .json, we can add ?after= and ask for the next 25 comments, and then the next 25 comments. Let's think about how we might iterate through this user's comments using this after part of the puzzle.

The way I'm thinking about this: we have the moderator, and that stays the same. What we want to do is a for loop, and I'm going to use an underscore purely because I'm not going to use the variable in this for loop. You do see this quite a bit, especially if you've used things like OpenCV, the computer vision package, which uses underscores quite a bit. The underscore just says I don't even want to call it blah or whatever. So: for underscore in range(4). All that does is run a loop for us four times: I am running, four times. That's a really quick and easy way for me to say, well, if I'm getting 25 posts and I want 100, I'd run it four times; if I want 500, I'd run it 20 times.

So now we have the ability to run a loop 20 times, but one thing we have to do as part of that loop is update the URL to include the new after each time. What I might do is break out this little URL and give it a name: url is equal to this string here, and we'll pass that variable into the request. That means we now have a URL we can modify on each loop to have that after part in there as well. To do that, we first need to define after as a variable outside of the loop, so after is equal to None. None is a no-value; nothing's in there, it's just empty. Then inside the loop I can do something as simple as: if after is equal to None, use this version of the URL, which is just the straight URL you get the very first time; else, so if it's not None, modify the URL a little to put a question mark at the end and pass in after. That will only be triggered if after is populated with something.

We do need to populate it with this part of the response. So in our for loop, the very first time it runs there will be nothing in after, and then at the very end after will be equal to, you guessed it, this little part here, which is response.json(), data, and then after. If I Shift+Enter on that, it will run the 20 times, but we want to do something with the results. We've got this JSON response and we're collecting all that information. Each time through, it's going to generate pretty much this object here, which is a list filled with dictionaries. So outside of the loop we might create an empty list, and each time this runs we'll just extend that list; by extend, I basically mean adding to the list. We'll call this activity_list, and it's an empty list. Then when this runs, we'll say mod_activity is equal to this, and then activity_list.extend, so just add everything from mod_activity into the activity list.

Let's give this a try, and to make sure it's running we'll print 'I am running'. Shift+Enter on that: I'm running, I'm running, I'm running. This is fantastic. We'll speed this bit up. [Music] Okay, great, that actually didn't take very long at all. Now we have this activity list, which has been extended each time with the mod activity.
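The paging loop described above could be sketched roughly like this. Several things here are assumptions rather than the video's exact code: the helper names build_url, fetch_json and collect_activity, the /user/<name>.json URL shape, the placeholder HEADERS, the injectable fetch argument (there so the loop can be tried offline), and the early break when after comes back empty, which the video does not do.

```python
# Placeholder header (an assumption); the video uses headers copied from the browser.
HEADERS = {"User-Agent": "mod-activity-tutorial/0.1 (hypothetical)"}


def build_url(mod, after=None):
    """Plain URL on the first pass, then the same URL with ?after=... appended."""
    url = f"https://www.reddit.com/user/{mod}.json"  # assumed URL shape
    if after is not None:
        url = f"{url}?after={after}"
    return url


def fetch_json(url):
    """Request one page; requests is third-party and imported only when needed."""
    import requests

    return requests.get(url, headers=HEADERS).json()


def collect_activity(mod, pages=20, fetch=fetch_json):
    """Collect up to pages * 25 activity timestamps for one moderator."""
    activity_list = []
    after = None  # nothing to pass on the very first request
    for _ in range(pages):  # underscore: the loop variable itself is never used
        json_response = fetch(build_url(mod, after))
        mod_activity = [
            {"mod": mod, "activity_utc": int(child["data"]["created_utc"])}
            for child in json_response["data"]["children"]
        ]
        activity_list.extend(mod_activity)  # add this page's items to the running list
        after = json_response["data"]["after"]
        if after is None:
            break  # assumption: stop early once there is no next page
    return activity_list


# Offline demonstration with two fake pages keyed by the URL that requests them.
fake_pages = {
    build_url("example_mod"): {
        "data": {"after": "t1_x", "children": [{"data": {"created_utc": 10.9}}]}},
    build_url("example_mod", "t1_x"): {
        "data": {"after": None, "children": [{"data": {"created_utc": 20.1}}]}},
}
print(collect_activity("example_mod", pages=5, fetch=fake_pages.__getitem__))
```

With the default fetch_json this would make real requests, so it is worth testing against fake pages like the ones above first.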
what we can do now is we can say let's just clear some of this information here we'll start off by asking its length how long are you fantastic you are 500. now if i open that up let's give it a run what we now have is we now have uh i've been calling this utc time i haven't been very clear on what that actually is so this is actually the number of seconds since 1970 sometimes referred to as epoch time utc time linux time unix time um yeah seconds from 1970. now it's not exact um every few years for a little while there i think it's been 20 or something i can't remember the exact amount there's been a few sort of leap seconds so i believe this might be out by a tiny amount but ultimately it is a very good solid format of time that you can inform that into a more readable way so what we have now is we now have 500 examples of webweb's activity and what we can do with this information is we can determine some very interesting things so let's go ahead and take it to the next step now that we've got this time function here we probably want to make that a bit more useful so one way we can do that is we can go ahead and say and we'll just grab this one for now so let's just say this is an example of time so say time example right so we're going to do that as a temporary variable now uh first thing we're going to do is we're going to make sure we import the time module so i'm going to go ahead and import time alrighty so now that we've got the time module that will enable us to do some very interesting things with the time so you've got things like time dot gmt time and you can pass in this time here okay shift enter on that what that's going to do for us is that is going to build out a time object okay time structured object which is is pretty much going to tell us everything we need to know about this time so you know the year is 2020 the month is the seventh month it's the 24th days the sixth hour so on and so forth now what can we actually do with this so now that we 
have this structure we can actually go ahead and run a string from time so what that looks like is and let me just go ahead and paste this in here and explain it to you so what we have available to us now is string from time a format in which you want to or how you want to cast that time and then the time we brought in a second ago so what that ends up looking like is shift and show on that and you end up with with this format you end up with the date now the neat thing about this i might ask for the date but i might also ask for things like maybe the hour so you can see here it's brought back 0 6 okay so that was the hour now right now for this analysis i'm only going to grab the date and the hour for now so what i might do is give this a name so i've called this activity utc and what i might call this one is something like activity underscore date and then i might give this one a name as activity underscore hour okay so now that we've got that what we can do is we can actually bring that back into our loop and we can go ahead and build some smarts around that so let's have a think about how we would go about doing that now we've got this really massive list comprehension that's getting a bit out of hand i'm not sold on the idea of keeping it as a list comprehension after we've done all of that so what we might do is we might modify that back to a for loop just for ease of reading so this is a very interesting point um there'll be instances where you go down the path of doing something very pythonic and then as it gets it grows and gets a bit bit bigger you might say well actually you know what if i came back to that in maybe six weeks time would i fully understand what this is doing in a couple of seconds or would i have to go through and read that and try to really understand so so let's go ahead and modify that list comprehension and bring it back to a for loop which feels a bit backwards but like i said uh it is probably the better way to go about it so for 
four blah in response.json data children okay what we are going to do is we are going to create i'm going to create a temporary uh i'll just create it as a dictionary it's pretty easy so the dictionary is this one here okay if we recall so we'll just bring that back into this loop uh we'll put it on multiple lines so it's a little bit easier to read so we are looking at mod and we've got mod coming in from elsewhere and we've got activity utc which is that one there now activity utc what we might do just to make our lives a tiny bit easier we might just create a small little variable in here we might call it um utc time alrighty which is that one there so it makes it a bit easier to write utc time rather than keeping on repeating that one down here we've got the two examples we use now we'll convert these into keys in our dictionary so let's go ahead and do that now so we'll go let's have a look here so we've got activity utc we've got activity date so that becomes a key as well and rather than equals two it is the colon there and what else do we have we have activity hour alrighty so activity hour bring that one in as well okay great so just looking at this one we've got mod activity utc and we'll just make sure we've got the names of these correct uh and we've got the commas where they need to be so every one after gets a comma now the final step here is we do have this dictionary we should probably give it a name something like the data dictionary okay hold some data data dict is equal to and this is the dictionary here and in the final step is we have this data dictionary alrighty and then each time we sort of run this we do want to make a a mod activity so shift enter on that and what's going to happen is it's going to say i'm running i'm running i'm running i'm running and then similar to before we will be able to ask for the activity list but this time around we should have a couple more details in there we will have the date and we will have the hour as 
well so let's go ahead and run that and see what that looks like beautiful so now we have the mod the activity utc the activity date and the hour we are getting close to some really good analytical insight so now we need to package all this up so similar to before what we're going to do is we are going to create a function which will hold the ability to pass in a mod and return the activity okay the 500 pieces of activity where are we at so what do we got so we've now got the ability to go ahead and input a subreddit and get a list of mods and then input a mod and get a list of their activity so this is feeling like we are pretty close everything's coming together really nicely so what we might do is we might clean up this code just a little bit so we've got our input requests import time we've got our headers and we've got the get mods we've got the mod activity we've got pretty much everything we we kind of need down here i guess we can sort of start writing the program that's going to go ahead go to the sub roulette reddit get the list of mods get all of the mods activity and then let's let's do something with that information so to write our little program what we'll do is we'll start off by defining a variable called mods and i think mods are just going to be equal to get mods and get mods from subreddit say learn python okay so we type that in there shift enter now in our little program we have a list of mods okay so we are only doing learn python for now but this does have the extensibility to go through tens hundreds of of subreddits if you really wanted to now from the mods a couple of things i should point out one we've defaulted this to 500 if they don't have 500 sort of comments it may fail we might need to do some try and accept stuff in there we don't we may not have time in this session but if if that is of interest to you definitely leave something in the comments i'm happy to keep extending this we can do it as a little group of community i think 
we can build something really cool so uh now that we've got the mods what we want to do is for each of those mods we want to collect up all of the activity okay so i'm going to create a very silly variable name called all of the activity okay all of the activity all the activity okay is equal to let's have a look at this one activity is an empty list it's an empty list most things start off as empty and then we're just going to pack full that activity filled with all four of these mods so how do we do that so what we're going to say is for mod in okay and four mod in the mods list now what that leaves us with is it's going to give us not the mod it's actually going to go ahead and give us the dictionary back so that looks like this so we might do to make this a little bit easier to understand we'll call them items again these variable names don't actually matter and to to the point where we left one as blah it really doesn't matter for item in mods okay we print the item get the same thing but what we're going to do is we're going to say hey you know what the mod is equal to the item and we're going to select the mod now that does get a bit confusing but what they end up looking like is print the mod we now have each of the mods so what that means is we can pass that mod into the mod activity so what that looks like is simple as mod and we're going to have something called maybe mod activity okay more activity is equal to mod activity which is a function and we're going to pass in the mod now that's all good and well that's going to go off and collect for us the um how many is it going to collect it's going to collect 500 posts for each but we want to make sure we are saving all that data for all four so what we are going to do at the bottom of that we're going to say all the activity is going to be extended by so we're going to add the lists together the mod activity list okay so what happens is we'll go ahead and try that and if that does error out what we'll say 
is: you know what, I think we've got enough from this particular mod, let's go to the next one. Okay, so why don't we Shift+Enter on that, and we can obviously speed this one up. [Music] Okay, so that's gone ahead and run. Now that we have all the activity, let's first of all check its length. So, the length of all the activity, and beautiful: we have two thousand posts available to us, which is really nice. We can do some very cool stuff with that. We might start off by just making sure we've got everything we're expecting. Really nice, and if we look at the very last entry in this two thousand, so negative 1, we have the very last mod. So the first thing we can do, and we should do this at the top of the screen, is say import pandas as pd. Importing pandas as pd gives us the ability to say something as simple as: the data frame, df (just a simple variable name, not the best name in the world, but it's going to do us just fine for now), is equal to pd.DataFrame, case sensitive, and we can actually just pass in the list of dictionaries. So run that now, Shift+Enter, and, let's give ourselves a bit more space here, we now have this data frame which, you guessed it, looks very similar to an Excel spreadsheet. Okay, we are running a bit short on time for this video. If doing visualizations in Python using matplotlib, plotly, seaborn, all those types of things, is of interest to you, please leave a comment below. But let's not kill the video before we get an output, so let's go ahead and call to_csv on the data frame. We'll call this output one, because that's a great name for a file, and index equals zero; I don't want the index, that's just this column to the left here. So Shift+Enter on that, and what I can do now is navigate to my folders and open this up in Excel. Okay, great, we've
got this open in Excel. So what I can do very quickly, and look, I'm 99% sure you guys are going to comment below and say "yes Adam, we want to do this in pure Python, let's do this", but what we'll do for now is open it up in Excel. What we can do is bring the mod in here, and here we can say: by hour, maybe, give me a count of how many activities, or when you're most active. So what we can actually show is the volume of activity by mod, by hour. Now, this is in UTC time, but what you can very quickly realize is that certain moderators have certain hours of the day where it appears they might be sleeping. So we're going to stop there for now, but as you can see, we've gone through and we have actually been able to tap into the Reddit API, which is really cool. We've been able to do it smart and build some functions that show us how to, first of all, get the moderators of a subreddit, and then get the activity of each of the moderators, or any user; you can just pass in any user here, it doesn't have to be a moderator. From that we've been able to create a big list of dictionaries, and from that we have been able to create a nice little data frame that we can then do some really cool and interesting things with. If you've enjoyed today's episode, please consider subscribing if you haven't already, and a big thanks to all my subscribers. And look, if you have any questions, leave them in the comments; I try to get back to as many as possible. If you're new around here, check out some of my other videos; there's more and more coming. Thank you so much.
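The loop described above, plus the "volume of activity by mod, by hour" count that was done in Excel, could be sketched in pure Python roughly as follows. This is only an illustration under stated assumptions: the mod_activity function here is a hypothetical offline stand-in for the one built earlier in the video, the user names are made up, and the dictionary keys "mod" and "created_utc" are assumed rather than taken from the video's code.

```python
from collections import Counter
from datetime import datetime, timezone


def mod_activity(mod):
    """Hypothetical stand-in for the video's mod_activity function.

    The real one requests a user's recent activity from Reddit; this stub
    returns two canned records per user so the example runs offline.
    """
    if mod == "broken_user":
        raise ValueError("simulated request failure")
    return [
        {"mod": mod, "created_utc": 1596000000},  # 2020-07-29 05:20:00 UTC
        {"mod": mod, "created_utc": 1596003600},  # 2020-07-29 06:20:00 UTC
    ]


# Assumed shape of the mods list: one dictionary per moderator.
mods = [{"mod": "alice"}, {"mod": "broken_user"}, {"mod": "bob"}]

all_the_activity = []
for item in mods:
    mod = item["mod"]  # pull the user name out of the dictionary
    try:
        # extend() adds the returned records to the running list
        all_the_activity.extend(mod_activity(mod))
    except Exception:
        # If collection fails for one mod, move on to the next one.
        continue

# Count records per (mod, hour of day in UTC).
activity_by_mod_hour = Counter()
for record in all_the_activity:
    hour = datetime.fromtimestamp(record["created_utc"], tz=timezone.utc).hour
    activity_by_mod_hour[(record["mod"], hour)] += 1

for (mod, hour), count in sorted(activity_by_mod_hour.items()):
    print(mod, hour, count)
```

Hours with a zero or near-zero count for a given mod are the ones where, as noted in the video, that moderator might be sleeping. The hours are in UTC, so they would need shifting to read them in a moderator's local time.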
Info
Channel: Make Data Useful
Views: 5,323
Rating: 4.9540229 out of 5
Keywords: .py, best first language, best programming language, best programming language 2020, how to learn to code, learn python, learning python, py, python, python datetime, python for beginners, python programming, python projects, python scraping, python scripting, python tutorial, python tutorial for beginners, python3, pythonrequests, top programming language, why learn python, why python, why python is awesome, why study python, reddit api, reddit, api
Id: Mw-dsY8UKVs
Length: 37min 16sec (2236 seconds)
Published: Wed Jul 29 2020