Working with APIs in Python [For Your Data Science Project]

Video Statistics and Information

Captions
Hey everyone, it's Nate here. Today we're gonna work with APIs to collect data for your next data science project. A few months ago I made a video outlining the only data science project you need to do, and in that video I talked about the data science project from start to finish. At the start of that project is collecting data, and I outlined and detailed how important it is to work with APIs to collect your data for your data science project, versus importing data from a CSV into your notebook or collecting data from a database. So today I want to show you how to actually pull data from an API, specifically using Python to pull data from the YouTube API, and specifically using the requests library for Python. I'll explain how to work with an API, we'll pull the data, we'll look through the JSON response that we get, and then we'll save all of this data into a pandas data frame. In the next video I'll show you guys how to actually save the data that you've collected from the API to a database, basically creating a data pipeline that you can continuously update. And of course we're going to do this all programmatically with good software engineering skills, so your code is concise and clear and doesn't look like some 10-year-old wrote it.

An important note: this video is not going to be specifically about how to work with the YouTube API. What I want to do is talk about how to work with APIs in general, so I'll make sure that all of the libraries and all of the techniques we're using can be used for all API services. That's actually a big reason why I chose to use the requests library. Okay, with that being said, let's start this project, let's start coding. Before we go to the transition, if you like this type of content, please subscribe to this channel. Alright, thanks guys.

All right guys, so I'm gonna be working with Google Colab, as you see here on the screen, which is basically a Jupyter notebook. So if you prefer Jupyter 
notebooks, you can work on that platform instead. I just prefer Colab in this case because it's really easy to spin up, and all my work gets saved in my Google Drive.

So the first thing we want to do is import the libraries that we're going to be working with. There are three libraries. The first is the requests library, which is going to allow us to make API calls. Then obviously there's the pandas library, because we're going to save our data into a pandas data frame. And then there's the time library, and I'm going to explain what we're going to do with that later on.

The next step here is to grab our API key. Like I said in the intro, we're going to be working with the YouTube API, so we need an API key that will allow us to make these API calls to YouTube. Like I said, I'm not going to be extensively covering the YouTube API, because I want this video to really just be about APIs in general and how to work with them and collect data from them. But if you want to follow along with me and actually pull data from the YouTube API, there's this resource here on the web that I'll link in the description. Basically it tells you how to get an API key from YouTube: you create or log into your Google account, go to the developers.google.com site, create a new project, and then you can grab an API key from that project. It's actually a very simple process. Once I have my API key, I just put it right here, and I'm saving that API key in a variable called API key.

Now what I want to do in this project is grab all the videos that I have in my channel, and then grab the metrics from each video. So metrics like view count, likes, dislikes, things like that I'm gonna be grabbing from the YouTube API. What I need is my channel ID, so I have my channel ID right here. It's actually very easy to find your channel ID. Alright, so these two parameters will be used to make my API 
call. I'm just setting up the variables right now.

The next thing I want to do is show you guys how easy it is to make an API call and grab data from that call. So we're gonna do a test run here. Let's quickly make an API call. I'm gonna save the output of that call into a variable called response. I'm gonna be using the requests library that you see up here, import requests, and then I will use the get method. The get method is basically a function that will make the API call and grab data from that call. What I want to pass through this get method is the location of that API, which is basically just the URL. For testing purposes, just to show you guys how this works, we're gonna grab data from the GitHub API. The location of the GitHub API is api.github.com; this is passed through as a string in single quotes here. And then I'm going to add this json method to the end of this request, which will basically return a JSON object in the response.

So what is a JSON object, what is a JSON file? It's basically a very popular data file, sent over as a JavaScript object, and it contains all of your data, usually in attribute-value pairs. I'll show you what that looks like after we make that request. JSON objects are a very popular and common data file type; you definitely need to know how to work with them.

All right, I'm just going to run this code here. So let's see what that response looks like. This is the JSON object here. It starts with a curly bracket, a curly brace, and what you have are attributes, or keys, and then the value of that attribute or key. So this is the data that we get making an API call. It's really that simple. Most of the work in dealing with APIs is actually passing through the correct URL with the right parameters. We actually don't have any parameters here, and I'll show you how complicated and difficult it can be working with these parameters. And second, the other hard thing about 
working with APIs is parsing through the data that you collect and then saving it in the right way. That can actually take quite a bit of time, and it can be a little bit difficult. So I'll also show you how to do that when we work with the YouTube API. This information is not really interesting; again, it was just for testing purposes.

All right, so now let's work with the YouTube API. What I'm going to do is just delete this, and instead of typing in the URL here, we're going to have a variable called url, and I'll build the URL in this url variable. The first thing, and what makes working with APIs kind of hard, is that there's a lot of documentation that you're gonna have to read through to be able to build out the URL and collect the data that you want. Because we're working with the YouTube API, I have the YouTube Data API reference document here. So after reading the entire document, I am going down here to the left, down to Search, and then if I go to Overview here, or if I go to list, it shows me the HTTP request, the GET URL that I need. So basically this is the URL that we're going to need to make the API call.

One thing I'll do just to start it off is copy this and put it into the url variable here. But that's just the root of the URL. We need to build parameters, because what I'm trying to do is find the videos in my channel, basically list them out, and find the view counts, the number of likes, dislikes, all of that information, so that I can save it into a pandas data frame. This link isn't really going to get me there. I need to specify exactly what kind of data I want from the API. In order to do this, again, we need to go through the document. One of the parameters that we need to pass through this URL up here is this part parameter, and then there are a few other optional parameters like channel ID, which will be my channel, maximum results, 
the ordering of the results itself, and the page token. So after reading all this, I've kind of figured out these are some of the parameters I need. But again, I don't want to make this video just about working with the YouTube API. Whatever API you're going to be collecting data from, what you need to do is read through the document and figure out what parameters you need to pass through to collect the data that you want.

So this is what it looks like when you have a properly built URL. Let me go through this one by one. Again, we have the root URL here. Out to the right of this question mark we are passing through the API key, so I have key equals and then my API key here. So I'm basically authenticating into the YouTube API. I have my channel ID, so I have this ampersand sign here; it's basically my API key and my channel ID that are going to be passed through, and this variable channel ID corresponds to this variable here. When we went through the document, I showed you that there was a parameter called part, and what I'm asking for is the snippet information and the id information. Again, if we go back to the document, with the snippet information here and the id information, we will get all of this information back: video ID, channel ID, playlist ID, publishedAt, channel ID, title, description, thumbnails, channel title. All that information will be collected. So that's basically what you're building out in this URL here: you're specifying exactly what you want to collect from the YouTube API.

And then I have just some optional parameters to make things easier for me. I'm adding this sorting parameter here because I want to sort the results by date, and then I want a maximum result of ten thousand. I don't have ten thousand videos, obviously, but I'm putting a big max there because I want to collect all of the videos, and I want to ensure that I'm not limited by the maximum result 
parameter. And then I have this page token here, because when I read the document, it basically said that this search might have several pages. The page token will then allow me to go to the second page and grab all that information, and then go to a third page and grab all that information, until all of my videos have been collected. So the last thing I forgot to do is just add the page token and what that parameter is. For now it's going to be blank, but later on I'll show you how to actually use the page token parameter.

All right, so I have the URL all built out. Now I'm passing it through to the requests library here; the get method has this entire URL, and again we are specifying that we want a JSON output, and everything gets saved in the response.

All right, so let's go through the response, which is a JSON object, to try to understand what it's giving us, because we're gonna have to save this data somehow. If we go through this response, it opens up with a curly brace, a curly bracket. You see this etag here; again, this is an attribute, or a key, and then this is the value for that key or attribute. And now we have items here, and then it opens up with a square brace, a square bracket, and what you see is essentially all of my videos. You see the kind here is a YouTube video, this is the channel title, my description for that video, and then the title of the video itself. And up here, again, is the video ID, so the actual unique identifier for this video, 'Step-by-step approach to solving any data science SQL interview question'. As you go down, it just lists all of the videos that I've published on my channel, and it's actually quite a bit. I think I have, I don't know, like 95 videos or something like that. And then as you go all the way down, things should start to close up. So at the very end, what we're seeing is the closing 
of the items attribute, the closing of the items key. It started off with a square bracket opening up, and now it's closing. So everything between the square brackets contains all of my videos and all of the information that I want; it's in the items key. So what I've identified by scrolling down and reading through the JSON object is the key that holds all of my video information, and that key, again, is this items attribute, items key. What I can do, just to get rid of everything else, is say items, and if I look at the response for items, what I get is the start of the square bracket, and if I go all the way down, I'm sure I'll get the closing bracket. So this is all the information that I want.

So now let's start to parse through this array, grab all of the information we want, and save it to variables. Because I have like 55 videos listed in this array, I want to go one by one and grab that video information, and obviously I'm gonna have to write a for loop. But before I actually write the for loop, let's write what would actually be in that for loop. So let's go through the first video, which would be in position zero here. I just called the zero position, which is really the first position, and it gives us the very first video that I have, the latest video that I have, 'Step-by-step approach to solving any data science interview question'.

The first thing I want to save is this video ID here; I actually want to save this value. So how do I navigate to this value? In order to do that, I think what I need is to specify that I want the id, so this will take me to the second row here. And within this id key, the second row, I just want the video ID, so I'm going to specify that as well. So video ID, and then that should give me the value. If I run this, I should get this value, which I do here. So that's basically how you navigate through the array that you have up here.
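The URL building and key-by-key navigation described so far can be sketched roughly as follows. This is a minimal sketch, not the code shown on screen: the function name `build_search_url`, the placeholder key and channel ID, and the trimmed `sample_response` dictionary are assumptions made for illustration. The parameter names (`key`, `channelId`, `part`, `order`, `maxResults`, `pageToken`) follow the YouTube Data API search documentation the video refers to. That documentation lists 50 as the largest accepted `maxResults` value, so the sketch uses 50 rather than ten thousand. The sample dictionary stands in for what `requests.get(url).json()` would return, so the block runs without an API key.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, channel_id, page_token=""):
    """Assemble the search URL from the parameters discussed in the video."""
    params = {
        "key": api_key,           # authenticates the request
        "channelId": channel_id,  # restricts results to one channel
        "part": "snippet,id",     # which sections of each result to return
        "order": "date",          # newest first
        "maxResults": 50,         # documented per-page maximum
    }
    if page_token:
        params["pageToken"] = page_token
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Hypothetical, heavily trimmed stand-in for requests.get(url).json().
sample_response = {
    "etag": "placeholder-etag",
    "items": [
        {
            "id": {"kind": "youtube#video", "videoId": "VIDEO_ID_1"},
            "snippet": {
                "title": "A placeholder title",
                "publishedAt": "2021-04-02T15:00:00Z",
            },
        }
    ],
}

# Navigate: the "items" list, first element, "id" object, "videoId" value.
first_video = sample_response["items"][0]
video_id = first_video["id"]["videoId"]

print(build_search_url("YOUR_API_KEY", "YOUR_CHANNEL_ID"))
print(video_id)
```

Building the query string from a dictionary keeps each parameter on its own line, which tends to be easier to check against the documentation than one long concatenated string.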
So I'm just gonna save this into a variable called video ID. If I run this again, it should give me exactly what I expect, the video ID for this video here. Alright, so that's the first thing I want to grab.

I also want to grab the title of the video, which is right here. It's saved in the snippet key, so what I'm going to do is specify the snippet key and then the title key, and then grab this title here. That's going to look like this, and if I output the video title, I get something that looks like this. That's exactly what I want. And just because I've built this project out before, I know some of my videos have this symbol here. I don't really want that saved, so I'm gonna do a replace: whenever there's this symbol, replace it with a blank. And I turn this whole thing into a string as well. It's not really gonna do anything to this video title, but if I run it, you should see the exact same title.

The next thing I wanna do is grab the publish date, or the upload date, and that date is located under this key, publishedAt. So, exact same technique: we're going to traverse through the array until we get the value we want. You see here that I have the date, and then I have a timestamp, and then a time zone. What I'm gonna do is save just the actual date and throw the time zone away. So I'm gonna use the split method to split on the T and save everything to the left. That looks like this: split on the T, save the left-hand side. Now if I run this, I get just the date. So that's basically how you save all of the information.

I've also consolidated the code a bit so that it's all in one block. The next thing we want to do is collect all of this information for every single one of these videos. We have all of the videos in this array here, in this response items. So this is video one, this is video two, for example. We need to go through each video and 
collect that information. In order to do that, we need to create a for loop. That's relatively easy. What we can do is create a for loop here where we have for video in response items. Response items, again, is essentially the array of the data that we have. video is a placeholder variable; it's going to go through each video in the array and grab that information. Most of the work has already been done here. What we need to do is replace this, which is the actual video, with this placeholder variable in the for loop called video. If we do that, we basically have what we're seeing right now, just with video here as it goes through the for loop.

The last thing I want to add in the for loop is some logic that will pick out YouTube videos, so kind equals YouTube video, because I want to make sure that the information I'm collecting is a YouTube video and not something else, like a search result or whatever it could be. So I'm going to add this if statement here that will look at kind and make sure that the value in kind is YouTube video, just like what we're seeing up here. After refactoring the code so that it's properly formatted, this is what my final for loop looks like.

The last thing I want to do is test out this for loop. I'm going to print out the video ID, the video title, and the date, just to make sure that it works. If we run this for loop, we should get the video ID here, the title of the video, and then the date it was uploaded, and we'll do that for all of the videos that I have in my channel. Alright, so it looks like it actually worked.

Collecting all this information is great, but it's not very interesting. What I'm also interested in collecting are view counts, likes, dislikes, and comment counts for each video that I have. In order to do that, I need to make a second API call, because the data that I've collected in this API call doesn't have any of those metrics that I'm interested in.
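The loop described above might look something like the sketch below. The function name `parse_search_items`, the `unwanted_symbol` parameter, and the `sample_items` data are hypothetical; the transcript does not say which symbol is stripped from the titles, so `"&"` is used here purely as a stand-in. The `youtube#video` value is the kind string the search results use for videos.

```python
def parse_search_items(items, unwanted_symbol=None):
    """Return one dict per video found in a search response's "items" list."""
    videos = []
    for video in items:
        # Keep only entries whose kind marks them as a video.
        if video["id"].get("kind") != "youtube#video":
            continue
        title = str(video["snippet"]["title"])
        if unwanted_symbol:
            # Stand-in for the symbol the speaker strips from titles.
            title = title.replace(unwanted_symbol, "")
        videos.append({
            "video_id": video["id"]["videoId"],
            "video_title": title,
            # Split on "T" and keep the left side, i.e. the date portion.
            "upload_date": video["snippet"]["publishedAt"].split("T")[0],
        })
    return videos


# Hypothetical sample data standing in for response["items"].
sample_items = [
    {"id": {"kind": "youtube#video", "videoId": "VIDEO_ID_1"},
     "snippet": {"title": "APIs & Python",
                 "publishedAt": "2021-04-02T15:00:00Z"}},
    {"id": {"kind": "youtube#playlist", "playlistId": "PLAYLIST_ID_1"},
     "snippet": {"title": "A playlist",
                 "publishedAt": "2021-03-01T10:00:00Z"}},
]

for row in parse_search_items(sample_items, unwanted_symbol="&"):
    print(row["video_id"], row["video_title"], row["upload_date"])
```

Only the first sample entry is printed, because the second one is a playlist and is skipped by the kind check.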
All right, so in the interest of time, and in the interest of not having a lot of duplicate information, I've already shown you how to make an API call, so I'll leave it up to you to figure out a way to make the second API call. Try to make that API call in the same way I just did and collect all of that information. If you are successful in making that second API call, it should look something like this. I'm naming the variable url video stats, for video statistics. The URL is essentially this string right here. It's going to the googleapis YouTube version 3 videos section, not the search section where we were last time. For the parameters, we're passing through the video ID, then we are grabbing the statistics with part equals statistics, and then we're passing the API key.

In order for this API call to be successful, we actually need to know what the video ID is, which means that this API call needs to take place after we make the first API call, because we need to grab all of that video information. After we make that API call, it's saved again as a JSON object, and if we go through the JSON object, through the array, we can pick out the view count, the like count, the dislike count, and the comment count. All of that is saved in these variables here.

What I'm going to do now is put all of this code in the for loop, because, as I just mentioned, we need to have the video ID first before we can make the second API call. So it needs to be in this for loop, and it's going to look like this. I have the second API call right here, and then I'm printing out all of the outputs that I have here, just to test that the for loop is actually working. So if you integrated the second API call successfully, your final for loop should look something like this. What I'm going to do is delete these print outputs here. So I'm going to have 
essentially seven columns in my pandas data frame, because we have video ID as one, title, date, view count, like count, dislike count, and comment count. I want to save all of this information for every single video that we have, so that's going to be seven columns.

The first thing we want to do is create an empty pandas data frame with those seven columns. That would look like this: we have, again, seven columns here, they correspond to the seven variables that we've collected, and we are naming the data frame df. This actually needs to be executed before the for loop, so I'm going to put it up here, and then I'm going to run this so that we have an empty data set. If I output the data frame, we have these seven columns without any data inside.

So now the question is, how do you actually save the data into the pandas data frame? For that we can use the append method, which will take all of this information we're saving and append that, or save that, into the pandas data frame. We're gonna save the data in the pandas data frame at the end of this for loop here, and it's gonna look like this: we have df is equal to df.append, and what this append method needs is essentially the column name of the data frame and the variable, or the data itself, that we want to save. The first one is going to be the video ID. What that's going to look like is this: we start off with a curly bracket, and then we have the video ID, which corresponds to the column of the pandas data frame here, and we need to pass a value to this column, which is saved in video ID. So I'm just going to copy and paste that here. The next thing would be video title, and then upload date, view count, like count, dislike count, and comment count. If I do that, it would look something like this. Here we have all seven metrics saved, and the last parameter I'm gonna pass is just to ignore the index of the pandas data frame. So then I 
will close off this data frame here, and that's how you save all of this information to a pandas data frame.

All right, so now with the for loop finished, let's test this out. I'm going to run this code block here; it'll take a few minutes. Because we saved everything in the pandas data frame, let's output the first five rows of it. And so now we have all of our data in this pandas data frame. We're essentially done building out this notebook that connects to the YouTube API, grabs data, and saves it to a pandas data frame.

There's not a lot of code here, but what we can do next is implement good software engineering fundamentals and clean up this code a bit. In particular, what I don't necessarily like about this for loop is that we are making a second API call to grab video statistics and processing that, and at the same time we are also parsing through the data from our first API call, and it's all kind of mixed into this for loop. Good software fundamentals mean not having this code all mixed and jumbled together. What we can do is break the code up into functions, and that's what we're gonna do next.

The first thing I'm gonna do is create a new code block right below building this data frame here, and I'm gonna take this second API call, where we're grabbing the video statistics, and put it up top here. I'm gonna now make this into a function of its own. I'm gonna call this function get video details and pass through the video ID. The video ID will come from my first API call here; it'll actually come from this video ID variable. And then lastly, add this return at the end, because once we process the video statistics, we want to return all of the counts back into this for loop here. So what we can do now is call this function in this for loop. We're going to call get video details, and we're going to pass through the video ID that we get from this line 
of code here. We're going to pass it into this function up top. The output of get video details is going to be these count metrics here, so all I'm going to do is copy and paste these variables into this for loop. These variable names match up to the variable names that I have when I'm saving the data into the pandas data frame, so everything matches up.

Okay, so the last thing I want to do is create one more function with the first API call. The first API call is at the top, right here, and we're going to grab this and bring it down to where our for loop is, so it's going to be above our for loop. We're making the API call here, then we're grabbing all of the data in this response variable, and then we are processing that data in this for loop and saving everything in the pandas data frame. That whole workflow could be one function. We'll call this function get videos, and all we need to do is reformat the code. Basically, what we need to do is pass the pandas data frame into this function so that it can save the data here.

So this function is almost complete. We are passing a pandas data frame into this function; this pandas data frame is going to have all of our video information. We make the API call here, and after this API call we jump right into this for loop. Sometimes, in my experience, we jump into the for loop way too quickly, before all of the data from the API is actually collected in the response. So I'm going to make use of this sleep function, the sleep method from our time library, and I'm gonna say wait one second before you jump into the for loop. That will give it enough time so that all the data from the API makes it into the response variable here. And the last thing we want to do is return the data frame.

So both of our functions are created. With this function done, we're basically done building out our notebook, so let's test everything. What 
I'm going to do is bring the creation of the data frame down here. This is going to be our main, and now I'm just going to call the function. This data frame is going to be created, and then it's going to be passed through this function here. Then this function is going to make the first API call and process the video data, and once it grabs the video ID, it makes a second API call here, get video details. Then we process all of the video details, grabbing the view counts, like counts, dislike counts, and comment counts. We save it all here in this first function, and then we save everything into the pandas data frame. Once we get that, let's see what we get. We get a pandas data frame; it has the video ID, the title of the videos in my channel, the upload date, and all of the video counts.

So that's basically it. That's how you grab data from an API and save it to a pandas data frame. As a data scientist, you're going to be expected to know how to work with APIs, grab data from them, and then work with the data that you have. To break it down in steps: what we did today was work with the Python library called the requests library. This is a standard library that can work with all APIs, and it allows you to make API calls. We made a call to the YouTube API, and we passed a URL to the API to specify exactly what data we wanted. We had to read the documentation carefully to figure out how to build the URL correctly. We collected the data as a JSON object and parsed through it, saving all of the information that we cared about into a pandas data frame. And then lastly, we cleaned up the code and applied good software engineering fundamentals.

The next step in the data science process is to take your pandas data frame and upload that data into a database. That's going to be the next video. So I hope you enjoyed this video. Please like and subscribe to this channel if you like content like this. If you have any comments or questions about anything we covered in 
today's video, just leave a comment in the comment section. Okay, so until next time, have a good one, guys.
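The two-function refactor described in the video could be sketched along these lines. It is a sketch under stated assumptions rather than the notebook's actual code: the function names `get_video_details` and `get_videos` come from the video, but `fetch_json`, `to_dataframe`, the snake_case column names, and the injectable `fetch` argument are introduced here for illustration. Three deliberate differences from the walkthrough: rows are collected in a list and turned into a data frame at the end, because `DataFrame.append` was removed in pandas 2.0; the one-second sleep is left out, because `requests.get` only returns once the response has been received; and the statistics are read with `.get`, because a count such as dislikes may be missing from a response.

```python
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"

# Assumed column names for the seven values collected in the video.
COLUMNS = ["video_id", "video_title", "upload_date", "view_count",
           "like_count", "dislike_count", "comment_count"]


def fetch_json(url, params):
    """GET a URL with query parameters and return the decoded JSON body."""
    import requests  # imported lazily so the parsing code runs without it
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def get_video_details(video_id, api_key, fetch=fetch_json):
    """Second API call: return the count metrics for one video."""
    payload = fetch(VIDEOS_ENDPOINT,
                    {"id": video_id, "part": "statistics", "key": api_key})
    stats = payload["items"][0]["statistics"]
    return {
        "view_count": stats.get("viewCount"),
        "like_count": stats.get("likeCount"),
        "dislike_count": stats.get("dislikeCount"),
        "comment_count": stats.get("commentCount"),
    }


def get_videos(api_key, channel_id, fetch=fetch_json):
    """First API call, page by page: return one row (dict) per video."""
    rows = []
    page_token = ""
    while True:
        params = {"key": api_key, "channelId": channel_id,
                  "part": "snippet,id", "order": "date", "maxResults": 50}
        if page_token:
            params["pageToken"] = page_token
        payload = fetch(SEARCH_ENDPOINT, params)
        for video in payload.get("items", []):
            if video["id"].get("kind") != "youtube#video":
                continue
            video_id = video["id"]["videoId"]
            row = {
                "video_id": video_id,
                "video_title": str(video["snippet"]["title"]),
                "upload_date": video["snippet"]["publishedAt"].split("T")[0],
            }
            row.update(get_video_details(video_id, api_key, fetch))
            rows.append(row)
        # Follow the page token until the API stops returning one.
        page_token = payload.get("nextPageToken", "")
        if not page_token:
            break
    return rows


def to_dataframe(rows):
    """Build the seven-column data frame from the collected rows."""
    import pandas as pd
    return pd.DataFrame(rows, columns=COLUMNS)


# Example usage (needs a real key, network access, requests and pandas):
# df = to_dataframe(get_videos("YOUR_API_KEY", "YOUR_CHANNEL_ID"))
# print(df.head())
```

Passing the fetch function in as an argument is one way to keep the API calls separate from the parsing, so the parsing can be exercised with canned responses before spending any API quota.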
Info
Channel: StrataScratch
Views: 105,700
Keywords: apis in python, api python, python api, python api tutorial, python api project, api python tutorial, data science, data scientist, data science project, data science api, data science python project, data science python api, google colab api
Id: fklHBWow8vE
Length: 28min 32sec (1712 seconds)
Published: Fri Apr 02 2021