Python Project to Scrape YouTube using YouTube Data API | Analyze and Visualize YouTube data

Captions
Hey guys, I'm Tawfiq. In this video, let's try to build a simple Python project where we are going to write Python code that accesses the YouTube API and extracts data from YouTube. We'll then load this data into a pandas DataFrame, analyze it, and do some simple visualization using the Seaborn Python library. Now, if you are looking for a portfolio project for data analysis, then this can actually be a pretty good starting project.

I wanted to build this project in a single video, and that is why I think this video is going to be pretty long, but I'll quickly go through the different sections that we are going to cover. I will start by showing you how you can create a YouTube API key. Once you have the API key, we'll walk through the Google YouTube API documentation. It's documentation given by Google where all the different resources and methods that can be used to access different data from YouTube are mentioned, and some sample code is also provided. So it's great documentation, and I'll just walk you through that. Once you have that understanding, we'll then write the Python code to build this project. The data analysis and visualization that I'll be doing in this project is going to be pretty basic, but of course, once you understand how to extract data from YouTube, you can do your own analysis and create your own visualizations.

Now again, before I begin, if you like my videos and appreciate my work, then please make sure to subscribe to the channel and give me a thumbs up. Thank you, and let's begin.

OK, so the very first thing that we need to do is create our API key. In order to do that, go to your browser and just type "Google developer console". This will show you a website like console.developers.google.com; just click on that. You will need to have a Google account, so if you do not have one, you will first have to create a Google account. Once you have that account, you
can just sign in to that account. Then you would see a page something like this. If you have already created a project, it would show up here, but if not, it would show a button, something like "Create a project". So the very first thing that we need to do is create a project. You can either click here, or on the top you have a "Select a project" option, and there you can also create a project. So let me just create a new project, and I'm going to name it something like, let's say, "YouTube analysis project". OK, I'll just click Create, and this should take a few seconds.

OK, so our project is now created. The next thing to do is enable the API that we want to use. Either you can click on the button here, or you can go to Library. In Library, you can search for the API that you want to use. The API we currently want to use is the YouTube Data API, and you can see here that for YouTube we have three different APIs. We will be using the YouTube Data API, that is this one. We'll click on that and click Enable. This will enable us to use the YouTube API in this particular project.

OK, so we have enabled the API. Now the last thing that we need to do is create our API key. Go to Credentials on the left-hand side, and here you have the option of creating a credential. Just click on Create credentials, and here you can create either an API key or an OAuth client ID. We will not be using OAuth for this video; for whatever we are doing in this video, an API key is more than enough. So we'll just create an API key. The API key is now created, and you can copy it now, and I'll just close this. So this is where you have your API key, and we will need this API key in order to access the YouTube data.

So this was our first step, and this is done. The next part is to identify how we use this API, that is, what are the different resources and methods that we need to use, and how do we write our code in order to access the data from YouTube using
this API. OK, now to do that, Google has provided documentation, and you can find it just by typing something like "YouTube Data API" in your browser. Here you would see a website like developers.google.com/youtube; just click on that, and this should open the YouTube Data API documentation. Now, there are a lot of things here, and we will not be going through each and every thing, but we'll just go through the things that we need for this project.

First, let's go to Reference. Under here you would find different resources, and these resources can be used to access different types of data from YouTube. For example, if I wanted to access all the channel statistics, that is, the channel name, the total number of videos, etc., then I can go to this Channels resource. Under this I have some methods, like list, so I can access all of the data by using this method called list. How to actually use this list method in a Python program is also given by Google. If I just go down here, you can see that I can call this method by providing a channel ID, or I can provide a YouTube username, or there are other options as well.

Now, under this, if I just click on this code button here, it would open something like this, and here you would see all the different parameters that can be passed into this method. This part parameter here is basically something like an output parameter: this is what will be returned by this particular method. We can also click on Execute and see what a sample data output would be. We'll see that, but before that, on the right-hand side you would see that Google has already provided some sample code for different programming languages. If I click on Python, you can see that Google has already provided some sample code that we can use in order to call this API and access particular data. It's telling me what the different libraries are that you need to import, and then it's telling you what are the
different service names, what is the version, and how you basically authenticate your request, and it's basically giving you all the different code that you can use. OK, so we will be using all of this, and we'll be referring to this sample code when we are writing our program. All the different parameters that you can use with this method are also mentioned here. As I said, part is something like an output parameter, and all the different things that you can provide inside part are mentioned here. If you want the content details, you can provide contentDetails; if you want a snippet, you can provide snippet; statistics will be under statistics, etc. And then all the other parameters that are available for this method are mentioned here as well. OK, so all the details are given here.

Similarly, if I wanted to access, let's say, all the video details, I can go to this Videos resource and click on list, and here again it will provide me the sample code that I can use to access videos from this API. The same thing goes for all the other details. So this reference will basically provide you all the methods and resources that you can use to access different sections of data from YouTube.

The next thing let's look at is Guides. If I come to Guides here and you see Quickstarts, you can see that they have different sections for different programming languages. If I go to Python, it will tell you all the prerequisites that you need. Since we are going to be using an API key for this, we just need one particular library to be installed, that is google-api-python-client. So we need to install this particular library in order to use our API key to access the YouTube data. There's also another method, that is by using OAuth credentials, but we will not be covering that in this video.

One last thing that we need to focus on here is the quota cost for API requests. Basically, every time you do a request using this API, there's a quota and
there's a limit on this quota. I think every day you have a quota limit of around ten thousand requests, and the different methods that you call will have different costs. For example, if I call the list method from Channels, it will just cost me one quota unit, but if I try to use some other methods, let's say insert, then it will be a cost of 400, and you can see the cost of each request in this particular table. The reason this is important is that YouTube has a limit on the number of requests that you can do per day; I think it is around 10,000. If you exceed that request count, then you will not be able to use the API or do any requests for that particular day. OK, so I'll leave a link to this documentation website, as well as the link where you can create your API key, in the description below, so you can go and check that out.

Now let's get into writing our Python program to build this project. Since it's going to be a new project, I am going to create a new virtual environment, and I will be using Anaconda for this. I'm just going to say conda create --name, and I'm going to name this, let's say, yt_env, and I'll set the Python version to 3.9. OK, so I'll first create my virtual environment, and once this is done, I'm going to install all the different packages that I need to work on this project.

OK, so the virtual environment is created. Let's activate it: I'll just say conda activate yt_env, and let me clear this. Next, let's install all the different packages. The first package that we need is the Google API package, so I'm just going to say conda install google-api-python-client. OK, the next package that I need is pandas, so I'm just going to say conda install pandas. Now, since this is a virtual environment, it will not inherit the packages from my base conda environment, so I need to install all the packages separately in each of my virtual environments. So pandas is installed, and next we'll install Seaborn.
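
Editor's note: the following is not from the video. It is a minimal sketch of one way to check, from inside the activated environment, which of this project's packages are already importable. The names REQUIRED_MODULES and find_missing are hypothetical, and the list assumes the usual import names (google-api-python-client is imported as googleapiclient).

```python
import importlib.util

# Import names (not install names) of the packages this project relies on.
# Assumption: google-api-python-client is imported as "googleapiclient".
REQUIRED_MODULES = ["googleapiclient", "pandas", "seaborn"]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run at this point in the walkthrough, a check like this would be expected to report seaborn as not yet installed, since only the first two packages have been installed so far.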
So I'll just say conda install seaborn. OK, so Seaborn is installed, and the last thing that I need is Jupyter Notebook, so I'm just going to say conda install jupyter. OK, so Jupyter Notebook is also installed. Now let me open my Jupyter notebook, but before that, let me go to the folder where I want to create my notebook, and here let me open it, so I say jupyter notebook. OK, so my Jupyter Notebook is open. Let me create a new notebook here, and I'm going to name this, let's say, "YT analysis project".

The first thing that we need to do is import all the modules that we need, so I'm just going to import them one by one. The first is from googleapiclient.discovery import build. The next thing that we need is pandas: import pandas as pd. And finally we need Seaborn, so I'll just say import seaborn as sns. OK, so we have imported the libraries that we need.

Now, in this project I'm going to have two parts. In the first part, I'm going to extract the data for some YouTube channels. I'll access their channel name, their total number of videos, the total views they have got, and their total subscribers, and then we'll take a few channels, compare their data, and see how the growth of these channels has been. In the second part, we will try to extract all the videos from a particular channel. We will take all the video data, that is, the video views, the title of the video, the total comments they have got, likes, dislikes, etc., and then we'll try to analyze and visualize them.

OK, so let's start by first extracting the channel details and then analyzing and visualizing them. The first thing that we need is the API key, so I'm going to create a variable called api_key where I'm going to store my API key. I'm going to get this API key from the Google developers console website where we actually created it, and that is this one. I can just go here and click
on Copy API key, and let me just paste it here. The next thing that we need: since we want to access the channel details, first we'll try to access one particular channel's details, and once we have written the code to do that, we'll then modify that code to access multiple different channels. OK, so initially let's just provide one channel ID. I'm going to create a variable like channel_id and pass the ID here.

Now, how do you get a YouTube channel ID? For example, if I go to YouTube and go to my channel, so I'll say "Your channel", then here on the top (let me just pause this) you would see the channel ID, that is, whatever comes after channel/ in the URL is the channel ID. So I'm just going to copy it and paste it here. This is my channel ID.

But let's say you wanted the channel ID of some other YouTuber. For example, let's say I search for MKBHD and go to his channel. You can see that in the URL on the top you would not see a channel ID; this is the custom URL that is provided for this particular channel. In this case, if you do not find a channel ID here, then I have a hack. What you need to do is go to any of their videos. So I'll click on this video here and mute it, and then from this video page I'll click on the channel name. Now if you look at the URL, this is where you would find the channel ID. The channel ID is basically an alphanumeric value, so I'll copy that, and that is what I can use. So if you want to find the channel ID, this is the hack that you can use.

OK, so once I have my channel ID, the next thing that I need to do is get the YouTube service. Based on that YouTube service, I will be able to raise a request to the API to get the data that I am looking for. How I can do that is already given by Google, so if I
go to their documentation, go to Reference, and, for example, go to Channels, go to list, and then go to this sample code that they have already given here and select Python, at the bottom here they tell you that the API service name is youtube and the API version is v3. They are using OAuth credentials here, but we do not want to do that; we want to use an API key, and I am going to tell you how to do that. Basically, the command that you need to use is this one: you need to say googleapiclient.discovery.build and then pass in these three parameters: first the service name, then the API version, and then the credentials. Then you can call the request, basically mentioning the resource and the method, and you will get the response, which you can store. OK, so we'll do all of this in my code.

The first thing that I need to do is get the service. I'm just going to say youtube equal to, and since I've already imported build, I'll just say build, and then I need to pass the three parameters. The first is the service name, and I know the service name is youtube, as it is mentioned here, and the version is v3, so I'll just say 'youtube', 'v3'. Then I need to pass my API key. To pass the API key, I can use the parameter developerKey, which is equal to my api_key that I have just defined here. OK, so this should hopefully create a service, which I'll be storing in my variable youtube and using in my code below. Let me just execute this.

OK, so we have created the first part of our program: we have imported the libraries that we need, we have set the API key and the channel ID, and we have created the YouTube service. The next thing is to extract the channel details. What I'm going to do is create a function which is going to extract the channel details, and then we'll try to analyze and visualize them. OK, so in order to create a function, first I'm going to just
name it, like "Function to get channel statistics", and this will be a Markdown cell, so I'll execute that. Then I'm going to create my function as, let's say, get_channel_stats, and I need to pass in a few parameters: first my YouTube service, which will be youtube, and then my channel ID, which will again be channel_id. And this is where I'm going to write my code which is going to access the YouTube API.

If I go back to the YouTube documentation, I'm already under the Channels resource and in the list method, and you can see that you raise a request just by calling this particular code here. You just say youtube.channels, which is your resource name, then dot the method name, and then you pass in the parameters. It's all mentioned here, so we're just going to use the same thing. I'm going to say request equal to youtube.channels() and then .list, and here I need to pass in my parameters. The first parameter is the part parameter, which is basically the output that will be returned from this request, and here let's say I want to extract snippet, contentDetails, and statistics. The next thing is that I need to pass in my channel ID, which will be the id parameter, and I'm going to set it to the same thing, channel_id, which I just passed in as my function parameter. Now, in order to execute this request, I just need to say request.execute(), and it will return a response object, which will basically be a dictionary, which I'm going to store in a variable called response. And then finally I'm going to return this response.

OK, so I think that's all. Let me just try to execute this function and see if it's working fine. I'm getting an error, and I think it's "sniper"; so there is a typo here, it should be snippet. Now let me execute it. So it's working fine, and I'm getting some output. Let me just try to read this output, which is basically a dictionary. Now, since this
dictionary is similar to JSON format, I'm just going to open a JSON formatter here so that we can read the response object more clearly. Let me paste that output here and click Process, and this is basically my response object. You can see that initially I have some data here, pageInfo etc., and then I have items. items is basically a list, and inside this list I have some information: I have snippet, contentDetails, and statistics. Inside statistics, you can see I have the viewCount, that is, the view count of the channel, the subscriberCount, and the videoCount, and under snippet you can see that I have title, which is basically my channel name. So the information that we are trying to extract here is the channel name and then all these statistics, that is, the video count, the subscriber count, etc.

OK, so let's try to write our code to access these details. I know from the response object that everything is inside items, which is a list, and this list only has one element as of now. Let me just close this YouTube tab and open this, and let me clear this particular output so that it becomes more clear. Inside my function, I'm now going to modify it such that I access the data that I need. For that, I'm just going to say data equal to; I know that it will be inside my response object, under the key items, and I know that items is a list that has only one element as of now, as you can see here. So in order to access the first element of this list, I can just use the index 0, and from there I can access all the different details. From items I can go to snippet, and then from snippet I can go to title. Correct? So let me just see
that. So it's title here, that's fine. This will basically fetch me my title, but I also want to fetch other details like subscriber count, view count, etc. So instead of storing all of them in separate variables, I'm just going to create a dictionary where I'll be storing all of these key-value pairs. I'm going to open a dict here, close it, and create the items inside. The first item would be my title, and I'm going to name this, let's say, channel name. The next key in this dictionary would be, let's say, the subscriber count, so subscribers equal to: again it will be the same thing, that is, response items 0, but after that, if I go back here, my subscriber count is inside the key statistics, under subscriberCount. So I'm going to use that: items 0, then statistics, and then subscriberCount.

The next thing that I need is the views, so I'll say views equal to; in fact, I can copy the whole thing here, place it here, and instead of subscriberCount I can just say viewCount. Then again I'm going to copy the whole thing, put a comma here, and now the next thing that I need is the video count, so let's say total videos, and this will be my videoCount. OK, so I think with this I should be able to get my channel name, subscribers, views, and total videos. It will all be stored as a dictionary in my variable data, and then I'm going to return data.

OK, so I'll run this, and hopefully it should return. I'm getting an error; let's see what the error is. "item". OK, so it's not item, it's items. If I go here, you can see it's items; it's a typo again. So I'll change that to items, and now let me run
this, and you can see it's working fine, so I'm getting the output. Basically, I have written a function which uses the YouTube API to access the YouTube channels resource, and from there it's calling the list method and extracting all the details that we have mentioned, snippet, contentDetails, and statistics. We are passing the channel ID, and for that particular channel ID (in this case it is my own channel), it's returning all four details, that is, the channel name, the subscriber count, the views, and the total videos, and that is what is returned here.

Now, this is fine, but I want to modify this function such that it's not going to accept just one channel ID but multiple channel IDs. I have already collected a few channel IDs; these are a few of the channels that I personally follow. So these are the channels: I have Luke, I have Ken Jee, I have Alex the Analyst, I have Tina Huang. All these channels are related to data analyst and data scientist kind of material, so I watch them, and I thought, let me just analyze the data from these channels and see how each of them has grown. Again, if you want to get the channel IDs, I already told you how: just go to YouTube, click on their channel, and in the URL you would find the channel ID.

Now I'm going to pass these as a list, and it will be stored in my variable channel_ids. I'll just copy that. In this function, instead of passing channel_id, I'll now be passing channel_ids. Since channel_ids is a list, I cannot pass a list here, so I need to convert the list into a string which will have the values comma-separated. I'm going to use the join method to do that, so I can just say ','.join and pass this. Yes, just this much will convert the list into a string where all the values will be comma
separated. OK, so this is fine. Now this is going to return me all the details of the channels, and let me just show you how the data looks. Let me comment this out and just return the response. OK, let's run that. I did not execute this, so let me execute this and execute this again. OK, so I'm passing the wrong name here: it's not id but ids, so the name that we need to pass is channel_ids. I'll pass it here, and now hopefully it should work, so I'll execute this. OK, the issue is that I have not changed it to channel_ids here, so I'll change it, and now if I run it, you can see that I'm able to get all the different details for all of these five channels.

Let me copy the whole output and place it into this JSON formatter just to see how the data looks. I'll click Process, and you can see that I now have this output here. If I click on the list, you can see I have a few entries here now, so let me minimize each one just to show you that we have five different sets of details. Previously, when I passed just one channel ID, I got a list with just one item, but now inside this list you can see that I have one, two, three, four, five, because I have passed five different channel IDs. So for each channel ID, the details are stored in one of these list items. If I go to the first one here, again all the values are stored in the same place.

So what we need to do is modify this function. Let me remove this output. We need to modify this function because this response object is now going to return multiple channels' details, so we need to handle all of them. Let me uncomment this. Here you can see that in order to access the details of just one channel, I passed the index as zero, but now, since there are multiple channels that we have passed to this channels.list method, it will
return multiple items, and each item will have the details of one channel. So what we need to do is loop through each of the items and pass in the index here rather than hard-coding it to zero. For that, I'm going to open a for loop: I'll say for i in range of len of response, and I know it's items, and that's all. I'll indent this and replace this index value with the loop index, that is i, so I'll say i here, and here as well. So basically what will happen is that every time this loop iterates, it's going to fetch the details for one particular channel and store them in this variable data, which is a dictionary.

But I need to store data for all the channels, so I'm going to create another variable here, which I'll call all_data, which will be an empty list. Every time I get data for a particular channel, I'm going to append it to my list all_data, so I'll say all_data.append and pass in data. So this data here will get the details of each channel every time the loop iterates, then it will be appended to my main variable all_data, and finally I'll return all_data. OK, so if I run this now, you can see that it's returning me a list of dictionaries, and each dictionary has the data of a particular channel.

Now let's try to use pandas so that we can load this data into a DataFrame, and then we will be able to easily visualize it. I'll name a variable, let's say, channel_statistics, equal to whatever is getting returned from this function, and I'll pass it into my DataFrame. So I'm going to create my DataFrame like channel_data equal to pd.DataFrame, and here I'll pass my channel_statistics. I'm getting an error, so let's see what it is. It's telling me channel
statistics is not defined. OK, I need to run this, and now I need to run this. Let me now view this data, and you can see that we now have a DataFrame. It's showing the data in a tabular form, which is much easier to view than previously, where we had the dictionaries.

Now, this is fine, but let's try to create a visualization for this so it becomes more clear, and we will try to look at different types of data. Before we can visualize the data, the first thing to look at is the data type of the columns in this DataFrame. Even though this looks like an integer here, the data type would not be integer. For example, if I say channel_data.dtypes and execute this, you can see all of these are object. We want to convert subscribers, views, and total videos into integers, and then we will use them in our visualization. To convert them to integers, I'm going to say channel_data and pass in each column, so let's say subscribers equal to pd.to_numeric, and I'll again pass the whole thing here, so I'll convert this into numeric. I'll do the same thing for all the other columns: I'll convert views to numeric, and then I'll convert the total videos column to numeric as well. Then let me see the data types with channel_data.dtypes. If I run this, I'm getting an error: I typed "channel date"; no, it's channel_data. If I run this now, you can see that the data types have been changed. These have now become integers, and this is fine for us to do the visualization.

For the visualization, I'll be using the Seaborn library. I'm going to say ax equal to sns dot, and I'll be using a bar plot, so I'll say barplot, and I'll pass my x-axis. Let's say I want to know who has the highest number of subscribers. So let my x-axis be subscribers, so I'll just copy this, or
not subscribers, let it be the channel name. So my x-axis will be my channel name, my y-axis will be my subscribers, and my data will be this DataFrame, that is, channel_data. Now if I run this, you can see that we are already able to visualize the data. This is fine, but it's a bit small, so let's try to increase the size. I'm going to say sns.set, and I'll say rc equal to, then 'figure.figsize' (I need to pass it in single quotes), and I'll set it to, let's say, maybe (10, 8), and let's see. OK, so I think this is much better.

Now you can see the data, and you can clearly see that Ken Jee has the highest number of subscribers; you can see he has over 160,000 subscribers. Next is Tina, who has, I think, over 140K subscribers, then Alex the Analyst has over 120K subscribers, and Luke is almost reaching 100K subscribers. And if you see my channel, it's still a baby channel in front of all of these giant channels. So this makes it more clear what's happening with these channels.

Now, this is fine; this is the subscriber count. Let's see who has the highest number of views. Instead of subscribers, I'll just plot views. If I run this, you can see the output, and you can see that the views are almost the same for all of these channels. In the subscriber count, you could see that Ken Jee had the highest number of subscribers and Luke had far fewer subscribers than Ken Jee, but when it comes to views, you can see that all four of these channels have almost similar views. That kind of gives you an indication that you don't really need to have a lot of subscribers to have a lot of views.

Similarly, let's say instead of views I wanted to look at who has posted the highest total number of videos. My total videos is in this particular column, so if I change views to total videos and run this, you can see
that kenji has posted highest number of videos but when it comes to luke has hardly posted 50 videos but with just 50 videos he has so many views that is almost equal to channels with way more subscribers than him so this kind of gives you an idea that where each of this channel stands when it comes to number of videos they have posted the total number of views and total subscribers they have okay so this was the first part of our project where we have extracted the channel details for a few channels and then we have visualized them now the next part is we'll try to access videos from a particular channel so let's consider kenji's channel he is the biggest channel in this list that we have so we'll try to access all the videos from his channel and then we'll try to analyze them now in order to get the video details we first need to get the video id so we need to get the video id of all the videos in his channel and once we get the video id based on that id we can extract all the details of that video now in order to get the video id if i go back to the response object okay which i think i have already pasted here that is the response object that we got from this channels list method and if i just go here to content details you can see that i have something like uploads this basically is a playlist id this basically the playlist where all the videos from this particular channel will be posted so if i just provide this playlist id i will be able to extract all the videos from that channel for example if i have to just show you so let me just copy this and let me let me just see whose channel this is so last one is the kenji's channel and here i just want content details and i need this id so this is the playlist id for kenji's channel so if i just go to youtube and search for kenji's channel so let me just search for this channel that is kenji and then let me just go to his channel go to his playlist and let me just open any playlist so here if you see the url on the 
top here the last id here this is basically the playlist id so i'm going to remove that i'm going to paste it with the id that we just got from our json and i'll just click enter and now you can see that it it's showing that this is all the uploads from kenji and he has totally 206 videos so this will basically show you all the videos in his channel so we will be using this upload id so the first thing is we need to store this upload id in our data frame so i can see that this is under items it's under content details it's under related playlist and then under uploads so i just go back here and i have created channel channel name subscribers views and total videos i'm going to create another column here which will basically be my playlist id so i'm just going to say playlist id equal to i'm just going to copy the same thing but instead of statistics it will be channel content details so i'll say content details and then here next it will be so let's go back to this json formatter it will be this one that is related playlist so i'll just copy that and then here again this will be let's say uploads so i'll just pass in uploads so hopefully this should get my upload id and store it as a column here as playlist id so if i execute this and if i execute channel statistics and if i execute channel data now you can see that i am able to get a new column which is playlist id so basically for each channel i am able to get the playlist id i can use this playlist id to extract all the videos from this channel so the very first thing that we are going to do is basically fetch all the video id so i'm going to create a function which is going to fetch the video ids for a particular channel once we have the video id we will then be able we'll then write another function to extract all the details for each of these videos okay so the first step is to write a function function to get video ids okay so this will be a markup and markdown and then here i will say def get function ids 
oh sorry not get function but get video ids okay and i'll pass in two parameters here one is my youtube service and the second is the playlist id so i need the playlist id so okay so to write this function what we are basically trying to do is we need to get the video ids for a particular playlist so if i just go back to my documentation here that is a youtube documentation and look for playlist items resource go to list and this is a method and you can see that in this method the first para part parameter here i think it did not have the code snippet mentioned here but that's okay we still know that the resource name is playlist item i is caps here and we can get all of this output snippet id status content details and the parameters we need to pass here is the playlist id okay so let's try to use this particular method so i'll just copy this because this is the resource name so i'll just go back to my jupyter notebook and here i'm going to say request equal to youtube dot playlist items dot list and here i'll pass in part equal to so the part that i want to extract is just the content details and then the next thing that i need to pass is the playlist id so i'll just go back here and i'll just copy this and i'll just come back here here and then i'll just say this is equal to playlist id okay so i've got my playlist id and next let's just execute that so i'll say response equal to request dot execute and i think okay so for now let's see what this will return so i'll say return response and let me just execute this and let me just call this function let's just see what happens okay so i'm getting an error playlist id is not defined okay so we do not have the playlist id so i have playlist id in my data frame that is here so what we are going to do is let's try to access the playlist id of course i can just copy uh from here and pass it to the function but i will not do that rather i will write a simple way to extract this playlist id from this data frame and 
store it into a variable okay so what i'm just going to do is i'll just go here and here i'll create i'll insert a new cell here and let me just print the data that is channel data and then here let me okay let me just try to get my playlist id playlist id equal to channel data and i'll be using the loc method so i'll just pass in loc and then i will pass in the whole thing that is channel data and then i'll say my channel name so the column is channel name so whenever this channel name double equal to let's say kenji and that is what i want so for kenji i want to access the column is this one that is playlist id from here i want to access the iloc that is and i'll say the very first value okay so let me just print this so let's see if it's actually fetching the playlist id and i think this playlist id is of kenji yes so using this particular method using loc and iloc here i am able to access the playlist id for a particular channel okay so then i can just pass in this playlist id so i don't need this i'll just remove that and i'll pass in the playlist id and i'll run this so i'm still getting an error like resource so what is it resource object has no attribute playlist item so if i go back here i think it is not playlist item but it is playlist items so i just need to fix this typo so i'll say playlist items now i'll run this and i think it's working fine so now this function has returned so what we wanted this to return is the video details so if i see here okay so you can see that total results are 206 but results per page is only five so this is one thing that you need to remember whenever you do a request youtube will always by default return you five details okay so inside your items you generally have five elements only okay whether it's channel details or video details or anything so if you wanted to have more than that then i can pass in a parameter like let's say max results equal to 50 and if i run this now i think it's yeah now you can see it's
returning more values that is 50 but if i still go to the bottom you can see that the total results here are 206 but it's returning only 50. so even this is not going to work for us so what we need to do is we need to modify our function such that it's not just going to extract 50 video details but all the video details from this channel so we'll try to add a logic to do that but before that let's just analyze this output that is what we have got here so let me just copy this whole thing and let me just paste it into our json formatter so let me just go back to json formatter here and where did it go okay this one and let me just process it and let's see the data so i have items and yeah so i have 50 different items now let's look at the first item here and first item we need the video id so you can see under items i have content details under content details i have the video id so this is what i want from each of this response item of item elements okay so let's try to modify our program but first even though this is only fetching 50 video details let's try to extract the video id for this 50 okay so what i i'm going to do is i'm going to open a for loop so i'm just going to say for video or let's say i'll say for i in range of len of response and i say items okay so for each item so we know that there are totally 50 values in this item so for each item i'm going to extract the video id so what i'm going to do is i'm going to create a list i'll say video ids and i'll assign it to a blank list for now and here i'll say video ids dot append so that for every video detail it's going to append the video id so i'll just say i'll just copy the same thing that is response item so i know it's under response items and under items it's under content details and then it's video id so i just say content details and then i need to say video id so i'll again go back here i'll copy this hopefully this should return me the video id okay so this should hopefully have after this 
iteration it should have 50 different video ids so i'll just return that and instead of returning this i'll just return the length of this okay i'll run this i'm getting an error item so it's not item it's items so let's fix this and run this again i'm still getting another error list indices must be integers oh okay so i need to pass in an index so because it's a list so it will be i so for each iteration so now if i run now we can see the length is returned as 50 so basically this video ids is a variable which is a list and it is having 50 video ids so this is fine now it's able to fetch 50 video ids but our intention is to not just fetch 50 video ids but to fetch all the video ids from this channel so for this we simply write a simple logic but before that let's just look at this returned object here so if i look at this i have items but on top i have something like next page token so the thing is if you look at the page info it's telling total results 206 results per page 50 that means there are more pages in the response and it's just not returned as of now so but it's returned a next page token so using this next page token we can access the next page result and then we'll be able to access all the different data so basically whenever there are less than 50 records or 50 items returned then you would not have this key okay so we can use this key to identify if there is a next page or not okay so we'll try to write a logic for that so what we are just going to do is we'll go back here and first i'm just going to say next page token this is a variable i'm going to create and i'm going to say it's going to be response and i think if i just go back here it is just this key so i'll just copy this i'll paste it here so this is my next page token and then i will say whether there are more pages or not so i'll say more pages equal to true by default and then i'm going to open a while loop where i'll be looping based on the more pages flag so if there are more pages then i
need to loop if there are no more pages then i will stop looping okay so i'm just going to say here like if next page token oh sorry if next page token is none okay so one thing this next page token will not always be present in the response object because if there is no next page then we would not have this item in that case if i try to access the dictionary with this method it would throw me an error so what i'm going to do is i'll use the method called as get and just fix this so what get will do is if this item or this key is present then it's going to return the value of this key if this key is not present it's going to return none okay so i'm just going to check if this value here is none that means there is no next page then i will change more pages equal to false if not i'm going to loop through or basically do another request basically call another request in order to fetch all the video details okay so i'll again copy the same thing i'll paste it here and i'll just do an indentation here and just one additional thing that i need to pass is the next page token so in order to pass that let's just go back to our documentation and here i will see the parameter i think okay so that is this parameter here page token so i'm just going to use that okay here and i'll pass my next page token so that is this one so it will go to this particular page and fetch 50 videos okay for this particular playlist so it will return into this response object and then again we are basically going to use the same for loop so i'll just go here paste it here and here okay so response items so yeah this should hopefully return all the remaining 50 videos and in the next step i want to change the next page token so after this page has been processed the next page token should again change because this response object would return the new next page token that is if it exists i'll just say this one and here i'll just pass the same token that is this one okay but i'm going to use the
get method so i'll say get this one so i'll get the next page token here and if there is a next page then there will be some token and so this will not be none so this will again process it will again come to the else part it will run the remaining it will extract the remaining 50 video details and it will again append it into my video ids variable if there is no next page then this would return none so this will become none so here this will be true and more pages will be false and we will come out of the while loop okay so this is basically the logic and hopefully at the end we would have all the 206 video ids in my variable video ids so if i execute this now hopefully it should return 206 let's see and 206 because i know kenji's channel is having 206 videos okay i'm getting some error here no route to host something is wrong i think the connection got lost let me try to run this again okay so now it's fine so it's returning me 206 which is basically the total number of videos that kenji has so we have got all the video ids and let me just store it into a variable like let's say video ids equal to this and if i execute and if i just show you what this looks like okay i don't want to return just the length but i want to return the whole list so now if i just execute this and you can see now video ids variable is having a list of all the 206 video ids so we have been able to fetch the video ids the next thing that we are going to do is we are going to write a function to extract details from each of these videos so i'm just going to say and i'll just run that and we'll create our new function that is okay and again i'll pass my youtube service and i need to pass video ids so again now what we are trying to do is we need to based on these video ids we need to get the video details so if i go back to the documentation and now if i go to the resource videos and go to list so i want to access the list by video id so multiple video ids so i can use this so if i go
here if i go to python and you can see that it's telling me i can use the method the resource videos and then pass in list and i can just pass in all the video ids here comma separated video ids so i'll just try to use that so again i'm just going to say request equal to youtube dot videos dot list and here i'll pass in all the parameters so the first is part equal to and what i want to fetch is let's say snippet and statistics and then i need to pass in the id which will be my comma separated video ids now this video ids here is a list so let's try to convert it into a string which is comma separated string so i'll use the join method so i'll say join video ids and that is fine now i can just get the response so response equal to request dot execute actually this is going to throw me an error and i'm going to tell you why response and let me just execute this and okay so let me just execute this method okay so i'm getting an error and uh okay okay no route because i think i'm having some issues with my internet that's why i'm getting this error so let me run this and i think now i'm getting some other error yeah now basically it's like invalid filter parameter now this error is because this video ids here i know that it's having 206 ids and the limitation with the youtube request is that we can only pass 50 ids at a time so i cannot pass 206 so let's say if i just change this to let's say maybe up to 50 again if i run this now now i think it should work fine so you can see now it's working fine and i'm able to get all the details okay so for now let me just copy all this data okay so that is 50 video details and let me just place it in my json formatter here and run this just so that we can see the data i have the response object here so the video data that i am trying to get is i want to get the publish date when the video was published the video title and that is inside snippet and then inside statistics i have the view count like and dislike
count comment count etc okay so all of this is inside items so we will try to use this to extract all the video details but before that we also need to modify our code such that okay but before that let me just clear this output here just so that it becomes more clear so we need to modify our function here so that because currently it's only able to fetch 50 video details that is we are only able to pass 50 ids we need to iterate this request such that we will iterate it until it passes all the 206 video ids okay so for that what we can just do is i'm just going to open a for loop i'm just going to say for i in range and this range what we are going to do is we will start with 0 and we will go up to the length of this video ids so that is len of video ids but it should process 50 videos at a time correct so that is what we want so for every request i want it to fetch 50 videos so by passing the range like this with a step of 50 what it's going to do is it's going to start the range from 0 and it will go until the total number of videos that is 206 in this case but it is going to take 50 at a time so the first value will be 0 the second value will be 50 third would be uh 100 and then 150 and 200 and the last slice would stop at 206 okay we need to now use this index so here uh i'm going to say that index i and then i'll say i plus 50 so what happens is during the first iteration i will be 0 so we will fetch from 0 until 0 plus 50 that is 0 to 50.
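The batching idea described above can be tried on its own, without calling the API. This is only a rough sketch under some assumptions: the helper name chunk_ids and the made-up ids are not from the video, and the step of 50 in range is what makes the loop jump 0, 50, 100 and so on.

```python
def chunk_ids(video_ids, batch_size=50):
    """Yield comma separated strings holding at most batch_size ids each."""
    # range(start, stop, step) gives 0, 50, 100, ... and stops before len(video_ids)
    for i in range(0, len(video_ids), batch_size):
        # slicing past the end of a list is safe, so the last batch is simply shorter
        yield ",".join(video_ids[i:i + batch_size])


# 206 made-up ids standing in for the real list of video ids
fake_ids = [f"vid{n}" for n in range(206)]
batches = list(chunk_ids(fake_ids))

print(len(batches))                          # number of requests that would be needed
print([b.count(",") + 1 for b in batches])   # how many ids go into each request
```

In the real function each of these strings would be what gets passed as the id parameter of the videos list request, one request per batch.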
we are basically slicing from 0 to 50 so it will fetch 50 ids from this uh basically variable in the second iteration i will be 50 so it will be 50 until 50 plus 50 that is 100 so 50 to 100 so it will fetch the next 50 and the same thing will continue until we fetch all the video ids from this particular variable okay so hopefully this should solve the problem so just that this request will be called multiple times and for each time it's going to fetch details for the 50 video ids only so every time we call this request we need to now store this video details okay so i'm just going to create a variable here let's say all video stats and i'll say this as blank list okay and then here below inside my for loop itself every time i call this request i'm getting a response of 50 video details i'm going to open another for loop here because this response object will have details for 50 videos so i need to loop through 50 times to get details of each video so i'm just going to say for video in response items and i'll close this and here i need to get my video stats so i'm just going to again like what i did for my channel statistics i'll open a dictionary so i'll say video stats equal to dict and inside this dict i will store all the different details that i need okay so the first detail that i need is the video title so i'm just going to say video title equal to video and where do i get the title so for that let's go back to our json and you can see that i have this items here and if you see i have multiple video details here so inside the first items i have snippet and then i have statistics so inside snippet i have this published the date of publishing and then i have the title so i just need to go to items snippet and title so i just go here so i'm already inside items from this loop and then from here inside this i just need to go to snippet and then i need to go to title okay this should hopefully fetch me the title of that particular video and it's going
to return it into this particular key that is title and then in my next key i'm going to create let's say published date and which will again be the same thing here but i need to pass the published date which will be this one so i'll just go back here i'll paste this and then in the next one the next thing that i want to count is all the different statistics so if i just minimize the snippet and the statistics so i have let me just copy all of this so i don't have to go into this whole thing every time so i'll just paste it here so we can easily refer to this okay so the next key will be let's say views and views would be video and then it will be statistics and then it will be inside statistics it will be view count right so that's it and then i'll just again so this will be for dislike like and comments so i'll just do the same thing for everything so views okay so i think i have all the information that i need so everything will be stored into this video stats variable so each dictionary and now after each iteration i want this to be appended into my main variable that is my list so that this list will have a list of dictionaries and each dictionary will have details of one particular video so i'm just going to say all video stats dot append and i will just pass this video stats okay so hopefully that should work so let me just remove this and then i'll just return this all video stats so hopefully this should return me 206 videos let me just check the length of this first let's see how it works okay so it's returning me 206 so i think it's returned me all the video details that i wanted so let me just run this and you can see that for each video so it's telling me uh this is one of the videos my first data science project uh the total number of views the total likes uh dislikes comments and when it was published everything is mentioned and the same thing is happening for all the 206 videos so this is fine so we have got our video details for each
of the videos from kenji's channel okay so what we will do next is let me just store this into a variable like let's say video details it will store whatever is getting returned from my get video details function and then i'm going to pass this into my data frame so i'm just going to create another variable saying that video data equal to pd dot data frame and here i will be passing this video details and hopefully it created my data frame and if i just run this now and you can see that we have used pandas to create a data frame this is basically the data that we were able to extract so all the video details from kenji's channel we were able to extract it using the youtube api and then we have loaded into loaded this into our data frame and this is how it looks okay now this is fine we'll now try to do some simple analysis and visualization on this data okay so the first thing is let's try to identify which are the top 10 videos of kenji okay so how do we identify top 10 videos is the videos where they have got the highest number of views okay so let's see how we can do that but before doing that again what we need to first do is we need to convert all these columns into numeric so this views likes dislikes comments we need to convert it into numeric and this publish date that i have here i would like to modify this so that it only shows me the date date value and not the timestamp okay so let's try to modify that so i'm just going to do that one by one okay so first thing let's try to do modify this publish date so it only shows the date value so i'm just going to say this equal to pd dot to date time and here i'll just copy the same thing and then i'll just use the method dt dot date okay and then the next step is i'm just going to copy this paste it here i'll convert views into numeric so i'm just going to say pd dot to numeric and then i'll just pass the same thing that is this whole thing here and i'll do the same for all the other columns so i'll just okay 
and then finally let me just print this so you can now see that i have this data and all the data has been shown here so i have the title then i have the publish date which is just showing me the date and all of this column the values are still the same just we have changed the data type so that these are now integer okay now we can use this to build our visualization the first thing is i need to identify the top 10 videos okay so in order to identify the top 10 videos i'm just going to sort this data based on views in descending order so i'll just do one thing i'll just create a variable like which is equal to video data dot sort values by i'll say views and then i'll say ascending equal to false okay and then i'll just print this so okay so it's sorting the data but it's not just returning top 10 but all the 206 records so i'll just use a head here so i'll say head of 10 so now hopefully yeah so hopefully we have got the top 10 videos of kenji based on views and this looks fine but let's just try to put this into a bar chart so we will be clearly able to understand how each of these videos have performed okay so again i'm just going to use uh seaborn and going to use the bar plot so i'll just say ax 1 equal to this time sns dot bar plot and then i'll just say x equal to so in our x axis i need the views so i'm just going to say views and in my y axis i need the title okay so i'll just say title and then i'll say data equal to the top 10 videos okay so that's all now let me execute and you can see that i'm able to see the data here so we have on the y-axis we have all the title and you can see just by looking at this graph that the video with the highest number of views basically it has got almost 1.1 million views here and this video has performed way better than any of his other videos so you can just see that one video has really blown up which has got a lot of views and the remaining top nine videos are also performing well but not as good
as the very first video okay so this just gives you some indication of how his videos have performed in a nutshell okay this is fine now let's try to do one final analysis of kenji's data so what i want to do is i just want to see every month how many videos is he posting okay on an average for each month how many videos has he posted okay so we'll just be using the same data that is the video data so i'll just run that here again so just change this video data and we'll just use this data to identify for each month how many videos he generally posts and which is the month where he posts highest number of videos and which is the month where he posts the lowest number of videos on an average okay so for that the first thing that i need is i need to create another column where i will have something like month so then based on that month value i will be able to segregate the data okay so in order to create a new column in this pandas data frame i can just say video data and here i'll just say month and this one i'll extract this month from the publish date column so what i can just do is i'll just say pd dot to date time and then i'll just pass in this whole thing here but i'll change the month to publish date because this is the value that we need and from this i need to fetch dt and then i need to fetch the month okay so i'll just say strftime i'll pass the format as percent b that is a lowercase b and i'm getting an error let's try to fix that so what is the error video date it should be video data and okay i'm getting another error video date again um and now let's try to run this hopefully we should have a new column for month so and you can see that in our data frame now we have a new column that is month now the next thing that i want to do is for each of these unique months how many videos has he posted so i want to do a group by okay so i'll do a group by to calculate the total number of videos in each month so what i'm just going to do is i'm going to create a variable
saying videos or videos per month equal to so i'll say video data dot group by and then here i'll pass in the column on which i want to group by and then i'll say count the total number of records so i'll say pass size and if i just check this so you can see i'm getting the data for each month how many records are present in this data frame it's returning here that is fine but this is actually not two different columns it's basically something like an index so what i want to do is i want to convert this into something similar to a data frame so what i'll just do is i'll pass in a parameter here that is as index equal to false and now i'll run this and now you can see that this videos per month is now a data frame so this whole thing group by here returned a data frame which is having two columns month and size okay you ignore the first one which is basically an index for now so this is fine for each month i have got the total number of videos that he has posted now before i can do a visualization of this data i want to sort this data based on the month that is from jan to december so i'm i will be using the categorical index to do that so what i'm going to do is i'm going to convert this index into the month value and then i am going to sort by index so all the values will get sorted based on index okay now in order to use the categorical index i just need to have a sort order which is basically a list of values where which basically will identify how the sorting needs to happen so i have this already created for myself so i'll just use that and then i'll just run this the next thing is i need to create the index basically i need to change this this integer index into the values that is present in my month so i'll change my index to that so how i can do that is i'll just copy this and i'll say this dot index equal to pd dot categorical index and here the very first thing that i need to pass is basically the column that i need to refer so i'll pass in month because 
this is the column value that i want to be in my index and the next thing is i'll say categories equal to i'll say sort order that is the list that i just created on the top and this one and then finally i'll say ordered equal to true hopefully this should work and if i just see here yeah so now if i just do this one dot sort index hopefully my data will be sorted so now you can see that my data is sorted as per month so jan feb march until december okay now i'm just going to overwrite the value of my videos per month variable or data frame with this sorted values okay so this is fine and the next thing is now let's try to do a simple visualization the visualization that i just want to do is i just want to see uh for each month how many videos has kenji posted so i'm just going to use the same seaborn library so i'll just say ax2 equal to sns dot bar plot and then here i'll say x equal to so my x will be month so i'll just say x equal to month and then y will be equal to size and then my data will be equal to this particular thing videos per month okay so that's it so i have done the visualization you can see that just by looking at this visualization you can see that in the month of july this is basically the month where kenji posts highest number of videos almost 25 videos during the month of july and then the least videos would be in the month of october uh that is around 12 videos so there's not a big difference here but on average you can see that july is where kenji posts highest number of videos and october is when he posts least number of videos okay so i think we are done so i hope all of this was useful but one last thing let's say if you wanted to load all of this data from your pandas data frame into a csv file then you can simply easily do that so let's say we have the channel data that we initially generated that is channel data or let's say we take this data itself that is video data and if i see that i have all of this video data here in
order to move this into a csv i can just say video data and because it's a data frame i can just say to csv and then here i can just pass a csv name so i'll say video details of let's say kenji right so this is videos of kenji and i'll say csv and that's all now if i just go back to my folder here you would see that i have the csv file already generated with all the data so you can always move all of this data from data frame into a csv file so that's one thing so i hope all of this was useful you understood how to build this project i'll make sure to leave a link to all of this script in the description below so you can go to my blog and copy all of this script and i hope you'll be able to build much more analysis and visualization using the techniques that you have learned from this video if you like this video please make sure to subscribe and give me a thumbs up thank you and see you soon in the next one bye
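Looking back at the paging logic from the get video ids part of the video, the loop can be sketched so that it runs offline. Treat this as an illustration only: fetch_page and make_fake_fetch are hypothetical stand-ins for the real playlist items request, the fake tokens are plain numbers (real tokens are opaque strings), and the loop is written in a slightly more compact form than the more pages flag used in the video.

```python
def get_video_ids(fetch_page):
    """Collect every video id by following nextPageToken until it is missing.

    fetch_page is a hypothetical callable: it takes a page token (None for the
    first page) and returns a dict shaped like a playlist items list response.
    """
    video_ids = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        for item in response["items"]:
            video_ids.append(item["contentDetails"]["videoId"])
        # .get returns None when the key is absent, which means this was the last page
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return video_ids


def make_fake_fetch(total=206, page_size=50):
    """Build a stand-in for the API call so the loop can be tried without a key."""
    ids = [f"vid{n}" for n in range(total)]

    def fetch_page(page_token):
        start = int(page_token) if page_token is not None else 0
        end = start + page_size
        response = {"items": [{"contentDetails": {"videoId": v}} for v in ids[start:end]]}
        if end < total:
            # only pages that have a successor carry a token
            response["nextPageToken"] = str(end)
        return response

    return fetch_page


all_ids = get_video_ids(make_fake_fetch())
print(len(all_ids))
```

With the real service, fetch_page would wrap the playlist items list request from the video, passing the playlist id, max results of 50 and the page token.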
Info
Channel: techTFQ
Views: 3,530
Keywords: python project, python portfolio project, portfolio project, youtube data api, scrape youtube, scrape youtube using python, youtube api, python project for data analysis, data analytics portfolio project, pandas, seaborn, pandas dataframe, data analysis, data visualization, data analysis project, data analysis projects for beginners, data analysis project python, data analysis project from scratch, web scraping, web scraping tutorial, python tutorial, scrape youtube using API
Id: SwSbnmqk3zY
Length: 69min 34sec (4174 seconds)
Published: Wed Oct 13 2021