Analyzing IPL Data using Python | Python Projects for Beginners | Python Projects | Great Learning

Hello, everyone. I welcome you all to this session, where we'll be analyzing the IPL data set. This is the IPL season, India is a cricket-loving nation, and one of the world's best tournaments, the Indian Premier League, is going on right now. We have some data from an open-source data set, and we'll use it to analyze how different teams have performed across all of the seasons.

Before we start, I'd like to inform you that we have launched a free learning platform called Great Learning Academy, where you have access to almost 100 courses on data science, artificial intelligence, machine learning and a lot more. Once you complete these courses you get a course completion certificate, which you can add to your resume or your LinkedIn profile, and which will be a huge value-add for you. All of these courses have been curated by industry experts.

To the folks who are new to our channel: if you haven't subscribed yet, please go ahead and subscribe, as it encourages us to come up with more such live sessions on a regular basis. We have around 74 people watching right now, so if you could quickly hit the like button, that will encourage us a lot. The YouTube algorithm works in such a way that likes matter: if you hit the like button, more people will know that we are conducting this session, and they can join in and learn about data analysis with Python and how to analyze this IPL data.

For those of you who are asking me to teach in Hindi: we will definitely have a session in Hindi as well. Maybe we'll schedule it for next week. Since we have
students across the entire country and also in different countries, our preferred mode of teaching for this particular session is English, but I have considered your request and will definitely arrange the same session in Hindi as well.

I see that we have around 81 people watching right now, so I'll wait for a couple more folks to join in. With 81 people watching, I'd like the number of likes to reach 81 as well, because it will show us all of the love from your side. "Please use Hindi and English": absolutely, I'll try to use a bit of Hindi in general; that wouldn't be a problem.

What I'll do is keep speaking with you folks, and in between I'll open up Anaconda and Jupyter Notebook. So I'm just opening up Jupyter Notebook, and if you have any questions about Python or data science in general, put them in the chat and I will definitely take them up.

"Make an ML project video on IPL score prediction": that's an interesting suggestion. If we are able to scrape that sort of data, we'll definitely do a session on it. Next week we have another session on IPL, and we'll have a lot of sessions on cricket and IPL in general throughout this IPL season. So all of you cricket lovers have a big surprise in store, because we are trying to teach you a lot of data science and Python through this popular sport.

Now I'm opening up my code file; give me a couple of seconds. I'll reduce my screen frame over here, and let me also increase this. I guess you can see me in a smaller frame now, because we'll be doing a bit of analysis
right now. Just give me a confirmation if you can see the code file, and if everything is fine we'll start off with the session.

"How long will the stream go?" The stream will be for one hour. Amit Roth is asking how to be a sports analyst. That's a very good question. If you have seen movies such as Moneyball, or the TV series Inside Edge, what they do is analyze a lot of things about players, because cricket is a thinking game. When you're making a team, it's very important to understand which batsman or bowler is the right fit for which position. Take Sehwag: he's an absolutely fantastic player, but he'll most probably play best when he's opening the innings. If you send Sehwag in at eight down or nine down, then as an analyst you're doing a poor job, because that is not how you make a team. This is where sports analysis comes in. Whichever IPL team you talk about, they have a sports analyst who helps them find the right team combination. I hope that answers your question.

"Can we get the CSV file?" Sure, I guess we'll add the link to the CSV file in the description once the session is done.

Now that everything is set and you can see my screen, I'll start off with the session. First we have to import all of the required libraries. In Python we have libraries called pandas, matplotlib and seaborn. pandas is used for data manipulation, matplotlib is used for visualization, and seaborn is built on top of matplotlib. Here we're just giving each one an alias: we import pandas with the alias pd, then after
that, from matplotlib we import pyplot with the alias plt, and then we import seaborn with the alias sns. Let me run this piece of code and wait until it finishes, and meanwhile see what is in the chat.

"Could you zoom the screen a little bit?" Absolutely, your wish is my command; I hope you can see it properly now. It seems my system is a bit slow today. So we have successfully loaded all of the required libraries, and our next task is to load the file on which we'll be working. Actually, I'll restart and clear the output, because I don't want you to see the output immediately, and then I'll run the cell again.

Now that the libraries are loaded, I load the file with pd.read_csv, which helps me load a CSV file. I already have a CSV file called matches.csv, and I store the result in an object called ipl. If I want a glance at the first five records of this data frame, I use the head method, so here I have ipl.head(), and as you see this shows me all of the different columns.

Let me give you a brief description of this data frame. We have id, which is the unique match id; season, the season in which the match was played; city, the city in which it was played; and date, the exact date on which it was played. Then you have team1 and team2. This match was played on, let's say, the fifth of April 2017, between Sunrisers Hyderabad and Royal Challengers Bangalore. This column over
here tells you which team won the toss: between these two teams, Royal Challengers Bangalore won the toss, and after winning it they decided to field. Then you have result, which has three categories; we'll learn about it later, so we'll leave it out for now. Then we have the dl_applied column. If there was rainfall and the match was cut short, or something else had happened, and the Duckworth-Lewis method was applied, you would have a one here; otherwise you'd have a zero.

Then you have winner, which tells you which team won the match: in the match between Royal Challengers Bangalore and Sunrisers Hyderabad, you see that Sunrisers Hyderabad won. Then there are two columns, win_by_runs and win_by_wickets. I believe all of you are cricket enthusiasts, and that is why you're attending the session. When we say win by runs, it means that the team batting first has won the match. Let me give you an example. Here Royal Challengers Bangalore won the toss and decided to field, which means Sunrisers Hyderabad batted first. Let's say they scored 100 and Royal Challengers Bangalore were only able to score 65; then we say that Sunrisers Hyderabad won the match by 35 runs, because there's a difference of 35 runs. When the team batting second wins the match, that is when we have a value in win_by_wickets.

In this match between Royal Challengers Bangalore and Sunrisers Hyderabad, Yuvraj Singh got the man of the match award. Then you have venue, the stadium at which the match was played, and then the names of umpire one, the second umpire and the third umpire. So this is
a brief description of all the data in this data set.

Let me go to the chat and see if there are any more questions. We have 123 people watching right now, so let's quickly make it 123 likes; that will help us a lot. As I've told you, the YouTube algorithm works in such a way that as you hit like, more people are notified about this session. So if you could click the like button and spread your love, that would help us a lot.

Supple Shakha is saying it would have been better to have the code side by side. As of now we can't provide the code, but once the session is done we will definitely provide it, or you can mail us at Great Learning Academy and get the code from there. Girish is asking, "Please do a session which would include data cleansing, data manipulation and data wrangling." Sure, we'll keep that in mind. In today's session we're actually doing exploratory data analysis, so we'll be doing a bit of data wrangling with the pandas, matplotlib and seaborn libraries. "Please share the CSV file": sure, as I've told you, we will add the link to the data set in the description later on. Tushar is asking, "Is there any available course regarding Power BI?" Yes, go to Great Learning Academy and you'll definitely find a course on Power BI. We also have Vishal in our team, who does regular sessions on Power BI, so once this session is done you can go to our YouTube channel and search for Power BI; you will definitely find a video on it.

Keep your questions coming in the chat; I'll be taking them in between. Right now let's go back to the code. Now that we have loaded the data set, I want to have a glance
at the shape of the data set. Here I have ipl.shape, which gives me a value of 636 and 18, meaning that there are 636 records and 18 columns. Since each record represents a unique match, there are 636 matches in this data frame.

Now I want to know which player has won the most man of the match awards, using the player_of_match column. I give the name of the data frame first, which is ipl, then inside square brackets the name of the column, which is player_of_match, and after that I use the dot operator and the value_counts method. The value_counts method gives me the frequency of the different categories in a categorical column, and this is the result. You can see that Chris Gayle has won the most man of the match awards, 18 of them. Then you have Yusuf Pathan with 16, David Warner with 15, AB de Villiers with 15, Suresh Raina with 14, and so on.

This gives me everything: for each player, how many man of the match awards he has won. But that is not what I want. What I want is the top 10 players with the most man of the match awards. The entire command stays the same, except that after it I add square brackets, and inside the brackets I write 0:10, which is what I've done here. If I click on run, you see that we have the list of the top 10 players with the most man of the match awards. Again Chris Gayle is sitting at the top, and here you have Michael Hussey, who's currently the coach of
Chennai Super Kings, but he was a player with them as well. Similarly, if I wanted only the top five instead of the top ten, I'd give 0:5 inside the brackets. So here you have Chris Gayle, Yusuf Pathan, David Warner, AB de Villiers and Suresh Raina.

Now I'll do something interesting: I'll make a visualization. I start by setting the figure size with plt.figure, and I give figsize as (8, 5), which is just the dimensions of the image. I want a bar plot for the top five players with the most man of the match awards, so I'll use plt.bar. There are two main parameters: the first takes categorical values and the second takes numerical entries. For the first parameter, since I need categorical values, the command stays the same but I add .keys() at the end. That gives me the names of these players, and this is what I've written here. If I hit run, you see the names, or categories, of the top five players with the most man of the match awards. For the second parameter I have to give the values, so I remove the .keys() method, and that is what I've given here. Again it's 0:5 in both, because I want this information only for the top five players. Then I've given a color to the bars, which is green. Let's wait for the plot to load, and you see that we have successfully made a bar plot representing the top five players who have won the most man of the match awards: Chris Gayle, Yusuf
Pathan, David Warner, AB de Villiers and Suresh Raina. So this is quite interesting.

Then we had the result column, and most of you have asked me what it is, so let's understand that. I take the result column of ipl and use the value_counts method on it, and you see that there are three categories: normal, tie and no result. Normal means that the match progressed normally with no aberration, so one team won the match and one team lost. A tie means that the scores were level. No result means that the match was called off for some reason. These are the three categories in this column.

Before going ahead, I'll head to the chat again and see if there are any more questions. Shivam is saying, "Watching Kohli, who is doing the ads of Great Learning, now I've also become a fan of Great Learning." Thank you very much, Shivam. Virat Kohli is officially our brand ambassador, and as Virat Kohli believes in perfection, we believe in perfection too. He is a leading run scorer and one of the best players in cricket, and we aim to be the best learning tech company in the world; our aim is to provide high-quality education to everyone across the world. I'm going through some of your comments. Vishal is saying, "Yet another interesting session by paranisa." Thank you very much, Vishal; it's good to know that you're watching my session.

Before we head back to the project, if you could quickly hit the like button, that would be very helpful to us. So we have looked at this column. After this, I want to see which team has won the most tosses, so I'll use the toss_winner column. Let me show it to you: you have this toss_winner column, and you have
one team represented over here. I'll use the value_counts method on this column, which tells me which team has won the most tosses. So here I have the toss_winner column of ipl with .value_counts(), and when I hit run you see that Mumbai Indians has won the most tosses, 85. Kolkata Knight Riders is close behind with 78 tosses, then Delhi Daredevils with 72, and so on.

Now I want to understand how teams perform batting first and batting second. I've told you that if a team wins batting first you have a value in win_by_runs, and if a team wins batting second you have a value in win_by_wickets. So to analyze how teams perform batting first, I want to extract all of the records where win_by_runs is not equal to zero, and that is the command I give here: from the entire ipl data frame, I extract only those records where the win_by_runs value is not equal to zero, and I store them in a new object called batting_first. Now that I have created this sub data frame from my original data frame, let me have a glance at it. batting_first.head() gives me the first five records, and you see that we have values here: we have extracted all of the records where we don't have zeros, so this is where teams have won the match batting first.

Now I want to do something interesting again. From this data frame I want to understand the range by which teams win when batting first. Here you see Sunrisers won the match by 35 runs, Royal Challengers won by 15 runs, then you have Delhi Daredevils
winning the match by 97 runs. I want to know the usual range by which teams win. Since this is a numerical column, I'll plot it with a histogram. People often get confused between a bar plot and a histogram: a histogram is used to understand the distribution of a numerical column, and a bar plot is used to understand the distribution of a categorical column. Since we have a numerical column here, we'll make a histogram.

We start again by setting the figure size, so we have plt.figure with figsize equal to (7, 7), which is just the dimension of the image. After that, to make the histogram I use plt.hist, and inside it I pass the column for which I want the histogram, which is the win_by_runs column. Then I set the title with plt.title, which I give as "Distribution of Runs"; the x-label will be the number of runs, and then plt.show().

As you see, this is the distribution of how many runs teams win by. This bar tells you there have been maybe 100-odd matches where teams have won by 0 to 20 runs. Here there are around 20-odd matches where teams have won by maybe 75 to 85 runs. Then you have very few instances here: maybe only two or three matches where the team batting first has won by more than 100 runs. Similarly, over here there are maybe two matches, or even a single match, where the team batting first has won by over 140 runs. So in general it seems that teams batting first win by a margin in the range of 0
to 40 runs, and anything greater than that is a huge achievement. If a team batting first wins by 100 runs, it gets a huge net run rate boost, because this does not happen often. And a team winning by 140 runs is just phenomenal: if it ends up tied on points with some other team, it will almost certainly be the one with the better net run rate. These are the kinds of insights you can find by plotting histograms, bar plots and pie charts.

Now I want to know which team has won the most matches batting first. We have the batting_first sub data frame which we created, and in it we have the winner column, and I check its value counts, which tells me which team has won how many matches batting first. Here it seems that Mumbai Indians has won the most matches batting first, 47, and then Chennai Super Kings with 46, so the difference is not much. But look at the team in third place: Kings XI Punjab has won 32 times batting first. So there's a big gap in the number of wins between the team placed second and the team placed third, which is quite an interesting observation.

Now I want to make a bar plot for the top three teams with the most wins batting first, which are these three teams here. Again I start by setting a figure size, so we have plt.figure, where figsize
is equal to (7, 7). Once I set the figure size, I make the bar plot with plt.bar, again with two parameters. The first parameter has to be categorical values, which I get with the .keys() method. As you see here, since I need the top three teams I've given 0:3 inside the brackets and then .keys(), which gives me the names: Mumbai Indians, Chennai Super Kings and Kings XI Punjab. The second parameter is the values, and to get them I just remove the .keys() method. Then I set the colors for these bars: blue for Mumbai Indians, yellow for Chennai Super Kings and orange for Kings XI Punjab, because those are similar to their jersey colors. As you see, this bar represents Mumbai Indians, this bar represents Chennai Super Kings and this bar represents Kings XI Punjab. So this is how we can create beautiful bar plots using matplotlib.

Let me go back to the chat and take up some more questions. Abhishek is saying, "I'm watching your session for the first time today and I'm glad that I'm part of it, thanks a lot." I'm glad that this session is helping you, Abhishek. For all of the new folks attending our session: if you haven't yet subscribed to our channel or clicked the bell icon, go ahead and do that. We'll be doing a lot of these live sessions on a regular basis, and as instructors it encourages us to interact with you regularly. You can show us your love by hitting that subscribe button; that is all we are asking from you. You can also spread the word about Great Learning: you can tell your peers, you
can tell your friends about how we are conducting all of these live sessions on data science, cloud computing, Docker, artificial intelligence and a lot more. And if you haven't yet liked the video, I'd request you to hit the like button; that would help us a lot.

L Rusher Gaming is asking, "What are you teaching right now?" Right now we have an IPL data set, and we are performing exploratory data analysis on it with Python. Chandra is asking, "Will we get a certificate once the session is over?" Unfortunately, you will not get a certificate for attending this session. We had a Python live session series, and if you had attended that series you would have got a certificate. We'll have more such series coming, so if you attend our sessions regularly, you'll know which sessions give you a certificate. Sanjay is saying, "You're not looking well." It's just a bit of cough and cold; I'm mostly fine. Thank you for your concern, Sanjay.

Rohit is asking how to find where values are missing. Someone has already answered it here: Supple Shakha suggests calling .isnull().any() on the variable, and Karthikeya suggests .isnull().sum(). Both will work: they'll tell you whether there are null values or not.

I'm going through your comments to check if I've missed any of your questions. Raghavesh is saying the Australian site is not working properly and he can't open the programs. What do you mean by the Australian site? Someone is asking, "Can we perform EDA using NumPy only?" That depends on the problem statement, but in general the answer would be no, because you'll normally perform EDA with pandas and a visualization library such as matplotlib or seaborn. So ninety percent of the time the answer will be no. Nisha is asking whether Great Learning Academy free online certificates
have validity. The certificate is for a lifetime: once you complete the course, the certificate is available to you for life and will not expire.

So I'm going back to the code. And as is customary, let me tell you that if you haven't yet subscribed, please subscribe right now, and if you haven't yet clicked the like button, please hit it now. Let's see if we can reach 200 likes before the end of the session, or maybe in the next five minutes; that would make my day.

Now we have made this bar plot, and after this I want to understand how the wins after batting first are distributed among the teams, or in other words which team has the highest winning percentage after batting first. For this I make a pie chart. I have plt.figure with figsize equal to (7, 7), and then plt.pie. There are two parameters here as well, but this time, for a pie chart, we first give the numerical values, and the second parameter, labels, is where we give the categorical values. So first I have the value counts of the winner column of batting_first, which gives me these values, and then to get the categories I use the .keys() method. Let me hit run, and you see that I have this beautiful pie chart, and the biggest slice is for Mumbai Indians.

If I want to add percentage values, that is also something I can do. There is a parameter called autopct, and I'll set autopct equal to "%0.1f%%". Now let me hit run. Let
me remove this underscore from here, and as you see, with the help of the autopct parameter I am able to add percentages on top of this pie chart. The win percentage for Mumbai Indians batting first is 16.4 percent, then you have Chennai Super Kings with 16 percent, then Kings XI Punjab with 11.1 percent, and so on. Let me increase the size of this again; this is a bit too much, so let's keep it at 125.

Now that we have analyzed how teams perform batting first, I want to do a similar analysis for batting second. To extract all of the matches where the team batting second has won, I need a value in the win_by_wickets column; in other words, it cannot be zero. Let me scroll down again. As you see, from the entire ipl data frame I extract only those records where win_by_wickets is not equal to zero, because a non-zero win_by_wickets means the team has won batting second, and I store them in an object called batting_second. Then I have a glance at the first five records of this data frame. Let me drag it to the right: in the win_by_wickets column you see that I have only non-zero values, and none of the entries is equal to zero.

Just as we analyzed the win_by_runs column by building a histogram, I want to understand the distribution of wickets by which teams win when batting second. We start with the same thing: I set the figure size to (7, 7), then I make a histogram, and since it's a histogram I can pass the column in directly. From the batting_second data frame I select the win_by_wickets column, I set the number of bins to 30, and I have plt.show() over
here. So, as you see, the horizontal axis represents the number of wickets by which the team has won. If you look at this, the value is 70, so there have been 70 matches where teams batting second have won with seven wickets remaining. This tells you a lot. Again, this one is maybe around 67 or 68: there are 67 or 68 matches where teams batting second have won with six wickets in hand. If you look at these two, there are very few instances. There would be only around three or four matches where teams batting second have won with only one wicket in hand, and again maybe three or four matches where teams batting second have won with two wickets in hand. If you look at this, there would be maybe nine or ten matches where teams batting second have won with ten wickets in hand, which means that not even a single wicket was lost. So there were around nine or ten matches where the team batting second did not lose a single wicket and chased down its target.

Now, similarly to how we created the pie chart earlier, we'll start off by looking at which team has won the most matches batting second. So you have batting_second['winner'].value_counts(), and you can see that Kolkata Knight Riders has won the most matches batting second, with 46 matches. Then you have Mumbai Indians with 44 matches batting second, then Royal Challengers Bangalore with 42 matches batting second, and this is how it proceeds.

Now I want to make a bar plot for the top three teams which have won the most matches batting second. Since I'm making a figure again, I have to set a figure size, so plt.figure with figsize (7, 7). I'm making a bar plot, so it'll be plt.bar, and since the first parameter has to be categorical values,
I'll have batting_second['winner'].value_counts(), and since I want only the top three teams I will give [0:3] here. Then I will use the keys method, which gives me the names. The names would be Kolkata Knight Riders, Mumbai Indians and Royal Challengers Bangalore. The second parameter is where I get only the values. The third parameter is where I assign the colors to these bars: for Kolkata Knight Riders I'm assigning purple, next we have Mumbai Indians and for them I'm assigning blue, then we have Royal Challengers Bangalore and for them we are assigning red. Let me quickly hit Run, and you can see that these are the top three teams winning the most matches batting second. An interesting observation here again.

Now let's quickly make a pie chart as well. So plt.pie: the first parameter would be the numerical entries, value_counts, and the second parameter would be the keys, and I have already set autopct here. As you see, Kolkata Knight Riders has the maximum win percentage batting second, which is 13.6 percent, then you have Mumbai Indians with 13 percent, then RCB with 12.4 percent.

Let me quickly head on to chat and see if there are any more questions. Rohit is asking what bins equal to 30 means. Okay, let me just show that to you folks. Let's say I set the number of bins equal to 5: this is what you have. As you see, you have one, two, three, four and five. Let me actually reduce the size of this a bit; 110 percent should be fine. Again, if I take the number of bins to be eight, you will have eight bins here: one, two, three, four, five, six, seven, eight. Similarly, if I set the number of bins equal to ten, then you'll have 10 bins: 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
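The batting-second steps and the bins question above can be sketched roughly as follows. This is a minimal sketch, not the notebook from the session: the data frame is a tiny made-up stand-in (the team abbreviations and numbers are invented), and the column names winner, win_by_runs and win_by_wickets are assumed to match the data set used in the video.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up miniature of the matches data frame (not real IPL results).
ipl = pd.DataFrame({
    "winner": ["KKR", "MI", "KKR", "RCB", "MI", "KKR", "CSK", "RCB"],
    "win_by_runs": [0, 0, 0, 0, 0, 0, 12, 35],
    "win_by_wickets": [7, 6, 7, 10, 4, 1, 0, 0],
})

# A non-zero win_by_wickets means the winner was chasing (batting second).
batting_second = ipl[ipl["win_by_wickets"] != 0]

# Wins per team, sorted from most to fewest; keep the top three.
wins = batting_second["winner"].value_counts()
top3 = wins.iloc[:3]

# Bar plot: team names first, win counts second, one color per bar.
plt.figure(figsize=(7, 7))
plt.bar(list(top3.keys()), list(top3), color=["purple", "blue", "red"])

# Pie chart: numbers first, names passed as labels, percentages via autopct.
plt.figure(figsize=(7, 7))
plt.pie(list(wins), labels=list(wins.keys()), autopct="%0.1f%%")

# Histogram: bins controls how many bars the value range is cut into.
plt.figure(figsize=(7, 7))
counts, edges, _ = plt.hist(batting_second["win_by_wickets"], bins=5)

plt.close("all")  # release the figures; in a notebook you would call plt.show()
```

Here counts holds one entry per bin, so changing bins to 8 or 10 changes how many bars are drawn, which is what the speaker demonstrates on screen.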
I've given a value of 30 because I want to clearly know the exact value: I want one single bin representing one single wicket. That is why I've given a large value here. If you want all of these side by side, alongside each other, you can reduce the value. Let's say you want only three bins instead of 30 bins: you can set the value equal to three, and here you can see that you have only three bins. I hope that answers your question.

Another viewer is asking whether the data set is up to 2019. The data set is till 2018, I believe; I'm not sure if it's till 2018 or 2019. We'll actually know, because there's a column which tells the season, so that will let us know.

Rohit is asking what the difference is between a categorical and a continuous variable. That's a good question, so let me head back. When it comes to a categorical column, as you see here, you have team1 and team2, and you have different categories. The name of the team is a category, isn't it? Sunrisers Hyderabad is category one, Mumbai Indians is category two, Gujarat Lions is category three, and so on. Similarly, if you have a city, Hyderabad is the first category, Pune is the second category, Rajkot is the third category. So all of these columns are categorical columns. Now if you look at this column, win_by_runs, this is a continuous numerical column. You don't have categories. Let's say you have a number such as 27: that is not a category, because all of these numbers are fluid, aren't they? From one onwards you can have a lot of numbers here, so you don't have fixed categories. This is the difference between a continuous numerical column and a categorical column.

Some of you folks are saying that you have registered but you're not able to get the PDF. I'm only using the code file; I'm not using a
PDF here, so that is why I don't think you'll have access to any PDF. Let me head back and see if there are any more questions. Hydrocarbon is saying, "You guys are amazing and I've shared your channel with five of my friends to subscribe." Thank you very much, Hydrocarbon, that means a lot. Okay, some of you folks are saying that you're getting the file; that's good. So guys, quickly: we have 106 folks watching and we've got around 10 minutes left. If you haven't yet subscribed, quickly hit the subscribe button, and if you haven't yet liked, quickly hit the like button as well.

Now, since we have done this, let's see what else we are left with. All right, we have the season column in the ipl data frame, and I want to know how many matches were played in each season. So I have ipl['season'].value_counts(). I click on Run and this is what we get. This data frame comprises seasons starting from 2008 and going until 2017, so 2017 is the final season and we have data till 2017. This tells you that there were 76 matches played in the 2013 season and 74 matches played in the 2012 season, and the 2009 season had the lowest number of matches, with only 57 matches played. So this is some interesting information for you folks.

Then you have ipl['city'].value_counts(). This tells me in which city the most matches were played. It would seem that the most matches were played in Mumbai: 85 matches were played in Mumbai, then 66 matches were played in Bangalore, then 61 matches were played in Kolkata. In Mumbai we obviously have the Wankhede Stadium. Wankhede is obviously a great stadium, and that is why most of the matches are played there.

Now I want to know if there's a relationship between a team winning the toss and the team also winning the
match. Let me quickly scroll back up. Here you have two columns: we have this toss_winner column and we have this winner column. I want to know if the value in the toss_winner column is equal to the value in the winner column, which would mean that the team which has won the toss has also won the match. That is what I am trying to analyze here. So ipl['toss_winner'] == ipl['winner']: I want to see in how many instances this is True, and wherever it is True I want to sum that up. np.sum will sum up all of the Trues which I get from this condition, and it would seem that out of these 636 matches there were 325 matches where the team winning the toss has also won the match. Now let me see what the percentage is: this comes to around 51 percent. So 51 percent of the time in the IPL, the team which has won the toss has also won the match.

This is an interesting observation: it tells you that there is no clear-cut relationship between a team winning the toss and also winning the match. If, on the other hand, instead of being around 50 percent it were 70 percent, then you could clearly say that a team winning the toss has more of an advantage to win the match. But here it would seem that there is no such advantage, or not a clear advantage, which you can decipher from a team winning the toss and going ahead to win the match as well.

Right, we have around five minutes left and we have a few things still to do, so we'll cover that as well. Here we have another data set, deliveries.csv. We'll have pd.read_csv, we'll give in the name of the file, which is deliveries.csv, and go ahead and store it in this object called deliveries. Then, going ahead, I'll have a glance at the top five records of this data frame, so it'll be deliveries.head(). As you guys see, this data frame gives me ball
by ball information of each match. In the earlier data frame we had this id column, and match id 1 represents that particular match between Sunrisers Hyderabad and Royal Challengers Bangalore. As you guys see here, we have ball-by-ball information: first over ball one, first over ball two, first over ball three, and so on. For the first five balls you can see that David Warner was on strike, Shikhar Dhawan was at the non-striker's end, and the bowler was T.S. Mills.

Now I want to extract all of the unique match ids, so deliveries['match_id'].unique(). I'll click on Run, and as you see, we already know that there are 636 unique matches: if you look at this result, you have the ids starting from 1 and going on till 636. Out of all of these ids, I want to analyze only the match where match_id is equal to 1, or in other words, the match between Sunrisers Hyderabad and Royal Challengers Bangalore. So match_id == 1, then match_1.head(), and this is what I have. Now I'll have match_1.shape. This tells me that there are 248 records and 21 columns.

What do these 248 records mean? In a T20 match you normally have 20 overs per side, and 20 overs is 120 balls per side, isn't it? So 120 plus 120 gives you 240 balls. In a normal T20 match there would be 240 legitimate deliveries, but you have 248 records here. This basically means that there were eight extras in the match. These eight extras could be wides, no-balls, byes or leg byes. So you had 240 legitimate deliveries, and the eight extra balls were either wides or no-balls. That again is interesting information.

Now, from this entire match I want to analyze only the first innings, which was of Sunrisers Hyderabad. So I have match_1['inning'] == 1, and that I will store in this object called srh. Then I want to see
how these batsmen have scored runs, so srh['batsman_runs'].value_counts(). Let me just show you the column: we have this column batsman_runs. Here, as you see, on the third ball of the first over David Warner has hit a four. So this is basically what we have: in batsman_runs you have either a single, a double, and so on. Now I will scroll down, and this tells us that in the entire first innings, the innings of Sunrisers Hyderabad, there were 57 singles, 32 dot balls, 17 fours, nine sixes, nine doubles, and one triple taken. Similarly, if I want to see how the batsmen of SRH were dismissed, I have the dismissal_kind column. So srh['dismissal_kind'].value_counts(): you can see that only four wickets of SRH fell. Three of the batsmen were caught and one was bowled.

If I want to do a similar analysis for Royal Challengers Bangalore as well, inning needs to be equal to 2 for match_1, because RCB batted second, and that I will store in this object called rcb. Then batsman_runs value_counts, and I'll hit Run. For the second innings, the innings of Royal Challengers Bangalore, you can see that there are a lot of dot balls, and this might be the reason why the team lost this match. When it came to SRH there were only 32 dot balls, but in RCB's innings there were 49 dot balls, which is a lot. So RCB had 17 more dot balls compared to the SRH innings. There were 44 ones, 15 boundaries, eight sixes, and seven twos. Similarly, if I want to see how the RCB batsmen were dismissed, I have rcb['dismissal_kind'].value_counts(), and here you can see that all of the wickets fell, so RCB were all out. Out of the 10 wickets, six of the batsmen were caught, two were bowled, and there were two run-outs as well. That is how RCB collapsed in this match.
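Two of the computations from this last part of the session, the toss comparison and the single-match breakdown, can be sketched as follows. This is only an illustrative sketch on tiny made-up data frames (the rows are invented, not real IPL results), and the column names toss_winner, winner, match_id, inning, batsman_runs and dismissal_kind are assumed to match the files used in the video.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the matches data (not real IPL results).
ipl = pd.DataFrame({
    "toss_winner": ["SRH", "MI", "KKR", "RCB"],
    "winner": ["SRH", "RPS", "KKR", "MI"],
})

# The comparison gives one True/False per match; summing counts the Trues.
toss_and_match = int(np.sum(ipl["toss_winner"] == ipl["winner"]))
toss_pct = 100 * toss_and_match / len(ipl)

# Made-up miniature of the ball-by-ball data covering two matches.
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 1, 1, 2, 2],
    "inning": [1, 1, 1, 2, 2, 2, 1, 1],
    "batsman_runs": [0, 4, 1, 0, 0, 6, 1, 2],
    "dismissal_kind": [None, None, "caught", "bowled", None, None, None, None],
})

# Keep one match, then split it by innings.
match_1 = deliveries[deliveries["match_id"] == 1]
first_innings = match_1[match_1["inning"] == 1]
second_innings = match_1[match_1["inning"] == 2]

# How often each run value occurred (0 is a dot ball, 4 a four, and so on).
first_runs = first_innings["batsman_runs"].value_counts()
second_runs = second_innings["batsman_runs"].value_counts()

# value_counts skips missing values, so only actual dismissals are counted.
first_dismissals = first_innings["dismissal_kind"].value_counts()
second_dismissals = second_innings["dismissal_kind"].value_counts()
```

With the real files, ipl and deliveries would come from pd.read_csv instead of being typed in by hand; the filtering and counting steps stay the same.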
So guys, we are successfully done with this session, and we have analyzed a lot of things. We've performed a comprehensive exploratory data analysis where we've used matplotlib and pandas as well. Let me see if there is anything in the chat I should take up before ending the session. Most of you folks are requesting the data set, so someone from our team will add it in the description in some time.

Guys, thank you very much. I hope this session was fruitful for you folks. Before ending the session, a quick reminder to hit the subscribe button and also hit the bell icon. I have to remind you folks to hit the subscribe button a lot because we are not charging any money from you guys for these sessions. All of these are absolutely free, and all we want is to provide high-quality education to everyone across the world. If you can just hit the subscribe button, that will grow the channel, and you can also spread the word about Great Learning so that your friends, your peers and your colleagues can know more about Great Learning and what we are doing here. Also, before signing off, if you can quickly hit the like button, that will be wonderful. When you hit the like button, the video shows up in the notification feed of people who are interested in this kind of stuff, who are interested in data science or data analytics, and who are interested in cricket and how to use data science in the world of cricket. So if you could just quickly hit the like button and the subscribe button, that would be wonderful. Thank you very much. I'm signing off for now. Have a great weekend, folks.
Info
Channel: Great Learning
Views: 15,587
Keywords: Great Learning, Python for data science, Data Science with Python, Learn python for data science, Python for data science tutorial, Python for Beginners, Python Tutorial, Pandas Python Tutorial, Matplotlib tutorial, Linear regression, Logistic regression, python tutorial, data science tutorial, Great Learning Live Session, great lakes, great learning academy, python data science, ipl data analysis python, ipl, ipl live, ipl 2020
Id: bADxQyFpjlI
Length: 62min 50sec (3770 seconds)
Published: Sat Sep 26 2020