IPL Data Analysis using EDA | What is Exploratory Data Analysis | EDA Case Study | Great Learning

All right, hey guys, welcome to this session by Great Learning. My name is Anirudh Rao and I'll be your instructor for this session on IPL data analysis using EDA. EDA stands for exploratory data analysis, and it is one of the most fascinating domains to have taken over the world recently. But before we begin, I want to make sure that everything on our side is set up for all of you to view comfortably, so please head to the chat box and let me know if you can see my screen fine and hear me fine. Just give me a quick confirmation so that we can go ahead and begin this wonderful session. Perfect, guys. First of all, welcome to you all from wherever you are tuning in. It's going to be a beautiful session; it's a beautiful evening here in Bengaluru. We're going to get started with exploratory data analysis, which is very important today and highly trending in the year 2021. While you give me a confirmation, I just want to tell you that I'm monitoring all of your comments on my mobile phone here. So as always, let's keep it highly energetic and highly interactive: no matter what question you have, however basic it is, please put it down in the chat box, and without further ado we can get started. But before we check the agenda of today's session, let me take a minute of your time to tell you about this fantastic venture we have here at Great Learning called Great Learning Academy. Great Learning Academy is definitely a place for all of you to check out. Before telling you what it is, I can promise you this: regardless of where you are in your career right now, be it that 
you're a student, you're in college, you're about to graduate or have just graduated, you're hunting for your first job, or you're looking to switch jobs or change careers, whatever it is, right from when you're in school all the way to switching careers entirely, I can promise you that here at Great Learning Academy you can have access to fantastic courses absolutely free of cost that will definitely help you increase your knowledge base. We have catered to somewhere around 1 million plus learners from 140 different countries, we provide somewhere around a thousand plus hours of free learning content, and we get three million plus views on all of the learning content we have out here. With the Great Learning career paths there are so many advantages, guys: you can pick a career path and work towards becoming an expert in it. In this particular example we're talking about a data analyst's role, so let me find the data analyst path and click on it. Now you can see there are somewhere around 90,500 jobs available in India right now for data analysts, the average salary is somewhere around 5 lakhs per annum, and we already have 1.7 lakh plus learners tuned in to these programs. You can also see that we have 8,000 plus career transitions, the people who have picked up our job-ready programs see an average salary hike of 48 percent, and we have 1,900 plus people enrolled in the postgraduate program this month as well. But more details about this at the end of the session, so make sure you stick around till the end, because that is where I'll take the opportunity to guide all of you from 
being beginners right now all the way to how you can work towards expertise. Perfect. But to work on the data analyst path, you can take a look at some of the free courses that we provide, just so you can get started with the domain itself. For example, Data Science Foundation is a fantastic free course that we are offering. Let me click on it, and here you can see that this course has been taken up by somewhere around 27,000 learners. You can find all the details about the course, the skills covered, the syllabus, and even a sample course certificate, which looks like this. I'm sure all of you tuning in to this video know the amount of value that a certificate of completion adds to your resume, your LinkedIn profile or whatever it is, because these are the assets that will help you stand out from the competition. We have 122 people joining in right now, so welcome to the session by Great Learning from wherever you are tuning in; I'm so glad that all of you are tuned in and as excited as me for data analysis. But guys, I want you to head to the chat box right now and give me a quick yes or no: have you created an account at Great Learning Academy, and have you started on it? Let me know, and while you do that, let's quickly take a look at the agenda for today's session. This session is going to be really fun, and at the end of it I want to make sure that all of you know 
what exploratory data analysis itself is. To get there, we're going to start with some theory: let's understand what exploratory data analysis is, take a couple of examples, and see if we have ever done any sort of data analysis in our lives, even though we might not have done it within the domain of data analysis. After that we can take a look at the various types of exploratory data analysis that are out there. There are multiple types, all of them are equally important, and even though you may or may not use all these methods, it's extremely important for you to know them. After this we're going to take a quick look at some of the advantages of EDA; EDA is the same as exploratory data analysis. Exploratory data analysis is really popular not only because of how powerful it is; once you start understanding the advantages of EDA as well, that is when you will get the complete picture of why EDA is so popular. I think all of you will feel that moment of euphoria after we discuss the advantages. Up until this point we would have discussed a good amount of theory, so I think it becomes extremely important that we actually go on to perform data analysis on the IPL data set. IPL, the Indian Premier League, is in session right now, I guess. I'm sorry, I'm not much of a cricket guy, but I always have a ton of fun when we do any sort of analysis. So guys, quickly head to the chat box: can anyone tell me who is at the top of the IPL table right now? Which team is it, is it RCB, is it Mumbai Indians, who is sitting in the top two or top three positions? 
Head to the comment section and let me know, and while you do that, I want to give you a quick introduction to exploratory data analysis. Guys, exploratory data analysis is fantastic. We've been using this concept all of our lives; it's just that we did not know we might be using EDA. Let me give you a quick example to give you more clarity about what it is. Let us say that there is a new TV show that has come out. Think back to 2011 or 2012, I think, when Game of Thrones came out for the first time; of course that is not when it was popular. Anyway, let's say a brand new TV show has come out, all of your friends are going gaga about it, and Twitter, Instagram and Facebook are flooded with storylines or spoilers. Now what do you do when someone tells you, 'hey bro', 'hey macha', however they address you, 'you should really watch this'? I'm like, okay, so what do I do? I won't go directly to Netflix and start watching just because someone told me to. Even though it's very popular and I am curious about it, I want to sit and find out some things about that TV show first. So what do I do? I just head to Google, type in the name, and see the average IMDb rating and the Rotten Tomatoes reviews. I'd read a couple of reviews, maybe go to YouTube and watch a trailer, and only then start watching the TV show if I think it's something I'd like for my mood. That is me actually exploring and understanding more about it. I did not have the full details, I just had the name of the TV show, but I took that and had multiple ways to go about it: maybe I called up a friend to ask about the show, maybe I watched a couple of 
videos, or I realized that this is from a novel that I read in, like, the 1990s or something, and now they've made it into a TV show. There are so many cases for that. So coming to the actual definition of... okay, we have someone spamming the comment section, let me check what's going on. 'Can you adjust your microphone, because your voice is only coming in from the left side; can you please shift it to the center?' Well, Arcstar, I'm sure my mic has been configured properly; we made sure it was fine. So guys, can you all head to the comment section and tell me if I am audible? Is there an issue on our side, or is it just Arcstar facing the issue? I'm sorry that I have to ask you again, but I want to make sure that all of you can hear me comfortably. Perfect. While you give me that confirmation, I want to tell you something about exploratory data analysis: basically, it's you doing a lot more research than what's offered to you. What are you trying to do in the TV show case? You're trying to find out more about it: the trailer, the reviews. You're doing some sort of investigation on your own to see whether it is something you'd want to watch. Whenever you google the name of a TV show, Google itself is intelligent enough to suggest something else if you're not into watching that. So you will understand the anomalies, you'll understand patterns in the data, and you will have a ton of questions, even if you're given just the name of a TV show. That is exploratory data analysis: you're trying to explore a topic, and you're performing analysis by 
exploring the topic, and that is what gives it the name EDA. Okay, perfect. All right, a couple of you are telling me that the audio is fine and some of you say you're hearing only the left-hand side, so let me check my microphone again. Okay guys, I think the microphone should be fine now; let me move it a little more. Yup, should be fine now. Perfect. So why is EDA so important? Exploratory data analysis is very important because, first of all, we get to understand our data before coming to conclusions. I can take a look at a TV show and say I don't watch thrillers, I don't watch horror TV shows, but that's just me coming to a random conclusion. Even though the thumbnail makes it look like horror, it could very much be a comedy TV show. So I want concrete proof before I come to a conclusion, and EDA helps with that. The second thing is that it gives us more clarity before we make any assumptions. There's a new TV show that came out, and my friend calls me and says, 'bro, you have to watch this'. What I'm trying to do here is analyze before I make any assumptions about whether it's good or bad, or just trending with no value; I want to make sure that I have more clarity about it. So, as I told you, the first place I head to is Google. Perfect. The next thing is that it helps spot erroneous data, trends and events. Let's get back to complete theory in terms of EDA. Sometimes it so happens that you're working with a good amount of data, it can be any data, and suddenly you realize, hey, my data is wrong, because you know what's actually going on right now. Maybe you can just go up 
to your terrace right now, look at a clear sky and say there's a good chance it will not rain for the next hour. But maybe as soon as you say, 'hey Google, will it rain in the next one hour?', Google says, 'absolutely, it's a thunderstorm, it's a wipeout.' That's not the case, right? There's a tiny error, a tiny mistake there. So whenever you have any sort of question, and you answer it and provide ample proof from data, and say, 'here is why I can answer it so confidently; I'm not just guessing and I'm not jumping to a conclusion', that is when you can safely say that you have used something based on data analysis. Perfect, guys. I hope all of you now have a quick understanding of what exploratory data analysis is. The answer, to be very honest with you, is in the name itself: you're trying to analyze, to dig into, data. Think of data like a treasure box, a box full of treasure in those pirate movies and the cartoons we used to watch as children. You dig, you find the treasure, it's a big wooden box, you crack it open, and you find gold inside. Similarly, with EDA you have to go through a lot of mess, a lot of process and very complex procedures to ensure that at the end you get to that goal. It's literally like you're hunting in the data, where data is the treasure box: you open it and you find a lot of gold, but you need to know how to search, where to search, and all of these things. So that is exploratory data analysis. All right, perfect. Now we'll quickly take a look at the types of exploratory data analysis that are out there. I want all of you to note this down and remember it, because, ladies and gentlemen, this is something very important: if you are ever applying for data science or data analysis interviews, this is a question that's usually 100 
percent asked. So there are three types of exploratory data analysis, pay attention here: we have univariate analysis, we have bivariate analysis, and we have multivariate analysis. Look at the names again: uni, bi and multi. Uni means one, bi means two, and multi means more than two. But what are we counting here? In univariate analysis we are analyzing only one variable; that's why the name univariate. With bivariate analysis, you can guess: we are analyzing two variables instead of one, hence the name bivariate. Now what happens if you have to analyze more than two variables? Then you need a single name that covers everything beyond two, and that is multivariate analysis: performing analysis considering more than two variables at a time. Now you may be a little curious and of course a tiny bit confused here as well, so to add more clarity let me give you a few examples so that you can remember it a lot better when you are working on this. Perfect. In univariate analysis we are analyzing one variable. If you have to track the change in your age or in your height, what would you do? You just have to analyze one variable. If I ask, 'over the last 10 years, how has your age changed?', it seems like a pretty straightforward question, and your answer would be 10: in the last 10 years you had 10 years to grow, so your age goes up by 10. Correct, perfect. Next, take a look at height. Say you want to track your height changes from 2018 all the way to 2025 and you're marking it on your wall; I mean, that's the jugaad way a lot of people I know do it. But yeah, if you're doing any of 
these kinds of things and you're trying to analyze your height, you just need your height data, nothing else. Of course you can add age to it, you can add male or female, you can add whatever you want, but if I just want to analyze height, can I do it with one variable, yes or no? The answer is yes, and it's through univariate analysis. Perfect. Now, with bivariate analysis you are basically trying to analyze either an outcome or a couple of variables using two sets of data, two features as we call them. I'll give you a very good example: sport preferences for the male and the female population are very different. Since we are talking about IPL itself, one thing we can see clearly is that the number of guys who prefer playing cricket is much higher than the number of girls who prefer cricket. I know India has a fantastic women's cricket team, I am aware of the fact, but this is just for this kind of comparison of preferences. I also know a lot of women who absolutely love playing hockey. So if I'm taking a look at hockey, cricket, or maybe something like Formula One or MotoGP, all of these are usually male-dominated sports, so there's a preference there: people of that sex tend to want to do more and more of it. To analyze that, what do we need? We require two things: one, the general male population, and two, the general female population, so that we can analyze and assess. You see what we're doing: we're using two variables. Similarly for multivariate analysis, I can give you a fantastic example: think about diabetes prediction. Diabetes, you guys tell me, is not just about the onset of age; it usually depends on a lot of different factors. First of all your age, second of all how 
well you are built, how well you work out, what your insulin level is, what your eating habits are, what your body mass index is; there are a million things that can have a say in whether a person has diabetes or not. Correct, guys? It's not just one thing. When you have to analyze age, insulin, body mass index, weight, fat percentage and eating habits, you have to analyze more than two things to come to the final conclusion that someone is diabetic or not, or to predict whether this person might have diabetes. You just saw what I did there: I took more than three or four variables to compare. That is multivariate analysis. So guys, I hope with this we are clear on univariate analysis, bivariate analysis and of course multivariate analysis. Perfect. Now, we've been talking about a good amount of theory, and I want to take one second to introduce you to a fantastic kind of plot that is very popular in EDA: it's called a box plot. Have you seen a graph which looks like this? Apart from all the numbers and letters on your screen, have you ever had a chance to come up against this and find out what a box plot is or how it works? Since we're talking about exploratory data analysis, I'll give you a little bit of theory and quickly explain how this works. You have certain values in a distribution, a set of values that are spread out, and you want to put them in a box plot. The way you put things in a box plot is that the minimum value goes on the left side; you can see here it says min, this is the minimum value, and in this case it is either 0 or 1 or something like that. Then you have the maximum value, and in this case the maximum value is somewhere around 
30. But let's discuss everything between the minimum and the maximum. The median value is basically the middle element in an ordered data set; here the median seems to be somewhere around 12, and the middle line in the box always tells you the median. Then you have something called Q1 and Q3. Q1 and Q3 are quartiles: when we have data and we want to split it by percentiles, we split it into four quarters, the first, second, third and fourth quarter. In this particular case the first quarter is between min and Q1, the second quarter is between Q1 and the median, the third quarter is between the median and Q3, and similarly the last quarter is between Q3 and max. If I draw here, you can probably see four circles; it looks like the Audi symbol or something like that. Perfect. Then you'll see a couple of dots here, three dots, labeled outliers. Outliers are basically data points which are not in the range of the rest of the data set. I'll give you an example. If we are talking about the maximum speed of a Bugatti or a Ferrari, let's say on paper it can go up to 350 or 400 kilometers per hour. That's the maximum; not all cars can do it, but there are cars that can. An outlier in this range would be a car that can do 1,200 kilometers per hour. Is there a car which can do 1,200? To my knowledge, absolutely not, not a production car which you can go into a showroom and buy right now. So when you are analyzing all the supercar data, when you want to pick your first Ferrari or Lamborghini, and you see a car with a top speed of a thousand kilometers per hour or something, one thought is, 'okay, wow, that's a very fast car'. The second train of thought that will 
immediately come to your mind is, 'okay, no car can go that fast; there is an error in the data'. Finding that kind of error in the data is why you look at outliers. An outlier doesn't have to be that far away; it can be 500 or 600 kilometers per hour, but at the end of the day it is just telling you that these data points are not in the same range as all the other points of your data. So let's just think of it as an outlier for now, and then see whether it is important, not important, to be thrown away, or whatever it is. Perfect. Now guys, let's quickly discuss the advantages of exploratory data analysis before we go ahead and take a look at all the examples in practice. There are many, many advantages of EDA; we could discuss them for hours together, but to make sure all of you are on the same page as me, I have kept it to five really important points. First of all, and it has to be the biggest advantage: arriving at data-driven conclusions and insights. What I mean by this is, as I told you, if there is a problem at hand I can either analyze it and give you an answer, or I can guess. Say a new TV show came out and you ask me, 'what do you think about this TV show?' If I've never watched it and I don't know anything about it, there are two ways I can answer. The first is, 'let me do some research and get back to you'. The second is, 'meh, it just seems so fancy, I don't like it'. You see what I did there? The second answer is a random conclusion without any data; it's just me thinking that it's a boring show, let's not watch it. This happens with a lot of us; this 
happens, in fact, with all of us. I'm pretty sure at one point or another we have all tried to reach a conclusion without data behind it. With the help of exploratory data analysis this will not be the case: every time you come to a solid conclusion, you will have data as proof, so you know how you got to that conclusion. This is a huge, huge advantage. The second thing, as I told you: you can have a Ferrari or a Bugatti, but I'm sure they don't do 1,500 kilometers per hour on the road. So you have to check your data and make sure it is consistent, clear and not confusing; there are so many things which can get messed up. If someone asks how fast a car can go from 0 to 100 kilometers per hour, there's usually a time: five seconds, six seconds, something like that. If you look up the fastest car out there and it says 120 seconds to get from 0 to 100 kilometers per hour, you might laugh, but yes, there can be an error in the data that leads to that. With data analysis, since you're digging into the car data, you can catch it really quickly. That's an advantage. The next thing is feature engineering. Let's get back to the diabetes example that I was just explaining. As I told you, there are multiple features that we have to analyze to see if a person has diabetes or not. But how do I know what these features are? That's what feature engineering is about: me finding out that age has an effect on diabetes, that body mass index has an effect on diabetes, that blood sugar has an impact on diabetes, and what else, body fat level, your eating habits, your workout routine; all of these things can have an effect on it. So me finding 
out these features, me thinking about these features, that is feature engineering. All right, perfect. The next thing we're going to take a look at is data cleaning. Data cleaning is very much required because you usually end up using data analysis with machine learning; the two go together. If you send dirty data, and by dirty data I mean data which is not pre-processed, with all its anomalies and inconsistencies, to your machine learning algorithm, the algorithm will not be able to figure out what is right and what is wrong. It will just start learning, assuming that all the data you've provided is correct, and at the end you'll be like, 'okay, so machine learning is not good'. Why? Because you gave it incorrect data and it gave you an inaccurate output; that's not the fault of the machine learning algorithm. You can avoid all of that with EDA. The last point here says it helps in handling underfitting and overfitting. Underfitting and overfitting are the two most insane things that a data analyst or a data scientist has to deal with, guys, I'll be very honest with you. Underfitting is a situation where your machine learning model cannot find the pattern in your data; it just cannot learn enough to understand what's going on. Overfitting is the exact opposite: your machine learning algorithm is looking so deep into your data, trying to find anything that it can, digging deeper and deeper, that it just finds noise, useless information, rather than something that would work for you, because it keeps doing it over and over again in a way where nothing is helping. That is 
overfitting. Underfitting and overfitting can literally kill your machine learning model, to the point where its performance drops almost to zero if you haven't handled them, and I'm sure that if you've ever worked on data sets you will understand this. Perfect, guys. We have somewhere around 152 people joining us for this session, and now is the time we go to Python and check out in practice how we perform data analysis on the IPL data set. So what are we trying to do with the IPL data set? First of all, we need to answer some questions about the data set, because you will be curious: since we're performing data analysis on IPL, what is the data? We can have Python itself answer that for us. For example: what is the size of the data set? What types of data are present in it? Is the data set pre-processed? Is everything in it correct, and are all the labels present? There are so many things you can mix up in cricket data, for example how many wickets were taken versus how many runs the team scored. If you jumble these two things you will know that it is wrong: you cannot take 120 wickets in an IPL match, but you know that 120 can be the runs and eight or nine can be the wickets. You know it, but your machine learning algorithm does not, so we are going to check that this is correct as well. We're going to keep firing questions at Python, and Python is going to give us all the answers that we are looking for. Perfect. Now, do we just ask questions about the data set, or do we have any fun questions to ask? Yes, we 
have a ton of fun questions to ask, but before we ask all the fun questions we need to understand the data set in terms of its statistics. We need to understand the values of mean, median and mode. You remember when I told you, after we looked at a box plot, that the data is broken into four quarters, the quartile or interquartile division? We're going to take a look at that, and of course at how many different data elements we have; that is going to help us perform this analysis. Now, coming to the fun questions, we can ask: how many IPL matches were played in all the data that we have? How many seasons of IPL are we analyzing? Which team has scored the most runs? Which match gave us the most wickets or the most runs? Or who is the player who has the maximum number of those purple caps, or whatever it's called? Perfect. We're going to answer a ton of interesting questions, and I hope you guys are as excited as I am, because I am just heading into my Jupyter notebook. Basically we are using Google Colab, guys. Google Colab is nothing but a simple Python Jupyter notebook that is hosted on the Google Cloud Platform. In fact, you can get started with it on your own as well: just go to colab.research.google.com. It's free to use and it's very simple. You just type in your Python code, run the cell, and it runs. You can do it from your phone, you can do it from your PlayStation; you can write Python code, execute it and make sure it works from literally anywhere you want, because whatever is running in the back end is running on a server in the Google Cloud Platform, and that server can be anywhere around the world. So you can sit at your favorite Starbucks, or do it while you're eating your morning idli dosa, whatever it is, and you can start writing code and it will work 
absolutely fine, right? All right guys, so let's get started with exploratory data analysis. Again, IPL is in full swing, so now we can go on to check it out. First step, as always, whenever you're working with any sort of data: we import all the libraries that we're working with. Pandas and NumPy are two really important libraries which we have to have. Pandas basically provides us with two data structures; one is called Series, the other one is called DataFrame. And of course we have NumPy. NumPy is the number one library you go to whenever you think about any sort of numerical computation, working with matrices, working with arrays and all of that. Perfect. The next two libraries that you see, Matplotlib and Seaborn, are the ones which help us with data visualization, when we have to create beautiful-looking graphs, and we're going to create a ton of graphs, as you will see. Now, it's been a while since we have seen an error in Python. Let me run this and I'll see a beautiful error right now. See, there's the error I told you about. Why do I have this error? I have the data set file on my computer, but not on the Google Cloud Platform, so I have to bring my data set file to the Google Cloud Platform. So let me quickly find the data sets that I have and bring in the data. I just have to make sure that it uploads and it's ready. Now I've done it; let me run it again. Now is the fun part, when you get to see what the data looks like. Let me zoom out a little. Perfect. Guys, follow along with me, because this is where things get really interesting. First of all, every match that has ever been played has an identification ID. Next is the season: which year of IPL was the match played in, right? IPL 2017, 2016; 
we're going to see how much IPL data we have as well. Then we're going to see which city the match was played in, what the date of the match is, and who the two teams playing against each other were. In this match that happened in 2017, Sunrisers Hyderabad were playing RCB. RCB won the toss, and then you have the toss decision: RCB decided they want to field, they want to bowl. When we look at result and it says normal, what we're recording is whether the match was a draw, or was left incomplete due to rain or something like that. If it says normal, it means the match finished as it's supposed to finish. Perfect. And then: who was the winner of the match, how many runs did they win by, how many wickets did they win by, who is the player of that particular match, and where was it played? This one was played at Rajiv Gandhi International Stadium in Uppal, so I am hoping that's in Hyderabad, right? Okay, perfect. Who is the umpire? You can find out who is the first umpire, the second umpire, the third umpire and all of that. Whenever it says NaN, it means not a number; it means that data is not available to us. But you see the amount of data that we have here, and this is just the top 10 rows that I've printed for you. So how big is our data set to begin with? We have 756 match details, guys, 756 match details of all the matches that have been played. And as I told you, there are 18 columns of data; the 18 columns are basically the ID, season, city, date, team one, team two, toss winner, result, all those things. So if you ask the data set, hey, how many match details do you have, the answer is 756, as you can see. And how many columns, what data are we taking a look at? ID, season, city, date, team one, team two, who won the toss, what was the toss decision, what 
was the result of the match, who was the winner, how many runs did they win by, who was the man of the match, or player of the match as it's called, where was the match played, which stadium, and the first umpire, second umpire and third umpire. These are all the data that we have. Perfect. Next, we have to do some data pre-processing. When we say data pre-processing here, we just have to find out if there are any NaN values. NaN basically stands for not a number; here it marks values that are missing. Wherever it says False, it means that column has no missing values: all the data is there, whether it is numbers or strings or any of your regular data. Now, if I ask you where that RCB versus CSK match was played, you will give me the name of a place. You will not say the match happened in three. What is three? I don't know, right? Exactly. Similarly here, whenever it says False, it's telling me that there is an actual value present in every row of that column, and whenever it says True, something is missing. So I'm basically printing out here: do we have any NaN values or not? If there are none it'll print False; if there are, it'll print True. Perfect. The next thing we're going to do is get a little bit of a statistical description of the data set. First of all, ladies and gentlemen, what is the first thing that we know? There are 756 matches that have been played, right? The count is 756. 
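The size, column and missing-value questions asked so far can be sketched in a few lines of pandas. This is only a minimal sketch: the session loads the real matches file, whereas the tiny `matches` frame below, its team names and its values are made up for illustration, with column names assumed to follow the ones read out above.

```python
import numpy as np
import pandas as pd

# Hypothetical four-match sample; the real data set has 756 rows and 18 columns.
matches = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "season": [2017, 2017, 2018, 2019],
    "city": ["City A", "City B", np.nan, "City A"],
    "winner": ["Team A", "Team B", "Team A", "Team C"],
    "umpire3": [np.nan, np.nan, np.nan, "Umpire X"],
})

# How big is the data set? shape is (rows, columns).
n_rows, n_cols = matches.shape

# What columns are we looking at?
column_names = list(matches.columns)

# Does each column contain at least one missing (NaN) value?
# False means nothing is missing in that column, True means something is.
has_missing = matches.isnull().any()

print(matches.head())      # first few rows, like the preview in the notebook
print(n_rows, n_cols)
print(column_names)
print(has_missing)
```

With the real file, the same calls would report 756 rows and 18 columns, and `has_missing` would show which columns (the umpire columns, for instance) contain NaN entries.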
Now, let's say I want to ask this question: out of the 756 matches played, what was the average win by runs, that is, by how many runs was a match won on average across all 756 matches? The answer is right here: it's 13.28. This is the average across the entire data set. You can also find the standard deviation, the minimum value, the 25th percentile, the 50th percentile, the 75th percentile and the maximum value as well. These will give you all the quartile divisions and all the statistical values that you need right now. But hey, let's keep statistics a little to the side, because we are asking all the fun questions to our data set right now, correct? Perfect. What is the next question that we can ask? How many matches were played according to the entire data set? We already saw this; we know that we have 756 matches in this data set. Again, am I using 100 lines of code to get this answer? No, it's literally one line of code, one very simple line of code where I'm performing data analysis. I'm asking it a question, hey, how many matches were played, and it gave me an answer. The next thing I'm going to ask is: how many IPL seasons do we have in this particular data? You can see here, right from 2008 until 2019, we have everything: 2008, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19. So from IPL 2008 all the way to IPL 2019 we have the data. Now guys, I want you all to head to the comment section right now and tell me which team won IPL in 2008 and which team won IPL in 2019. Just give me these two names: who won it in 2008 and who won it in 2019. 
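As a rough sketch of the statistical description and the two one-line questions above (how many matches, which seasons), assuming a `win_by_runs` column like the one in the session's data set; the five rows below are invented purely so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
matches = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "season": [2008, 2008, 2009, 2019, 2019],
    "win_by_runs": [35, 0, 0, 14, 1],
})

# count, mean, std, min, 25%, 50%, 75% and max in one call
runs_summary = matches["win_by_runs"].describe()

# How many matches are in the data?
total_matches = int(matches["id"].count())

# Which seasons are covered, and how many are there?
seasons = sorted(int(s) for s in matches["season"].unique())
n_seasons = matches["season"].nunique()

print(runs_summary)
print(total_matches, seasons, n_seasons)
```

Note that the mean of `win_by_runs` is taken over every match, including matches that were won by wickets and therefore carry a zero in this column.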
Now, there is one more thing that you guys have to understand: the data set we have here might not be exactly how it happened in the real world. Of course there can be certain differences; it depends on how the data set was put together and all of that. So whatever answers we get to the questions that we ask, let us assume that they are correct with respect to the data that we have, even though there might be a case where the team that won by the maximum runs in reality is different from what is shown here. What it is showing is what it understands from its data set. Now, if I ask it which IPL team won by the maximum number of runs, the answer in this case is a match that happened in 2017, in Delhi, between Mumbai Indians and Delhi Daredevils. Mumbai Indians won the match, and they won it by 146 runs. Even though I don't watch cricket, this is an absolutely fantastic thing to look at, right? Player of the match was someone called LMP Simmons, and it was played at Feroz Shah Kotla, which I think is in Delhi. And of course you can find out the names of the umpires and all of that. You see, I just asked one question, the entire code is again one single line, and we found out that as of now Mumbai Indians are the guys who have the biggest margin in terms of winning by runs. Perfect. Now, the other thing that we definitely have to take a look at is which IPL team won by the maximum wickets. This again happened in 2017, in Rajkot. It was a match between Gujarat Lions and KKR. KKR won the match by 10 wickets, meaning they chased the target with all 10 of their wickets still in hand. It was played at this place 
called Saurashtra Cricket Association Stadium; I am sure you guys will know where this is. You can again find out the names of all the umpires as well. So the question I asked is: hey Python, find me the IPL team that won by the maximum wickets. Maximum of what? 10 wickets is the most you can win by. So this is the first team in the data set that did it, and KKR were the people who did it. The next question I can ask is: which IPL team won by the minimum wickets? This again happened in 2017, in a match between Sunrisers Hyderabad and Royal Challengers Bangalore. The toss was won by RCB and they wanted to field, so Sunrisers Hyderabad batted first, and Sunrisers Hyderabad won by 35 runs. The win-by-wickets value is zero here because they won batting first, so the margin is counted in runs and not in wickets. So probably in this match RCB didn't perform really well, or SRH had a fantastic bowling strategy or something like that, and RCB just couldn't catch up. So you can see, all of this is basically me just trying to look into the data to understand it, and this is something which is very, very important for all of us. All right, a lot of you have given me the answers: RR and MI. In 2008 RR won and in 2019 MI won. Okay guys, perfect, thank you so much. I actually did not know who won in 2008 and 2019. All right, perfect. Abhiroop says: without the data set, why are we watching? Well Abhiroop, that's definitely a good question. Yes, the data set will be provided for all of you. You have to sign up to one of the courses that we have at Great Learning Academy, and then you will have access to all the learning material there, apart from the PDF as well. So I hope I answered that 
question. Your last five or six comments weren't loading, guys; that's the reason I just couldn't see them. The data set: yes, you can find the data set that is provided when you enroll yourself into the Great Learning Academy program that we have. Okay, perfect. Now let's get back to answering some more fun questions. The next question that I can ask is: which season had the highest number of matches ever played? From 2008 to 2019, which was the season where the maximum number of matches were played? Look at that, that's a beautiful-looking graph, right? I think it has to be 2013. Yes, 2013 has somewhere around 71 or 72 matches played, and that is the highest to date. In 2019, how many matches were played? I think it's less than 60. So with just three lines of code you can print a beautiful colorful graph like this and find out in a second in which IPL season the maximum number of matches were played. After that I want to ask one more question: which is the most successful IPL team with all the data that we have at hand? Now guys, do not fight in the comment section over this. You have your own favorite teams, and I am very sure I would have had a favorite team if I were to watch IPL. But at the end of the day, with respect to all the data that we have, let us see the team who won the maximum number of matches. As of all the data from IPL 2008 to IPL 2019, you can see that Mumbai Indians have won the maximum number of matches. After Mumbai Indians it's CSK; after CSK it's Kolkata Knight Riders; after Kolkata Knight Riders it's RCB, Punjab, Rajasthan Royals, Delhi Daredevils and so on. Looking at this graph, in like 10 seconds you can find who has the maximum wins, and boom, you have your answer. And all of this, again, just take a look at 
the code that we have written for it: two lines of code and we have this entire answer. Perfect. Now let's ask one more question: what is the probability of winning a match if the toss was won? There are many cases where you win the toss but you do not win the game, and there are cases where you lose the toss but you still win the game. So in this case what I want to find out is: what is the probability of winning a match if I also won the toss? There are 393 matches that were won by the team that won the toss, but there are also 363 matches where the game was won by the team that lost the toss. Let me actually print a little graph here so that you can quickly see this. Wherever it says True, it means the team won the toss and won the game; wherever it says False, the team lost the toss but won the game. If I take a look at it and you ask for my conclusion, I can say: hey, I don't think the toss matters a lot. Now, you might say very differently, oh no, the toss is everything in IPL. I understand, but what my data set is showing right now is that there is only a very slight advantage if you actually won the toss. So this is a very important question. The next time you have an IPL conversation with someone, tell them: hey, you know what, we tuned into a live session at Great Learning and these guys proved with data that the toss is not everything. If you won the toss, it doesn't mean you won the game; if you lost the toss, it doesn't mean you lose the game. There is still almost an equal chance, and for the team that won the toss there is a slight extra probability that they might win the match. 
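The questions above (biggest win by runs, biggest win by wickets, most successful team, and the toss effect) can be sketched roughly as follows. This is an illustrative sketch rather than the notebook's own code: the five-row frame and its team names are invented, the column names are assumed to match the data set described in the session, and plain `value_counts` stands in for the count plots drawn on screen.

```python
import pandas as pd

# Hypothetical sample; teams and margins are made up for illustration.
matches = pd.DataFrame({
    "season": [2017, 2017, 2018, 2018, 2019],
    "toss_winner": ["Team A", "Team B", "Team A", "Team C", "Team B"],
    "winner": ["Team A", "Team A", "Team C", "Team A", "Team B"],
    "win_by_runs": [35, 0, 146, 0, 12],
    # In this kind of data, win_by_wickets is the number of wickets the
    # chasing side still had in hand, and it is 0 when the side batting first won.
    "win_by_wickets": [0, 10, 0, 4, 0],
})

# idxmax returns the index label of the FIRST row holding the maximum value;
# .loc then pulls out that single full row instead of a list of rows.
biggest_run_win = matches.loc[matches["win_by_runs"].idxmax()]
biggest_wicket_win = matches.loc[matches["win_by_wickets"].idxmax()]

# Wins per team, sorted from most to fewest.
wins_per_team = matches["winner"].value_counts()

# True where the toss winner also won the match, False otherwise.
toss_and_match = matches["toss_winner"] == matches["winner"]
toss_counts = toss_and_match.value_counts()
toss_win_rate = toss_and_match.mean()   # share of matches won by the toss winner

print(biggest_run_win)
print(biggest_wicket_win)
print(wins_per_team)
print(toss_counts, toss_win_rate)
```

Applied to the counts quoted in the session, 393 out of 756 matches gives a toss-winner success rate of roughly 0.52, which is the "very slight advantage" described above.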
Right, that's all the big hype with respect to that. Perfect. Next, this is just a setting for a couple of pandas data frames that we're about to discuss. The part where we're going to ask a very important question is: which team has the highest number of wins per season? We can check this out per season, as you can see. Let's just go back up. In IPL 2008, Rajasthan Royals were the ones with the highest number of wins; they won 13 matches. In IPL 2009, Delhi Daredevils won 10 matches. In 2010, Mumbai Indians won 11. In 2011, Chennai Super Kings won 11 matches. In 2012 it was KKR, who won 12 matches. In 2013, Mumbai Indians won 13 matches. You can see that, right? I'm just going all the way to 2019, and I think for the last two or three years you can safely assume that Chennai, Mumbai and probably Delhi, these three guys, are always on the top there. That's just a guess based on the actual data that I'm seeing on my screen right now. Guys, I just see spam in the comment section. Anish Kumar, please do not spam the comment section. If you have a question, please ask it and I'll definitely answer it. Perfect. Abhiroop says: do we have to enroll in a paid course to get the data set? Absolutely not, Abhiroop. All the courses that you find on Great Learning Academy are absolutely free of cost, so you can enroll in our data analysis course on Great Learning Academy, which is again 100 percent free, and there you can have access to all the data sets. Okay, all right guys, just give me one second, let me drink a little bit of water. All right, perfect, now we're back, ladies and gentlemen. Okay, so I have one more question. Patel Hurst says: sir, which software are you using? Very good question, Patel, thank you so much for tuning in to our session. I'm using Google Colaboratory. So basically, let me 
just show you Google Colab. As soon as I hit enter, this is what I am using: Google Colab. This is basically a notebook that helps you write Python code easily. Okay, so what is the next question that we're going to try to answer? Every time the coin was tossed, what are the decisions that people have chosen most of the time? I think this is something that even people who do not watch cricket can sort of analyze. As soon as you win the toss, you usually go bowling, depending on the pitch or something like that, so that you have a target that you can chase. It is very important to know the target, and that's the reason I think a lot of people choose fielding. So guys, what do you think about this? Is my train of thought correct on this? You choose to field because you have the advantage of knowing how many runs you have to chase later. That's an advantage, I guess. Perfect. Okay, the next one is very, very interesting, ladies and gentlemen. This is again something that all of us are very curious about: who is the man of the match? The name of the player who has won the man of the match award the most number of times, highest to lowest. Now in this case, hang on, this just printed a lot of data that I have to scroll up on. Yes, perfect. I think the first name I, as a Bangalorean, definitely recognize: CH Gayle stands for Christopher Henry Gayle, I presume; Gayle is the name. And yes, for everyone that's asking in the comment section, can we get the code: yes, you guys can get the code, but definitely take a look at the fun part of it first; we can get to the code part of it later. Perfect. So yes, Gayle has won the 
maximum number of man of the match awards. After that it's AB de Villiers; I know these two guys are from RCB. Then Warner, Rohit Sharma, Dhoni. Dhoni I know is from CSK. And apart from that, guys, to be very honest with you, those are all the players that I know. Can anyone tell me what IPL team Warner plays for? Come on guys, he has won the man of the match award 17 times from IPL season 2008 all the way to 2019. Which team does he play for? So you can see, in descending order, all the people who have won man of the match. There are some people who have won it just one time also. Perfect. Now, while you guys give me that answer, let me quickly show you one more important thing. An important question that one can ask is: in which city were the maximum number of IPL matches ever played? I think the answer is very much solid in front of you: Mumbai. So what is the name of the cricket stadium that's in Mumbai, guys? Because from 2008 all the way to 2019, 101 IPL matches were played there, and if you take a look at everything else, 70, 60, 50, 40, 30, all these numbers are small compared to 101. So what's the name of the stadium in Mumbai where 101 games have been played? You tell me the answer to that and we can wrap up this demonstration. Perfect guys, come on, give me the answer. Okay, superb, it's called the Wankhede Stadium. Now that you guys tell me, I definitely know the name of it. Okay, perfect. So guys, it seems like a lot of you are super fans of cricket and IPL and all of that. I mean, it's super fun; I see everyone else in my house watching this as well, and I sometimes watch a couple of matches. I was in fact showing this data set 
to my friend who is from a non-technical background, and I've shown this to a couple of family members as well, and explained to them: hey look, I have certain data such as this, and I can ask my computer questions and it can give me fantastic answers like this. How cool is that, right? Today it just looks like a simple data analysis that we're doing, but now that we have the data set, we can actually train a chatbot to learn using all this data and all these questions that I asked: how many IPL seasons are we analyzing, which IPL team won by the most runs, or whatever it is. Instead of me asking this in terms of a paragraph and telling you guys, I can actually ask a chatbot and it can give me the answer by looking at the data set. That is when you use machine learning, deep learning, all of those concepts. But of course, what was the goal of this session? The goal of this session is to make sure you guys understand that if we have a fun data set such as the IPL data set, you can ask so many questions. I think we have asked somewhere around 25 or 30 questions, and we have got answers to them straight from the data, and none of this is a lot of code. You can see that; let me just hit clear output. As soon as you clear the output, you'll get to see the size of the code that we have written. Look at this: every single time I am querying, asking Python something, it's literally one single line of code that I have to write, and with that I have all the answers that I need. It's pretty much that. And someone asked me in the comment section: what is idxmin or idxmax? So basically, idxmax gives us the index of the row that holds the maximum value. In this case, if I'm trying to say find me the team which won by the maximum 
wickets, what am I asking? I'm asking which IPL team; I do not want a list of 10 IPL teams at once, I just want the one IPL team who won by the maximum wickets. To get to that point we require idxmax. Perfect, guys. So with this I hope we are all very clear with respect to how you can perform very fun and interesting data analysis on the IPL data that we have. Now, usually after this I get a very important question, ladies and gentlemen: okay, so I understand data analysis, now where do I go from here? Well, let me tell you this: even though I hope we made data analysis very clear in this session, you guys have to understand that there is so much more to it. With a simple Google search where you type in learn data analysis, just like that you will be given, what, 100 million results, and most of those are giving you learning material to read. Now, there is a huge advantage in knowing that you have access to some really good learning material. The disadvantage is that not all of this learning material will give you a structured way of learning. If you do not have a structured path to learn, things can get very complicated after you clear the intermediate level, and once you're working towards advanced, if your foundations are not very strong or if you took a haphazard way to learn, it will be very difficult, guys. So make sure your learning is very much structured, and that's the reason I keep asking you guys to subscribe so many times. When you subscribe to the Great Learning YouTube channel, or when you opt in for courses at Great Learning Academy, you have this fantastic opportunity of, first of all, having the highest quality of learning material available to you. Second of all, whenever you're 
joining us for any of these live sessions that we have, you guys need to know that you are having all the advantages and you are the winners here. You get to talk with us. For example, I am a subject matter expert in data science and a lot of other niches in this domain, so whenever you talk to me or any of the other subject matter experts that we have here at Great Learning, the biggest advantage is that, first of all, you will get the highest quality of learning material possible. Second of all, we know what is happening in the industry right now. We know, for example, what's happening in the data analysis industry: what should you learn, what should you not learn, what are the trending tools, what's the next best application that you can work towards, or what is that framework that you have to learn. See, these five or six questions are very, very important, and it takes months and months of research for us to sit and answer them. So whenever we join you on these live sessions, we will not only give you the highest quality of learning material, we will align it with exactly what is required in the industry. That way you're winning not only once, you're winning twice. But to get to that point you definitely have to be subscribed to the Great Learning YouTube channel, and of course you have to keep checking out Great Learning Academy. We have multiple courses; in fact, we have 200-plus courses that will give you access to a lot of free learning material there. So guys, I hope all of you have created accounts and have started learning and earning your free certificates of completion on Great Learning Academy. You can definitely get more content on data analysis on our Great Learning YouTube channel, and of course on our Great Learning blogs as well. So whatever way of learning makes you happy, we have covered it. 
We have a very structured learning management system, we also have the YouTube channel, and we have blogs. All of this again is created by subject matter experts who are completely thorough, who are professional, who absolutely know everything in their domain, and that's the reason we have this fantastic adventure of putting out great content for you guys. Perfect. Okay guys, again I see the comment section being spammed by Saroj. Please do not spam the comment section; I'll answer your question. Your question is where to learn Kotlin. Well Saroj, we definitely have a couple of videos here on the channel where you can check out Kotlin, so just open a new tab and check out what offerings we have for Kotlin. Of course, we are planning to work on Kotlin a lot more, so make sure you subscribe to stay in touch in the future as well. Perfect. So with that I hope I have answered your question. Now guys, a very important thing: I just showed you three places where you can get started with the domain, where you can quickly go from a beginner to an intermediate learner with free resources. But if you want to work towards complete expertise, if you want to become thorough experts and you're confident saying, hey, I want to have a job in this particular domain and I want to work towards it, be it data analysis, business analysis or data science, we have multiple specialization and post graduation offerings. We have PG programs, MTech programs and MS programs done in collaboration with some of the most fantastic universities out there. We have the University of Texas at Austin, basically the Texas McCombs School of Business; then we have PES University and Northwestern University; we are tied up with Stanford, and you know Stanford is one of the top schools in the world; and 
we have a lot of other schools as well. So let me just take a second to guide you through one of these career-changing programs that we have here. These are beautiful programs, and we have a lot of people asking us about them. In fact, data analysis falls mostly under the domain of data science itself, and I present to you Great Learning's post graduation program in data science and business analytics. This is a program that is done in collaboration with the University of Texas at Austin, and it is considered India's number one data science program. There is a very detailed curriculum that you will get to study, and of course it's a mentor-driven program, as in you will get multiple hands-on projects where you'll be working with the guidance of mentors, and you get one-to-one personalized mentorship. This is an 11-month program and it's completely online, so if you start right now, by the time you finish it you will be thorough experts in data science, if you have put in the right amount of work. You get ample job opportunities, because your resumes get shared with the 350 companies that we have, and 7000-plus alumni report that they have an average salary hike of 50 or 48 percent after they completed the program. And you can look at these certificates; the certificates are absolutely stunning to look at. You get two certificates, one from the University of Texas at Austin and the other one from Great Lakes. You can find all the details about the curriculum on the website; in fact, you can download an entire brochure as well. So you can find out all the quizzes, all the projects, all the things you'll be learning in each module and sub-module. You can find out everything you need to know about what you will learn, from the most basic all the way to the most advanced 
concepts right at the end of it of course you'll have a capstone project a capstone project is where you guys will be working on a big project a real-time project under industry level expert guidance and using that you guys can put it on your resume you will get help with respect to building your resume as well mock interviews career fairs you get access to career fairs guidance through one is to one mentorship and of course you get two post graduation certificates one from university of texas austin and the other one by great lakes as well right again uh just scrolling through you can see a lot of different type of projects that we uh we're all working on thousand plus projects have been completed in 22 plus domains and we have three papers that are published in iim bangalore as well so these projects are some of the most fantastic projects uh you guys get to work on and of course at the end of the day since you guys are working uh you know when you guys are working towards becoming experts it is very important that you have the leading academicians and you have thorough thorough thorough experts for teaching you and that is the case right so we have people here uh you know who are completely knowledgeable who are absolutely uh thoroughly knowledgeable in this domain that we're talking about in data science and business analytics and then when you're taking a look at mentors mentors are equally important just as much as your faculties will be teaching so again we have fantastic mentors for you as well and if you keep scrolling you can find out all the career support that we have in terms of mentorships interview preparation access to jobs exclusive recruitment drives and a lot more right guys perfect and of course it comes with a program fee it's 2 lakh 25 000 you know the price might change but there is a lot of demand for this particular program it's an 11 month program with 225 plus hours of learning and you get 17 real world projects as 
well there's an application process that you guys have to follow you have to fill the application form and after filling it we'll have a screening call with the admission director's office and after that is done you guys will be extended an offer letter uh for the people who get selected as well right so you can find out more details about it more faqs frequently asked questions for any sort of thing details about the program details about the fees details about why you should take a look at data science and all of that as well right so we have multiple different programs your pg programs ms programs and a lot of it that you guys can definitely take a look at i'm sure you'll have more queries around uh so here just make sure you guys are putting down your name your email address your phone number and your work experience if you're a fresher you can put in your work experience as zero it is absolutely fine we will make sure that in the next four hours someone from our admissions office will give you a call and you can ask them all the details that you guys require about this program about which program is right for you what you need to do in your career if you want some assistance about which program will help you get to your career goal if it's worth it okay i'm sure you'll have a ton of questions you can direct them all to the expert who's going to give you a call absolutely free of cost as well so make sure you guys are putting down your number uh your name and of course your work experience and your email and this way we can have our experts reach out to you right guys perfect now let's quickly head back to the presentation let me take one or two questions and we can go on to end the session right sir i want to do specialist in this data science but it's too late for me since i've completed my graduation in 2007 is there any chance i can get this job well priti kadam thank you so much welcome to the session definitely yes it is never too late to learn 
i want everyone to understand this right now you do not have to be in school or college right now to pick up this program it will definitely help you out you can make a very good career switch you just have to find time and since you tell me that you're freelancing priti i think you will have a lot more time to dedicate to this this is a completely online program i definitely think uh you guys can you guys can do it and of course priti to answer your question as well yes uh you know you can go on to do this data science is a fantastic domain uh that has really high paying jobs and it's not just hype it's just that you get you get you get access to fantastic tools libraries to work with it's a huge community uh and you're surrounded by the best minds in the world is something i'd like to say so i hope that answers your question all right guys with this we have come to the end of the session thank you so much for tuning in from wherever you guys tuned in from the covid 19 pandemic is not yet over so please stay safe by wearing your masks by making sure that you guys don't step out unless it's absolutely necessary guys please stay safe right now we're all learning from home we're working from home or whatever it is this is the right time for you all to make sure you can work on your careers you can work on your goals you can work on your dreams because once the covid 19 pandemic ends the job market is just going to go on you know on a on a huge rise on a huge steep where it's just constantly hiring for people right if you start right now there is a very good chance that you know you can be at the right time at the right place to catch that trend train to make sure that you know you can get your first job in your favorite mnc you can get your dream job if you just put in the right amount of time as well right so as i just mentioned i hope these specialization programs definitely help you uh go from being a beginner all the way to being a thorough expert 
make sure you guys are heading to the great learning website and checking that out and in the meantime guys we had a lot of you all tuning into this session thank you so much again please stay safe from the pandemic do take a second to subscribe to the great learning youtube channel and hit that bell icon because every time you subscribe you're not only showing us the support that we require for this channel the second thing is that you subscribing and liking the video is telling us that you would want more content such as this so we can bring you more content right so guys uh make sure you're all subscribing as well and on that note i wish you all well uh you know i hope you have a great day ahead right thank you so much guys i'll see you in the next one it's always a pleasure this is anirudh signing off cheers guys
Channel: Great Learning
Views: 11,153
Keywords: yt:cc=on, eda, eda in python, eda using python, exploratory data analysis, exploratory data analysis in python, exploratory data analysis tutorial in python, exploratory data analysis python, exploratory data analysis using python, exploratory data analysis pandas, eda example, what is eda, eda data set, eda on a dataset, steps involved in eda, objective of eda, eda use case, python, Python basics, python for beginners, python tutorial, great learning, great learning academy
Id: 3MG9KSwAc08
Length: 66min 20sec (3980 seconds)
Published: Mon Apr 26 2021