Masterclass on Data Visualization

Captions
[Music] Good morning, everyone. My name is Vishwas. I head the IT software division at RR Donnelley Global Outsourcing. I welcome you all to the Data Artistry Tournament 2018, a one-of-a-kind tournament that we are very proud to host at RR Donnelley, and this is being made possible by a partnership with Gramener and NASSCOM.

Data, as we know, has three main characteristics: volume, velocity and variety. Traditionally, businesses have been focused on trying to enhance and amass the amount of data that they could collect on their customers, based on their customers' demands. As the volumes increased, the need for variety and velocity also increased. Why am I talking about these three characteristics? Because they present us with a unique challenge: we now have to have a proper combination of volume, velocity and variety in order to make sense for the customers.

Let me take a step back and try to put this in context; as they say, context is king. From a customer standpoint, they can always provide us any number of data sources, any number of varieties of data sources, and obviously a volume of data which is just quadrupling. Historically, IT organizations have been more focused on trying to manage the volume of data. Over the last few decades, I think we have done a fair job of managing that volume as it grew along with the industry. Then comes the challenge of variety and velocity. Variety brings the challenge of having to worry about the different data sources, and with variety comes the challenge of managing velocity, because the data is growing in industries like IoT and all the other areas where velocity and variety matter deeply. We have been more focused on trying to work out how we can make sense of this data.

If I have to put this challenge of volume, velocity and variety into context, think of it like you're on a highway without any mile markers or any display signage. So essentially what it means is you're
headed in a direction you don't know; obviously, you're going to get lost. Now imagine the same journey being viewed from the air: now you see where you are really headed. From that perspective, if I have to make sense of my customers' data, I have to start looking at the connections that each of these data sets makes. There is always a story to be told, and we as humans always relate to stories, because that's what we can intrinsically relate ourselves to, and our sense of understanding, our frame of reference for this world, always relates back to that.

In the world of data insights, why is this important? Data insights enable us to look through the perspectives of the customers and their customers' data in order to make real sense of it. For example, if you are looking at a volume of data about an election, it makes more sense to look not just at the amount of voting that has happened in a constituency, but at the demographics supporting the local leader there. We need to look at how many people came to vote, how many people actually turned up at what time, and what the velocity and variety were in terms of the demographics of the people who came to vote. If I extrapolate the same problem statement to the world of IoT, there is going to be a mass of data which is beyond anyone's control, because it is going to be live, and managing that data set in order to make sense of it is a challenge by itself.

Traditionally, would a simple Excel spreadsheet or a web-based application cut it? Probably yes and no. But can it actually deliver the right value, the right insights, for the customer? That's where data artistry comes into play. Data artistry is not something new; it has been in existence for many centuries. The enablement of these technologies, which lets us make sense of the real data,
is what we are talking about. Data artistry in itself enables the different technologies and the creative part to all come together. What do I mean by that? We have data scientists, who can make sense of data in terms of what it is trying to tell. The data artists, who are the creative guys, will look at it more from the angle of: how can I enable the storytelling for this data set? And you're going to have technology guys coming in to understand: how can I desensitize the data, how can I provide more insights by providing the required transformations? All of these areas are going to help us understand the real story behind it. As I said, any data which is not in the form of information is not going to be useful for our customers. So the pursuit of making sense of data, the journey of turning data into information, is the whole and sole purpose of what data artistry is all about.

That being said, let me take the opportunity to welcome Anand, who is the CEO and chief data scientist at Gramener. Anand is going to walk us through an exciting journey in terms of what data visualization and data artistry can do to typical data sets, which are by default probably boring, and how he and his team have worked on several examples in the past, and how data insights can be brought out of simple, large, or medium to complex data sets.

So Vishwas told you why data stories are important; what I'm going to tell you is how you create these data stories. As I've seen it, there are four ways of telling a data story. You start by just exposing the data, and that in itself is pretty useful sometimes. But you can go beyond that and start telling stories as charts: showing people the data, that is, explaining to them what's happening behind the scenes not just as numbers but, better yet, as pictures, because we understand pictures better. But even that often is not enough. You usually want to provide an
explanation for it. An explanation helps because people then understand the story behind it: they understand the context, they understand what they need to do. That's the next level, explanation. But we don't necessarily want to stop here, because what this assumes is that the storyteller knows more than the audience. Sometimes, with the power of data, the audience can discover things for themselves that the storyteller does not know, and that's where the power of exploration comes in.

These are the four broad ways I've seen data stories being told. Somebody just dumps the data and says, take it and do what you want with it. Or somebody says, here's a bunch of charts, make what you can of it. Or somebody says, here's an explanation, this is what I've inferred from the data and I'm helping you understand. Or people say, I'll provide you the interface that will help you understand what you can out of it, and you then make your own data stories.

Each of these has a certain value. For example, take the effort of the creator. It's very easy (well, I wouldn't say very easy, relatively easy) to just expose the data, and showing the data as charts is not too difficult. But if you want to put together an explanation, you need to know the domain in the first place, and if you want to put together an exploratory interface, you are crafting a full-fledged application. Both of those are hard.

For the audience, the consumer, on the other hand, showing what's happening makes it easy for them to understand, as opposed to just looking at the data. And an explanation is where somebody has already thought through what needs to be told and people are just absorbing it; it's like reading a book or a novel, like a picture story. On the other hand, if I'm just given a dump of the data, that's not very easy for me to infer from or make sense of. If I'm given an interface to explore, then I have to take the effort to create a data story on my own, so that's not a
trivial task either. Having said that, every single one of these has its place, and what I'll be doing is showing you examples of how these data stories can be created using each of these four formats.

Why is this important? For this particular hackathon, you will typically be picking one of these formats, and every single one of them has its own advantages and disadvantages. What I'd like to talk about is how you use a format to tell a story and make sure that you get to the objective you want, and make it clear up front: I'm going to create an exploratory interface, for instance; I'm not trying to tell the story, I just want the audience to be able to understand what's happening. Or: I am going to tell a story, but not by way of an explanation; I'm going to tell it by helping people understand what's happening quickly, through pictures. Once you have your objective reasonably clear, you will want to look first at the data itself.

Before I go into these formats, let me tell you a little bit about just exposing the data, or what it takes to put the data out. Even though I've called these four formats, in reality, from a story perspective, there are only three; exposing is not part of it, because that's just getting the data out, right? But let's not forget how much effort goes on behind the scenes in putting the data out. For example, take the Election Commission data, which is available in the form of PDFs on the Election Commission website. And when I say PDFs, I mean lots and lots and lots of PDFs: there are several thousands of pages of data sitting in this format. The first big task is just getting it into a textual form, a form that is parsable by a machine. Even then, there are several issues. To begin with, they are not necessarily aligned. You will see a bunch of gaps here; you will see some names are split into the second row.
For example, I discovered after looking at this data that the state that has the longest candidate names is Gujarat. There would be a huge name, somebody-bhai somebody-bhai somebody-bhai, and there's also an alias for that, which effectively takes the name to three rows. I thought it would have been [unclear], but apparently it isn't. This kind of formatting is not very easy, and you have to do the parsing to make sure that you convert it from this kind of textual format into something that can be loaded in Excel, and that's a piece of work in itself.

And you think that at this point you've got the data, right? No, this is where the work really begins. For example, when you look at how many candidates stood for election in each constituency, you will find that there is a constituency, Allen, in which there were five people standing for election in 1957, three people standing for election in 1962, and so on, and then suddenly from 1972 onwards they stopped standing for elections. And there is a constituency spelled slightly differently, but evidently the same constituency, where people have been standing for a little while, and then suddenly they stop here and start back at the old spelling. In other words, the constituency name has been spelled differently in different periods. The same is true for [unclear] and [unclear], where you can see the same kind of thing. Which means that we also have to take care of corrections in the data.

These kinds of corrections take various forms. It could be phonetic; it need not be alphabetically similar. The earlier case was easy to spot because we were just sorting alphabetically, so you can see that one constituency is very similar to another. But [unclear] and Kundapura, which are obviously phonetically identical and could be the same constituency, are not necessarily close alphabetically. In fact, the toughest problem that we had was with Bandar and Machilipatnam, which we realized later on was actually the same
constituency. But I mean, how do you even figure that out unless you have some knowledge of the domain, right? And it's not just constituency names that are misspelled. We have, for example, party names that are misspelled: these are all the same party, just spelled in different ways. You have party names that actually change: the ADMK is the AIADMK is the ADK; it's just that this was the party name at different times, even officially. And parties can restructure: the INC (Indira) is not really the INC, but for all practical purposes that is what we need to treat it as. All of these elements require us to build the infrastructure to be able to expose the data, and that's what comes first. The good part is that, as far as this hackathon is concerned, this has been done for you. You have the data; you just need to figure out how to show, explain or explore the data.

What do I mean by showing the data? You may ask: look, if I have the numbers, I have the data, then what's the point of just showing it? What value does mere visual representation add? Well, one of the first visualizations that we created, a long time ago, was a video of India's weather. We took every single district's temperature from 1900 all the way to 2000, every single month (this is a fairly detailed and large data set), plotted it on a map, and said, let's run it as a video. What you find is that some portions of the country are relatively cool through the year. The pieces that are green are cold, the pieces that are red are hot, yellow is somewhere in between. You can see that [unclear] is relatively cool through the year, which you obviously knew. You also find that the west coast of India is consistently cool through the year, which was to me a bit of a surprise; I actually managed to find my retirement spot based on this piece of analysis. But the oddest thing that we found was that even though the country is going through waves of summer and winter, there are some spots that are flashy. You see this one spot here, which is hot when the
surrounding areas are cool, and cool when the surrounding areas are hot, effectively exhibiting a counter-cyclic weather pattern. This is Bilaspur in Chhattisgarh. We don't know why, and it's an ongoing research problem to figure out why Bilaspur is hot when the surrounding areas are cold and cold when the surrounding areas are hot. The odd thing is that the first time this was spotted was when the meteorology department saw this video. They said, maybe there's something wrong with the data. They went back, they investigated, they came back and said: no, there's nothing wrong with the data; you have discovered something in data that we have had for literally a century and are now seeing for the first time. Sometimes you just need to see stuff to be able to figure out what's happening.

Another case where the power of showing things was showcased was when we were looking at the election data that I talked about a short while ago. We looked at how many people contest in each constituency. This represents the number of people standing for election in the 1967 Tamil Nadu assembly elections. These circles are larger where there are more candidates and smaller where there are fewer candidates, plotted on a rough map of Tamil Nadu. In 1967 it's a reasonable mix. In 1971 there are a few new constituencies that get added. In 1977 there's a spurt, an increase in the sizes of the circles, indicating that there are more candidates contesting in these elections. The color of the circle represents which party won. You notice that earlier it was largely yellow, and that was the DMK winning all the elections. Now a new party has come in, the ADMK, and it has swept the state, so we have a color change, and this has also led to an increased level of participation across the state. So as we were browsing through, we were looking for anomalies.

Let's take another example. This is actually from the same election data that I showed you
a short while ago. The question that we were looking at was: where do people stand for elections, and how many people contest each constituency? When we were looking at this data, what we found was pretty interesting. For the assembly elections in Tamil Nadu in 1967, this is a map that shows the number of candidates that stood for election in each constituency. The size of the circle represents the number of candidates. So, for example, in Perambalur there were 10 candidates who stood for election, and that is therefore a slightly larger circle than the neighboring [unclear], where only 5 candidates stood for election. The color represents which party won.

This was 1967. If you look at 1971, things haven't changed that much. There are a few new constituencies that have come in, and there are a few changes in who won in each constituency. But the real change came in 1977, when there's a sudden spurt in the overall level of participation. We have more people standing for election in each constituency, and we also have a huge color change, and that's because the ADMK has come in as a new party and has swept the elections.

But let's get back to our original task: where do people stand for elections the most? In 1980, not that much of a change. But in 1984 we have [unclear], where as many as 90 people are standing for election in the same constituency. You can see that most of these are independent candidates; understandably, you cannot have 90 parties, so it's obviously going to be 90 or close to 90 independents standing for election. In 1989, an overall spurt, but nothing stands out. In 1991 we have two constituencies that stand out, but I'm not going to pause on this. I'm going to talk instead about the 1996 elections, where in Modakurichi there were over a thousand candidates standing for election in just one constituency. Think about that: over a thousand people in just one constituency. This visual made it entirely apparent to the entire country, when we
showed it on national television, that there's something really odd going on. This is data that has existed for many decades, in which people, when they look at it this way, find a huge anomaly.

What's even more anomalous is the nature of what happened here in the first place. Firstly, if you look at the names of the candidates that stood for election, this is what it looks like: there's a Palanisamy K., a Palanisamy K., a Palanisamy K., a Palanisamy, a Palanisamy K. I mean, if you are a Palanisamy K., then how do you figure out which Palanisamy K. you are, to vote for yourself, right? There's certainly no picture of you, and there's literally no party symbol, so this is going to be a bit tricky. Which proved to be a problem, I suspect, because there were as many as 88 candidates who got exactly zero votes, meaning they stood for election and didn't even vote for themselves. That's the only way you can get zero votes, right? So what's happening here?

These are cases where we take simple numbers and just put them together into a chart, and people are able to spot things that were not otherwise spotted. That is the power of showing stuff.

But beyond this, we need to get to explanations: explain why things are happening. The reason is that you need to provide the context around it; you need to understand why this is important. It's one thing to say that in Modakurichi there were 1,030 candidates that stood for election. It's entirely another thing to tell you the story behind why it happened, which is that there was a farmers' protest, and there were people who said, I'm going to just sit outside the Election Commission office and give people 500 rupees to just stand for election. That is the story behind it.

How does one communicate these stories? Infographics are the most commonly used technique today, but there are several more formats, and a good place to start looking at these formats is just to go through the New York Times website. As a leader in data journalism, they are
doing some fantastic work in terms of telling data stories, and you'll find several examples of data stories out there that are very innovative in terms of format. But I'm going to talk about just a few simple formats that I believe you will be able to create during the course of this hackathon.

Here, for example, is an infographic that talks about working hours. The data set here was how many hours people have been working across the years, and there are several patterns and insights that emerge. For example, we find that as productivity increases, the number of working hours decreases. Norway is a country which has one of the highest levels of productivity, and they also have among the lowest levels of working hours. On the other hand, Cambodia is the exact opposite: very low productivity and the highest level of working hours. And this is fairly well correlated.

What you will notice about what I said is that I am not doing much more than reading off the content. In the earlier visualizations, where I was showing stuff, I was giving you a lot more context. Here I'm not doing anything more than reading off this chart, this infographic. Why is this important? Sometimes your means of communication is not through an interaction; your only means of communication is through printed material. Your objective is to drive an action: you want to tell people what to do, you want to tell people how they should do it. In that case, you don't want them to spend any time and energy trying to figure out what your message is. That should be abundantly clear. The only thing that you want to drive is the action that they should be taking.

Who uses things like this? Anyone who's in the business of convincing people of a certain activity. For example, the entire NGO space finds this very effective. In fact, even in corporate presentations we find it very effective to tell people, this is what's happening, and the whole realm of slideware is largely about telling people what needs to
be done. But the output, even though we are talking about infographics, does not necessarily need to be static. Even with the tool set that we have, it's possible to create something that is much more powerful. Let me pause here; I didn't have this open, but I'm going to open it. For example, one can create a web-based animated interactive. This, for example, is a story in which we talk about what's happening, but with a simple overlay of animated graphics on top of it, created as a web application.

Now, what is the story that it is telling? I'm largely going to read out from here, not word by word, but just tell you what happened. We were looking for the party that had the worst success rate. We knew that Congress was the party that had the highest success rate, but is there such a thing as the worst party, the one that had the maximum defeats? Obviously a party that never won counts, but is there some party that has just never, ever won despite standing for election in a huge number of constituencies?

Turns out there is one, called the Doordarshi Party, which has contested in over 700 elections and never won a single seat, and this is across three different elections. The first time, they stood in about 90-odd constituencies; let's get that number. In 1984 they stood for election in 97 constituencies, mostly in the UP region and the Gujarat region: zero wins. In 1989 they expanded, becoming a party that contested in 298 out of the 500-odd constituencies, making them incidentally the second largest party after Congress in terms of the number of seats contested, out of which they won exactly zero. In 1991 they contested in 321 constituencies (they never gave up), and out of the 321 they won exactly zero seats. What is even more interesting is that not only did they not win in even a single constituency, they were never even the runner-up. At best they were number three, and that is usually in constituencies where there were only three candidates.

So we are here telling a story in a
format that is web-based. The format could also be a PowerPoint interactive story, the format could be a print story, the format could be a mobile application, the format could even be a physical design that can tell a story. In this hackathon we are not restricting you in any way with the format, except we probably don't want you to create sculptures. Leaving that aside, pretty much any format is okay. And if you want to create an explanation, just keep one thing in mind: I, as the reader, should not have to think. You should just tell me what I need to take away, and it should be interesting and it should be engaging. That's our second format.

The other format that you can explore is exploratory visualization. Here you don't have an agenda; you are not communicating what needs to be told. What you're saying is: I will allow you to interact with the data set in ways that I believe will be helpful. I will shape your exploration, but I will leave the final conclusion to you. You are the master, you are the one directing the story, and you will finally come out with an outcome that nobody knows, but I will provide you with an interface to do this.

How does this work? Let me give you an example of how this has been done using just PowerPoint. I'm going to stay here. One of our clients said they wanted to understand how many people are downloading apps from their app store and where these people are coming from, and asked whether this could be done as an interactive PowerPoint visualization. So this is a presentation, but not a presentation intended for communicating a fixed message; it's a presentation intended for allowing people to explore. It shows the number of people that downloaded apps from the app store on a daily basis, and it also shows the breakup by country. Most of the downloads came from the US, followed by China, Germany, the UK, et cetera. Most of the models were Android 4 at that time, followed by iOS 4, and so on. The color indicates the rate of growth, so Android 4 was not growing as
fast as iOS 4, and therefore it's more yellowish or reddish or orangish than iOS 4, which is much greener. Now I can ask the question: why is it that Android 4 did not grow so much? Click, and that shows me the breakup of the volume of downloads for Android 4. The bulk of the downloads came from games, but that is clearly not growing; in fact, it's shrinking by about two and a half percent. Nor did the education sector grow, nor did the tools sector grow. It's only the music and video segment in Android 4 that is pulling it up a little bit. So now I have an understanding of why the Android 4 segment did not grow. I can also ask why games are not growing as fast: because the newbie segment, which is the one that downloads the most games, is not downloading enough games right now, nor is the double-income-no-kids segment, but the retired segment is contributing a little bit to the growth and therefore compensating for these declines. These are not questions that are posed beforehand, to which a person constructs an answer. These are questions that come up dynamically, which the interface allows you to answer.

You typically won't be creating interactive PowerPoint presentations; those are a little trickier. It's usually easier to use software to create the interfaces that will allow people to explore, and these kinds of interfaces are pretty common. The data set that we will be talking about is the National Achievement Survey data. This, in fact, is one of the data sets that you'll be exploring in the hackathon. I'll show you what an exploratory interface can reveal.

We were looking at how we can present the data in a way that will help us understand why students score higher marks or lower marks. Before that, a quick explanation of what this data set is about. The NCERT conducted a large-scale survey in which they asked the question: what do the students do, what do the parents do, what do the teachers do? And the "what" has been broken up into hundreds of parameters: how many books do
students read, how far do they have to travel, what kinds of television programs do they watch, do they have private tuition or not? Hundreds of questions like this, and similarly for the teachers and for the schools. The question was: is there certain student behavior that drives better performance? They also knew the students' marks, based on their performance on a quiz across various subjects. So our task was to try and see if we could create an exploratory interface that allows us to identify whether one particular behavior drives more marks than another.

For example, take the behavior around reading: do children who read more books score more? There are children who read no books, 1 to 10 books, 11 to 25 (that's a typo), and 25 or more books. The size of the bubble represents the number of children who are reading no books, or 1 to 10 books, and so on, and 1 to 10 books is the most common. The bar here represents the average marks these children score. You can see that the ones that read 25 or more books clearly score higher, more than those that read fewer books, and fewer still, and no books at all. The difference between the worst and the best, that is, those that read no books and those that read 25 or more books, is about 8 marks out of a hundred. So I know two things: that reading books does help, and that it helps by about 8 percentage points.

Now I could ask the same question of television watching: does television watching make a difference? It turns out that television watching has only a three percentage point impact, which means that rather than stopping kids from watching TV, you are better off channeling them into reading more books. Does playing games make a difference? Yes, playing games does make a difference as well, and that is also only about 3%. So the reading behavior has the maximum effect. The way we did this was through an interface that allows people to explore. This shows us the impact of various parameters. So what is
the impact of watching television? It turns out that watching television has an impact of 1.5% in mathematics, 5.3% in reading, 2% in science and 1.1% in social science, so you can see that the impact varies from subject to subject as well. What's also interesting is that when it comes to, let's say, science, the more you watch, the less you score, but it tapers off after a point. Once a week seems to be a sweet spot: if you're watching TV about once a week, that's good; if you watch TV every day, not so good, at least in science. The same is true for social science; once a week seems to be about a sweet spot. If I look at reading behavior, the more I watch TV, the better my reading behavior becomes.

Let's take, on the other hand, mathematics. Now this is a disaster. If I watch TV every day, then my marks are worse than those of children who never, ever watch TV. And remember that the children who never watch TV are probably the ones that don't have a TV at home, which means that a well-educated and affluent child is probably scoring lower than a child who would normally have had lower marks because of the sheer level of affluence or degree of exposure. That's how bad TV watching is for mathematics.

On the other hand, if I take a behavior like playing games, it turns out that the more I play, the better it is for my mathematics score. But if I look at reading behavior, it turns out that the more I play, the less I score, unless of course I'm not playing at all, which is not a good idea. As long as I go out and play at least once a month, I'm good.

These are explanations that come out of an exploratory interface, where we can dig deeper and see what's happening. We dug deeper into this interface and found, for example, the factor that matters the most across all of these parameters. If I sort by which factor influences marks the most, there are four factors that matter: father's education, father's occupation, mother's education, mother's occupation. Almost everything else is
significantly secondary. And it's also interesting that the father's education and father's occupation are slightly higher in importance than the mother's education and mother's occupation. Now this is true for the country in general, but is it true for every state? For example, if I take a matriarchal state like West Bengal, it turns out that the father is even more important; the mother is far less important in Madhya Pradesh when it comes to determining the child's marks. What about a patriarchal state like, let's say, Punjab? Here it turns out that the mother's occupation is more important than the father's, which on the one hand may seem counterintuitive, but on the other hand you could argue that the primary homemaker is the one who has the larger influence on the child's marks, and in a patriarchal society it is the mother that is more likely to be teaching the child. Which is not to say that in West Bengal it's the fathers that are teaching the child, but I suspect there's more of that happening in West Bengal than there is in Punjab. This is an exploratory interface where you're diving in, clicking, seeing what's happening, and while I'm giving you a voiceover, what I'm presenting is the information that was gathered by the existence of an exploratory interface. And therefore you have the option of creating something that is exploratory, that allows people to figure out their own insights, or create something that shows people what's happening, or take an in-between ground where you make it easy for them to understand, or you go the whole hog and create an explanatory interface where you tell them what's happening. These are ways of telling data stories, but remember this is just a format, just a canvas. Your imagination is your playground. Feel free to create absolutely anything that you want. All of this is going to be dependent, however, on the technology behind us, and from a technology perspective I am just going to give you a few tips. I am not going to say use this piece of software or
restrict yourself to any one single technology, but if you're looking for guidance on where to start, then broadly, for each of the three categories, show, explain or explore, we find that BI applications, the likes of Excel, Power BI and Tableau, are the ones that are easiest to create charts out of. If you're not familiar with any one of these, it's pretty easy to learn. You can download all of them for free and start exploring them. You probably already have Excel on your machine, and Power BI and Tableau are fairly quick downloads. Do try them out. This allows you to convert the data into charts. If you want to provide an explanation interface on top of that, then you're probably better off going into a design interface. PowerPoint is perfectly fine, so is Photoshop, so is Sketch, any tool that you feel like. Paper and pen is perfectly fine too, for that matter. A lot of really good sketch notes can be extremely powerful when it comes to telling data stories. But don't stop here. Create a video. Some of you may have seen Hans Rosling's video; in fact I suspect all of you in this hackathon have probably seen it, and if you haven't, please watch it. There's one with the BBC that he screened in which he stands in front of what looks like a hologram, but it's not really a hologram, and tells the story visually with the charts appearing in front of him. I'd love to see some of that kind of work coming out of you guys. The point, though, is don't be restricted by format, and of the various formats, the explanatory format is the one that gives you the most power and the ability to create artistic sophistry that is beyond what we would normally imagine. However, if you're looking for some starting tools, these would work just fine. On the other hand, if you're looking to create an exploratory interface, then you need programmatic capability. Don't even attempt it unless you're reasonably good in some programming language. What are the most common programming languages that people use? JavaScript, in which case you
probably want to go for either D3 or Vega. These are the two most popular libraries for creating data visualizations these days, D3 being the more powerful, low-level one, Vega being slightly higher level, slightly less powerful but pretty good. Both of them have a very steep learning curve, but both are reasonably effective. If you don't already know D3, then I would not suggest you try and learn D3 in the course of this hackathon; you're better off finding somebody who knows D3 and pulling them into your team. Vega is probably learnable. If you want to use Python, that's probably the second most popular language for interactive visualizations. Then Seaborn is a popular library that's built on top of a library called Matplotlib. You could also use Bokeh, and between these three, Seaborn, Matplotlib and Bokeh, these are the most popular libraries that are out there. Again, don't feel restricted; feel free to use any library of your choice. Between JavaScript and Python, which one would you choose? Short answer: whatever you know better. Don't worry about which one has a better output format. The short answer is JavaScript is preferred over Python, but if you don't know JavaScript as well as you know Python, go with Python. You are much more likely to create better output with that. And the third language of choice that people use these days is R, on which ggplot2 is the de facto library for creating any kinds of visualizations, and if you want to make it interactive, then Shiny is a pretty good choice for building a web application and exposing it. In summary, the tool sets are there. I have suggested a few, but you are probably familiar with some of these or others. Go with the tool that is of your choice; that's the most important thing. Finally, the data sets themselves. We will be sharing the details of the three data sets that are part of this hackathon: one is a cricket-related data set, one is a voter survey data set, and the third is the National Achievement
Survey data set, on which I showed you some details before. All of these data sets are accessible through an API. The way the API works is you will be given an endpoint, which is gramener.com/datasets, in which the data is available in the form of a URL that you can construct programmatically. But before I show you the programmatic construction, let me quickly show you what the interface looks like. Let us take the IPL match results. So this is a data set that has the results of all IPL matches from 2008 all the way down to 2016, March or May, and we have the details of which city it was played in, the date on which it was played, the two teams that played, which team won the toss, what decision they made, what the result was, how many runs they won by, and so on. And while this is one part of the data set, the other part of the data set is a ball-by-ball result. In the first over of the first match, what happened? So Kolkata Knight Riders was batting and Royal Challengers Bangalore was bowling. In the first ball, Sourav Ganguly was batting, with McCullum at the non-striker's end, facing P Kumar, and there was a single leg bye that happened, and therefore the team got one total run as an extra. At that level, there is data for every single IPL match, and this is one of the data sets that you will be exploring. How do you get insights from this data set? Well, you go to this interface, and you can choose to export the entire thing as Excel or CSV, or, if you're looking for a programmatic interface, JSON or HTML. If I ask the question, for KKR, out of all the data sets, how many matches did they win when they chose to field versus when they chose to bat? So now what I find is that of the 62 matches that KKR has played, they have decided to bat on 30 occasions and they've decided to field on 32 occasions. That's not too much of a difference. And it turns out that where KKR decided to bat, out of the 30 matches, 15 they won, 15 they lost, but where they decided to field
they won only 12 and lost 20. So at least for KKR, it's certainly better to bat, where at least they win half of the matches, rather than to field, where they've lost significantly more than half of their matches. But this is an example of a simple insight that you get by downloading the data and working on this offline. Now the question is, how do you convert this into a story? Create a chart, show it: that's a possibility. Create an exploratory interface that allows me to figure out the story without having to do what I just did in so much time: that's a possibility. Or spin a series of stories: find out what happened to RCB, find out what happened to CSK, find out what happened to each of these teams, whether it's better for them to bat or bowl or field, and then string that into an explanation that says here are the strengths of the teams, here are the weaknesses of the teams. Any of these are options. The interface that you have to play with is there, which allows you to filter and pick and download any subset of the data. You can also export this as an API into JSON. So for those of you who are into programming, if you want to filter where, for example, the toss decision is fielding and the player of the match is McCullum, and any such, you can create a set of URL query parameters as a REST API, and the result will come through as a JSON file which you can include in your app. This, for those of you who may be concerned, is CORS enabled, which means that you will be able to host on any other server and still be able to pull the data off of this server. We will be sharing the details of how this can be hosted and how you can pull the data in more detail, but remember that the programmatic interface is something that you will need if you are creating exploratory interfaces, and the interface that you have that allows you to filter and download will prove far more useful if you're going to be creating either charts for showing the data or an explanatory interface that helps people
understand what's happening. With that, I just want to wish you all the best. Really looking forward to seeing some of your submissions. Have fun more than anything else, see what you can learn out of this process, and I'd love to hear from you any feedback that you have on the hackathon and how it went. Thank you.
Inspiring session by Anand, and I was truly inspired to see what data can do for us. With all this amount of information out there, it's more imperative for us to see and respect the insight that data can provide for us, and I thank Anand for his time this morning to walk us through a session in terms of what Gramener has to offer and what tools and technologies Gramener has mastered. Now with that, let's spend some time to understand what our rules of engagement for this tournament are all about. So the tournament in itself is broken into several phases. The first of the phases is what we went through, a session by Anand today. As inspiring as it was, I am also eager to get my hands on that data set so that I can start doing the visualization, just like any one of you. Now, post this session, the registrations are going to be open for each one of you. You can register yourself. Once you register, we are also going to ask you a few questions in terms of what the technology is going to be with which you are going to do the visualization, and what your credentials are going to be. Once the registrations are complete, you will be gaining access to the data sets that Anand talked about, the three data sets on which we would want you to use your data artistry, your expertise and skills, and provide the visualizations that we are looking for. Post the registrations, you will have about five to ten days' time for you to come up with visualizations. During this course of time we will also be assigning you mentors. Now, you can reach out to these mentors should you have some questions or clarifications that you would seek, and we'll be more
than happy to provide you those clarifications. Please note that March 5th is the last date for submission. Please ensure you submit your entries before that. Once the March 5th deadline passes, by March 12th we're going to be announcing the top ten entries. Now these top ten entries, on the day of the finale, are required to be part of the panel discussions. They will be given an opportunity to give their presentations. Now out of these top ten presentations and the contestants, we and our judges here are going to select the top entry, which qualifies as the winner. Now that decision will be taken on the date of March the 17th. Please ensure that you stick to these deadlines in order for you to submit, be qualified for this tournament, and have a chance to win the prize. [Music] [Music]
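The KKR toss-decision comparison and the query-parameter filtering described in the talk can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the rows are rebuilt from the counts quoted in the talk (15 wins and 15 losses batting first, 12 wins and 20 losses fielding first), and the base URL, the field names `decision` and `won`, and the parameter names `toss_decision` and `player_of_match` are hypothetical placeholders, not the actual API of the hackathon data service.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; the real one is shared by the organizers.
BASE_URL = "https://example.com/datasets/ipl-matches"


def build_query_url(base_url, filters):
    """Append a dict of filters to base_url as URL query parameters."""
    return f"{base_url}?{urlencode(filters)}"


def win_rate_by_decision(matches):
    """Return {decision: (wins, played, win_rate)} for a list of match dicts."""
    played = Counter(m["decision"] for m in matches)
    wins = Counter(m["decision"] for m in matches if m["won"])
    return {d: (wins[d], played[d], wins[d] / played[d]) for d in played}


# Rows rebuilt from the counts quoted in the talk, not downloaded data.
kkr_matches = (
    [{"decision": "bat", "won": True}] * 15
    + [{"decision": "bat", "won": False}] * 15
    + [{"decision": "field", "won": True}] * 12
    + [{"decision": "field", "won": False}] * 20
)

summary = win_rate_by_decision(kkr_matches)
for decision, (wins, played, rate) in summary.items():
    print(f"{decision}: {wins}/{played} won ({rate:.1%})")

# Parameter names here are assumptions; check the service's own documentation.
print(build_query_url(BASE_URL, {"toss_decision": "field", "player_of_match": "BB McCullum"}))
```

With the quoted counts this gives a 50.0% win rate when batting first and 37.5% when fielding first, which is the gap the talk describes.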
Info
Channel: Data Artistry Tournament 2018
Views: 5,581
Rating: 4.9 out of 5
Keywords: Data Visualization, RRD, Gramener, Nasscom, DAT2018, ART, DESIGN, UI/UX, Data, Visualization, Tournament, TheLoft@RRD
Id: 5N10tVaNuD4
Length: 44min 53sec (2693 seconds)
Published: Fri Feb 23 2018