Tableau for Data Science and Data Visualization - Crash Course Tutorial

Video Statistics and Information

Captions
Hello everyone, and welcome to this tutorial on Tableau for data science. We're going to cover a few basic features of Tableau, how I use it for data science, and how it can fit into your workflow.

Here's an example of a data science workflow. In this example we have six different components, and this is going to be different depending on what type of environment you're working in. Generally, though, you're going to have data collection, data exploration, the data wrangling or data munging stage where you have to pull everything together and clean it for modeling, and then modeling, validation, and reporting.

The way I use Tableau is primarily in the exploration phase and then in the reporting phase. In exploration, I want to be able to look at the data once I collect it. I want to look at the different variables that exist, see how the data is distributed for each variable, and see if there are any outliers or anything like that that I want to kick out from the modeling, because if you feed bad data into the model, it's garbage in, garbage out. Being able to explore the data is really helpful, and sometimes that's really all you need to know whether you should move forward or not. If you're trying to see whether there's some kind of relationship between variables, sometimes a quick exploration in Tableau and a nice visual will give you that.

Then there's the reporting side. Once you've built your model, validated it, and want to share your results with others, Tableau is also ideal for that. They have great reporting tools, and it doesn't really matter what kind of organization you're in: whether you have Tableau Server or not, you can pass your workbooks on to others so that they can view them as well.

So what is Tableau? We kind of covered it briefly in the last slide,
but it's just a business intelligence / data visualization tool that allows you to make sense of your data by looking at it. Sometimes we're looking at millions of rows of data. You could check maybe the first hundred or the first few thousand rows, but that's really not efficient, and it's really not feasible these days to just look at the data and figure out what's going on. With big data there are millions, tens of millions, potentially billions of rows, and being able to throw this quickly into a tool like Tableau and look at it really helps a lot.

Here you have Tableau Desktop and then the server version. We're going to be using the desktop version today, specifically Tableau Public, which is free; you can download it from their website. That's just software that you download to your computer. The server version runs within the browser: you can create charts and dashboards and then upload them to the server for consumption throughout your organization.

The interface for Tableau Desktop is really just drag-and-drop. It reminds me of a pivot table in Excel: you drag the columns and rows around to get what you want. The way this works is that these are the fields here, broken down into dimensions and measures. I think of a dimension as basically a category, or a categorical variable, and a measure is a numerical value; it could be the number of records, it could be any number of things. Here you can apply a filter, and I'll show you how to use the Filters and Marks sections later. Moving over here, this is where you place your columns and rows. Let's just say I pull in current status to the columns and the number of records to the rows; then it would give me status categories along the
x-axis and then the volumes for each category along the y-axis. We're going to go over this in the live demo, but I just want to give you a brief introduction to the interface and how it's set up. And this is the end result: once you're able to make a few charts, you can pull them together and make a dashboard like this, for example.

So now we're going to go ahead and download Tableau Public. I've just googled Tableau Public here, and you're going to have to put in your email address to download the app. I'll go ahead and put mine in and download it. It may take a while depending on your internet speed, but it's generally pretty quick. Go ahead and install it on your computer. I'm going to fast-forward the video so that you don't have to watch the entire install, but feel free to pause the video and hit play again when you're ready to go.

OK, so now I've downloaded Tableau Public, and a shortcut was installed on my desktop. Here we need to connect to a file, and it gives you several options: Excel, text, JSON, Access, PDF, spatial, or statistical file. The difference between Tableau Public and Tableau Desktop, which is the paid version at about eight hundred and fifty dollars, is that with Tableau Public you're limited to just a handful of inputs and file types. The desktop version has more options, and you can connect to a server like Azure or any sort of cloud environment, so AWS, Azure, or Google Cloud, and pull your data in directly from there.

Now we need to go ahead and get our data set. The data set I'm going to use for this tutorial is the Titanic data set from Kaggle, and I'll include links to Tableau Public and this data set in the video description. You just click on that, and you'll probably have to sign up if you don't have a Kaggle account already. Kaggle is essentially the online Olympics for data nerds. You have
competitions for machine learning and deep learning models. Companies can actually host their data sets on Kaggle and have basically the best data scientists in the world compete to build the best model, and the winner can actually earn money, so it's pretty cool.

I've already got an account, so I'm going to sign in here. Once you're logged in, you can go to the Data tab, and then we're going to use the .csv file. Go ahead and download that. I'm just going to move it to my desktop, and now we're going to connect to that CSV file. It looks like an Excel file, but it's really a flat CSV file, test-1, so we're going to use the text file input here. Now we go to the desktop and pull that in.

Once you open Tableau Public and import your data set, it's basically just going to show you all of the data fields that exist, and you may not know what those are: passenger ID, survived, passenger class, name, for example. Kaggle actually has a data dictionary on the data set page. Survival is a categorical variable, with one standing for yes and zero meaning no, that they didn't survive. Passenger class is the ticket class, so first, second, or third class. Sex is the gender, male or female. Then there's age in years, the number of siblings, parents, and children aboard the Titanic for the following variables, then the ticket number, fare, and cabin number; the cabin number could be indicative of where they were located on the ship. Embarked is their port of embarkation.

Just by looking at this data dictionary, you can get a feel for which variables might be useful and which variables may not be useful at all. Survival obviously is very important; we want to look at the correlation between these other variables and whether or not the passenger survived.

So now we're going to go back to Tableau. Passenger ID is probably not going to be very useful. Survived is going to be very important. Then passenger class; passenger name is not going
to be very important, and the ticket number is unlikely to be significant. Fare is how much they paid, then cabin number and embarked. You can see we have all of these columns here, and you can always reference the data dictionary.

We're going to go ahead and click on Sheet 1 at the bottom, and you can see how Tableau automatically pulls in these different variables. It makes a determination, based on the type of data being pulled in, whether each should be a dimension or a measure. In this example we really just want to look at the number of records, and how the number of records relates to these different categories and to whether the passengers survived, so we're going to make all of these dimensions.

For age to become a dimension, we can put it into different bins. Let me give an example. If I pull age in, you can see that it's just a measure, so it's adding up all the ages. What I really want to see is whether there's a correlation between age and whether or not the person survived. So we want to create a dimension based on this age measure, and we can do that automatically by creating bins. We can call it age bin, and the size of the bin is basically the number of years in each bucket; I'll just do ten years. If we pull that in, you see we have a null value. We'll go ahead and pull in the number of records and then circle back to that null.

So here we go: we have a distribution of all passengers on the Titanic. Now we're going to pull out the null. It's very important to note that when I pull this over, I'm not just clicking on it and dragging. I'm holding Control on a PC (on a Mac you would use the Command key) and left-clicking at the same time, and pulling this field over
to the Filters bar. If I uncheck the null value there, then click Apply and OK, that gives me the distribution without the null values.

This is great, but I want some context here by adding percentages. I take number of records again and pull it up to the Rows bar, then hit this down arrow to get the drop-down, then Quick Table Calculation, and then Percent of Total. What that did is give us the volumes above and the percent of total below, but I don't have any labels, so I need to add labels for this to make sense.

The Marks bar allows you to change the coloring, the labels, and the size of your chart. In this example you have three sections: the All section, this section, which is the summation of all the records, and the bottom one, which represents the percent of total. For the second one, we want to pull in the labels to add the column labels to each bar. I'm going to hold Ctrl on my PC (Command on a Mac), left-click, hold those together, and drag it over to the Label box, and you can see that it adds the labels there. I'm going to do the same thing for the bottom section, the percent of total: Control, left-click, hold, and move it down to the Label box. You can see that on top we now have the volumes, and below, the percentage of the total number of records.

This is great, but what it doesn't tell us is whether age was an influencing variable on whether or not the passenger survived. So we're going to use the survived variable here. It's a measure; Tableau treated it as a measure when it pulled in the data set, but we really want to treat it as a categorical variable, so we want to convert it from a measure to a dimension. The way you do that is you hover over the variable, click the arrow here, and then Convert to Dimension. So now
that's a dimension; it moves up from measures to the dimensions section. We want to color these bars based on whether the passengers survived or not, so I'm going to hold down Control and left-click on my PC (Command and left-click on a Mac) in order to drag this field over. But I want to apply that color to both sections, so I'm going to click on the All marks box here, then Control, left-click, hold, and drag that over to Color. You can see this legend pops up over here on the right. Orange represents the survivor portion of the population, and blue is the non-survivor population.

So you can see what the distribution looks like in terms of the total number of records by age bin, and you can compare the percentage for each age bucket to see if that age grouping had a disproportionate survival rate relative to all the passengers. This is great. I'm going to rename this tab, so I just double-click here on the sheet, and we'll call it Age.

And now the good part: we've already built this out, so we don't have to redo it for every variable. I'm going to right-click here and then Duplicate, and what I'm going to do now is just swap out age for another variable. In this case we're going to use passenger class. I want to get rid of age bin, and what I did to get rid of it is I just left-clicked on it, no Ctrl or Command key, and dragged it below, and that removed it from the filter. I can do the same thing here, just drag it away, and we're left with these summations of each category.

Now I want to add passenger class. This was also treated as a measure, so we want to convert it to a dimension and drag it over. We only have three classes here, so I can drag the edge of this chart to expand it so it makes more sense. We do not have any null values here,
which is great, and you can see that it automatically populates everything. We can change the name here to Pclass, to represent passenger class, and duplicate that.

Next, let's look at the number of parents/children aboard the Titanic and the number of siblings/spouses aboard the Titanic. We're going to look at siblings/spouses. We'll convert that to a dimension and swap it in. There we go. Rename this Sibling Spouse.

I'm going to create one more chart, one more sheet, and it's basically going to be whether or not they survived, just to get a feel for how many survived out of all the passengers. I'm going to pull over the survived variable and then number of records. This does not look very nice, right? We can change the chart type if you click on Show Me in the top right corner. I'm going to create a pie chart; I feel like that makes a lot of sense for this type of information, since it's going to be one or the other: they survived or they didn't. I can drag that out to make it a little bit bigger. When you click on the pie chart option, it automatically adds these labels, and what I can do to add the labels to the visual itself is Control, left-click, and drag that up to Label (Command, left-click if you're on a Mac). OK, so that worked. Now I want to add survived, the one and the zero; one represents a survivor and zero represents a non-survivor. We'll name that sheet Survived.

OK, so now we have four sheets: Age, passenger class, Sibling Spouse, and Survived. We can use this information to create a dashboard in Tableau, and I'm going to get to that in a second, but what I want to cover very quickly is how to create your own measure. I'm going to use age for this. Instead of using the bins, I'm going to create a calculated field, and we'll call it custom age bins. We're going to say: if age is less than or equal to 10
then "0 to 10", else if age is greater than 10 and age is less than or equal to, let's just say, 20, then "11 to 20", else if... you get the idea. I'm not going to do this for every age bucket. For everybody else over 20, we'll just put them in "20 plus", so they're 20 years plus. Then I add the END there. I've got an ELSEIF and an ELSE, so I need to remove the ELSEIF; instead I have an ELSE, which will be the catch-all at the end if the age doesn't fall within one of these buckets, and then the END of the statement. Apply.

Now I'm going to duplicate the Age tab for this example and swap out age bin for my custom age bin. You can see it's 0 through 10, 11 through 20, and 20 plus. We could use the bin function to do that really quickly; the point of that exercise was to show you how to create a calculated field. I really don't need that anymore, so I'm going to delete it.

Now we want to take these sheets that we've created and make a dashboard. To do that, you can see these three tabs below: the one with the little chart, the one with the four boxes, and what's called a story. We want to click on the New Dashboard button. What this does is create a blank canvas for us to build a dashboard. For whatever reason the size is always wrong, so I just change it: you click on this little down arrow here, then the other one right below it, and choose Automatic, and it expands to fit your screen. Then we just double-click each of these sheets: age, class, sibling, survived. That was really fast and really easy to pull this together and add it to a dashboard.

I don't like this number of records thing, so I can click on it and click the X, and it will remove it. Let's say I want to have Survived up here in the top section: I can move that around; you can drag and drop these sections to fit your needs. There we go. Let's say I want to expand this
some. You can just hover your mouse out a little ways and drag it so that it stands out a little bit more. You can also change the colors here, so I can change the survived or non-survived colors. I'm going to change the one to green to represent that they survived, and the zero to red. I can do that here and it will change all of them. There you go, nice Christmas colors. We can drag this over as well to make it more symmetrical, and then increase the size of this and reposition it.

So in just a matter of minutes we were able to create sheets and then a dashboard, and that's really fantastic. There are all these other tools that you could use, like Qlik or D3, but it takes a while; you've got to know how to code in those programs in order to make the visualizations, and it's just a pain. The thing I like about Tableau is that it's really easy. I compare it to an iPhone, where a child can basically pick up an iPhone and immediately start using it; it's that intuitive. Tableau is that way for visualization. It's really that simple: you literally drag and drop until you get what you want.

I actually really don't like these colors, especially the red. Maybe we'll just add the blue back. OK, that's a little bit better.

Anyway, we only dealt with a few of these variables here, but you can get a feel for how many people survived. We could add the percentages here as well, but it looks like most people did not survive, so it's really sad. You can look at this chart and say that, out of the people that did survive, it looks like if they were between the ages of 20 and 30 they had a better success rate. For the passenger classes, it looks like if you were in first class you had a better shot: fifteen percent versus less than ten percent
here and then 13 percent here. For siblings and spouses, those that had, I guess, fewer siblings and spouses survived at a higher rate, which is obviously pretty sad.

This really brings the data to life. We all know what happened with the Titanic. If we were to just look at this data here in the data source, there are only 891 rows, so you might even be able to look at it and pick up some trends. But what if that number was 8 million rows? There's no way you could go through that data, and honestly there's no way you could get that data into Excel. I know Excel's gotten better, but given the speed with which you can pull data in, look at it, and then quickly create these sheets and dashboards, I'm not sure you could do that with any other software that's available. Thanks for watching.

[Music]
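The custom age bin and percent-of-total steps described in the captions can also be sketched outside Tableau. The Python below is only an illustration and is not shown in the video: the function name custom_age_bin and the sample rows are assumptions made up for this sketch, not values from the Kaggle Titanic data set. It mirrors the IF / ELSEIF / ELSE logic of the calculated field, drops missing ages the way the null filter does, and then computes each bin's share of records and its survival rate.

```python
from collections import Counter


def custom_age_bin(age):
    """Hypothetical helper mirroring the IF / ELSEIF / ELSE calculated field."""
    if age is None:
        return None  # a missing age corresponds to the Null bucket in the chart
    if age <= 10:
        return "0 to 10"
    elif 10 < age <= 20:
        return "11 to 20"
    else:
        return "20 plus"  # catch-all, like the final ELSE


# Made-up (age, survived) rows for illustration only; survived is 1 or 0.
passengers = [(4, 1), (8, 0), (15, 1), (19, 0), (22, 0), (35, 1), (41, 0), (None, 0)]

# Equivalent of unchecking Null in the filter: keep rows with a known age.
known = [(custom_age_bin(age), survived) for age, survived in passengers if age is not None]

# Number of records per bin, and each bin's percent of the total.
counts = Counter(bin_label for bin_label, _ in known)
total = sum(counts.values())
percent_of_total = {label: 100 * n / total for label, n in counts.items()}

# Share of survivors within each bin, the comparison the colored bars support.
survivors = Counter(bin_label for bin_label, survived in known if survived == 1)
survival_rate = {label: survivors[label] / counts[label] for label in counts}

for label in sorted(counts):
    print(label, counts[label], round(percent_of_total[label], 1), round(survival_rate[label], 2))
```

With real data, the passengers list would be replaced by the age and survived columns read from the downloaded CSV file.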
Info
Channel: freeCodeCamp.org
Views: 439,737
Rating: 4.9249315 out of 5
Keywords: tableau training for beginners, tableau tutorial, tableau dashboard, business intelligence tools, tableau training, data visualization tools, tableau desktop, tableau excel, tableau certification, tableau online, what is tableau, data vizualisation software, tableau reporting, business intelligence and analytics, tableau dashboard example, tableau example, tableau vizualisation, tableau course
Id: TPMlZxRRaBQ
Length: 28min 42sec (1722 seconds)
Published: Tue Jan 29 2019