Visualization and Graphics in JMP (11-04-2016)

Captions
Well, thanks everyone for joining us for this webinar, Visualization and Graphics in JMP. This is a really fun webinar to give, especially because, as many of you probably know if you're JMP users, JMP is a wonderful environment to visualize and graph data, both for exploratory visualization, when you're first getting to know your data, and as an explanatory tool, when you need to communicate findings in your data to others. What we're going to cover extensively today is Graph Builder, really the main graphing platform within JMP.

If you've never seen JMP before, I'm going to pull up a sample data set, Hollywood Movies, just to make sure everyone on the webinar can follow along. JMP is statistical software. It works a little differently than most other statistical software, but it's very friendly, intuitive, and quick to learn. One of the main things to know about JMP is that JMP wants to know what the measurements in your data set actually mean. For instance, in this Hollywood Movies data set I have something like Rotten Tomatoes scores, which are numeric values, and something like lead studio name, which are categories. You'll notice in JMP there's a section here for the columns list, and these actually have different icons. If I click on an icon, it reveals the modeling types available for that column. A column of categories can only be marked as nominal or ordinal, or even none if I don't want to model it at all. What that says to JMP is that the values in this column are measured on a nominal scale: they're categories, and we should treat them as categories. Something like Rotten Tomatoes score has that blue triangle, and that's the continuous modeling type, which says to JMP that the values in that column should be treated as numbers. Now, the reason I'm telling you this, if you've never seen JMP before, is that these distinctions actually matter to JMP
when you produce analyses. For instance, if I produce a distribution output, I'll just grab those two columns I was talking about. The output I get in JMP depends on the modeling types I specified for the columns I entered. So, for instance, lead studio name gets me frequencies and a frequency distribution plot, whereas the continuous column, Rotten Tomatoes score, gets me a histogram, summary statistics, and quantiles. So JMP will contextualize output based on the type of data you put in. Beyond that, there's not too much you need to know about JMP for today's webinar. What we're going to do is play around with a platform called Graph Builder, which is really the main place in JMP to produce composed graphics.

If you've never seen JMP before, you may have noticed that even when I went to an analysis platform, JMP gave me graphical output. This is something you'll see across JMP: no matter where you go in JMP, there's always a visual to accompany a statistic that you get. But what we're going to focus on mostly today is Graph Builder, and I'm going to start with some Graph Builder basics. If you've never used Graph Builder before, I want to get you up to speed with how it works and make sure you leave today's webinar knowing how to produce the most common types of graphics.

Now, I'm using a JMP journal on the left, and I'll send you this journal as well as the recording of today's webinar after the webinar is over, so you can sit back and just watch if you don't have JMP open. You don't need to do this as I'm doing it. The links to the data files I'm using here go to sample data, so when you use the journal you can just click these links like I will, or you can go to the Help menu and go to the sample data directory. So all the data I'm using today is
available with JMP when it ships.

All right, so let's get started. Graph Builder is available under the Graph menu, and it's the first option there. Graph Builder is really a drag-and-drop graphing environment. What that means is that we have a list of our columns on the left, and we just drag those columns over to the roles that create the type of graphic we want to make. Graph Builder has some other things too. It has a ribbon on the top, which I'll talk about: these are the different graphing elements we can put into the graph. It has a graph control panel on the left-hand side, which is where you control certain characteristics of an element. And then there are red triangles that do additional things. The red triangle for the overall Graph Builder menu has options like legend position, color themes, and other settings. If you're new to JMP, click every red triangle you see. They always have additional options that let you do different things.

All right, first let's start with drop zones. The drop zones are really broken apart into drop zones for data, that is, the X and the Y; drop zones that in some way split up the plot you get, so Group X, Group Y, Wrap, and Overlay; and then the Color and Size roles, which embellish the graph and the points with additional information. Let's try out just a simple graphic here. Remember, these are Hollywood movies data, so I have information on the Rotten Tomatoes score for each movie and the audience score. Maybe I'm interested in how these relate to each other. They may not be exactly the same: an audience may review a movie differently than Rotten Tomatoes. To put these into different roles, I'm simply going to drag Rotten Tomatoes score to the X. Notice that as I hover over a role, JMP will actually preview what the graph will look like before I even drop it
there. So I'm just going to drop that, and then I'm going to take audience score and drag that to the Y, and with that JMP produces my graph, a scatter plot. Now, you'll notice JMP also added an additional element to the graph. This is the default when we have two continuous variables, and it's called a smoother. It's sort of like a moving average, or a smooth line, and I have a control on the left in that control panel that lets me change the stiffness. To the far right, lambda gives a straight line, which is the linear regression of Y on X, and going to the left it allows a little more flexibility, so it's looking for local changes in the line. I'll just leave that where it is.

Notice that I used two roles here, the X and the Y data roles. We have additional roles we can use within this graphic, so let's look at a couple of these: the Group X, Group Y, and Wrap roles. What these roles do is break up whatever visualization I've created in the center into the levels of whatever variable I put into each of those roles. For instance, I have something called theme in my data set. Let me go back to the data set and show you. Theme here is categorical: it's what type of theme the movie has. I also have genre, something else categorical: what genre the movie is. Let's try to use these in this graph. I'm just going to drag this graph bigger, because we're going to have a lot of levels of genre. What I'm going to do is drag genre over a couple of roles. I'm not going to drop it; I'm just going to move it between them to show you what these different roles do. Let me hover over Group X, and notice that immediately the x-axis is split up across the different levels of genre. When we have this many levels, the graph doesn't really speak to me very much. Of course, graphics are made for us to communicate something valuable about the data, and when we have the axis so truncated, it's sort
of a little bit hard to see what's happening. Similarly, if I go to Group Y, we've split the y-axis up so narrowly that we don't really get a sense of the meaning within the scatter plot. For lots of levels, the role I like is actually the Wrap role. This creates a trellis plot, so what we have is the graph of Rotten Tomatoes score against audience score, separated out for each of the different genres. This is a nice type of graphic, especially with many levels of the categorical variable you're wrapping by, when you want to keep the square graphs.

I'm going to click Undo, because I want to show you what happens if we use a different role. We used Group X, Group Y, and Wrap. What happens if we use Overlay? Overlay is this role right here, and it works a little differently than Group X, Group Y, and Wrap, because what Overlay seeks to do is plot the same visual that I have in the center, but for the different levels of genre. Let me drag that into Overlay, and notice that whatever elements I had already specified, in this case a smoother and the points, are now duplicated across the different levels of genre. This sometimes works very effectively. In the case of a scatter plot, we're looking at how coherent the relationship is between Rotten Tomatoes score and audience score for the different genres, and even with this number of levels we can see there's a general coherence here. Regardless of the genre, the relationship between those variables seems pretty consistent. And again, taking advantage of JMP's interactivity here is very helpful. When I click on the labels for the different genres, I can pull out those individual traces and see which ones we're dealing with. This type of
interactivity will actually extend to when you export this graph to interactive HTML, something I'm going to talk about later, for when you want to create a dynamic visual to put on a website. So that's what happens when we use the Overlay role.

Now I'm going to take this out, and let's consider two additional roles. Not overlaying or grouping, but when we want to embellish points by changing either their colors or their sizes. Those are the Color and Size roles. These most commonly apply when you're working with points, although you can also color bars, and sometimes you can size special elements like shapes within a map. Let's try Color here. I'm going to take genre again and put that in as the color. I want you to see something about this that I think makes a good point about how and when color should be used. Notice that we get different colors here for Action all the way through Thriller, but there's not a very big pop-out for me. I can definitely identify which points are which by clicking on them, but with so many levels of genre the meaning sort of gets lost for me. I like to use color for categorical distinctions where I really want to make a point of one group versus another group, so this isn't my favorite use of it.

Sizing is a different idea. Sizing will change the size of the points based on the levels of a variable. For instance, let's take something like world gross. This is a continuous variable, and I'm going to take it over to Size, and what this will do is size the points on the basis of world gross. What I like about this type of visual, especially when we're using a continuous variable for the sizing, is that we can now look at three dimensions of data all at once: the relationship between the scores, and how that relates to world gross. What you may notice is that most of the points,
most of the high-grossing movies, are here at the top of the scale; that is, they're the ones with the largest points. But there are a few notable exceptions, places where we had a low audience score or a low Rotten Tomatoes score and yet the movie still grossed a lot worldwide. If I hover over those, Transformers was one of them, Pirates of the Caribbean: On Stranger Tides was another, and here's one way down here, The Smurfs: poorly rated, but it actually made a lot of money. What I like about this is that it relies on some very strong pre-attentive processing. We can see pop-out effects very well, because we're very good visually as a species, so we can pull out the elements that don't fit in with an otherwise coherent pattern in the data. This is a nice way to incorporate those additional variables.

Now I want to make a point here: for sizing, we can also use a categorical variable. I just want you to see what will happen. JMP will do its best to figure out what you mean. What it's actually doing is alphabetizing the genres, so the genres are effectively numbered based on alphabetical order, and the points are sized by that number. That seems crazy, but what if you had an A-through-G scale for something that actually was continuous, but you had given the levels letters? Well, JMP wants to let you use that, so it will let you use something categorical for sizing. Similarly for color, JMP will let you use something continuous. Let me put in world gross for the color, and notice that rather than giving discrete colors, JMP gives you a gradient; that is, the higher-grossing films get a hotter color based on this color scheme. We're going to come back to talking about color themes and gradients later. I just want to show you that if you right-click and select Gradient here, there are lots of customizations you can apply. And again, if you've never used JMP
before: since JMP is very modern statistical software, you can right-click everything, and there are always additional options. Let me tell you my favorite thing to do when I have something like world gross, which is on a big scale. My favorite is to change the scale type to something like Quantile, so it scales them within the quantiles. Something else I find very useful is giving them discrete colors, and I'm going to say give me only three label points, because I want to do a low versus high: 0%, 50%, and 100%. That way I can look at the top 50% versus the lower 50%. And so again, my point with color: we're not very good with five or more colors as far as pre-attentively processing where they are. Two is amazing. We can pick out patterns between two colors very quickly and without much effort. So whenever possible, if it makes sense for your data, it's a very nice thing to do. We'll come back to working with the gradient tools a little bit later.

All right, let me click Done here and hold this graph. Those were the main drop zones, but there's a special drop zone that I need a different data set to show you, and that's the Map Shape drop zone. This is something you can use in JMP if you have data with states or counties or other geographic regions, something that identifies a region, so that you can plot data as a heat map on those regions. Let me show you what I mean. These are U.S.
demographics that we have in the sample data. I'll go back to Graph Builder and get myself a new palette here, and all I have to do is use this Map Shape section. I'm going to drag state into the Map Shape section, and I get a little canvas here of the United States. What JMP is asking me for now is a quantitative or categorical variable to color the states by. Let's pick something like how many smokers there are in an individual state. I'm going to hover over the center, and you may notice that when I drop it in the center, JMP makes a guess about where that variable should go, and it has put it into the Color role for me. Just like before, I get a little gradient here that shows the colors for the states on that variable. So the Map Shape role is actually quite nice. We can produce maps in a number of different styles, and I'm going to come back to geographic mapping a little bit later. It's not the only way we can make maps; we can also use latitudes and longitudes if we have them available in our data set. So mapping is also something that JMP does very easily and very quickly.

Now, there are a few roles that I'm going to skip over here, the Frequency and Page roles. You'll notice them down here. We don't really need to talk about those today. Frequency is there for pre-summarized data, for when you have frequency counts or weighting variables and want to get back to the original data. Page, sort of like Group X, Group Y, and Wrap, is a way of showing the same plot across different sections, but it produces them across different pages. We're not going to use that much today, but just so you can see what it is, I'll make a quick graph here, take something like region, and drop it into the Page section, and
notice what Page does: instead of showing them in the same plot, I get multiple plots, one for each of the different regions. This might be very useful for you, especially when you want to make plots of the exact same type but across different regions of your company, or across different classrooms, or whatever it is that you're trying to graph. So not the same graph, but across different graphs, and that's sometimes very useful. We won't talk about those too much today.

That's the basic idea of Graph Builder: it lets you create your graphic very quickly, simply by dragging and dropping columns. We made a number of graphs quickly just to show the drop zones, but what we're going to do in a second is step through a little more concretely how to use all these different elements, how to combine them in certain ways, and how to make the type of visual you're looking to make.

But before I go any further, I want to say something really important, which is how to show and hide the control panel that I have on the left. If you're going to be showing these graphs to somebody, you don't want to show them the control panel with your columns, your points, and all the roles you haven't used. You want to complete the graphic and put it in more of a publication-quality form. I've actually been to conferences where I see people with their JMP graphics showing the Group X and the Wrap roles, showing the extra roles there. So don't be that person. Click the Done button to hide the control panel and everything else and get your more or less finalized graph. You can still make customizations here. For instance, maybe you want to move the legend inside the graph, so I can go to the red triangle and put it inside the graph, or something like that. You can still make some changes, but we've more or less finished defining the roles and more or less finished controlling what those elements are doing, and so
that's what hiding the control panel does. To get it back, go to the red triangle and click on the control panel option again. So notice you're not finished with it; you can always come back to the control panel. But before you save a file out, click that Done button, because that closes all the additional elements and gets you that more or less finalized graphic. OK, so that's just a point about showing and hiding controls.

Let's talk about some graph types. We've looked at how to use the drop zones, so now let's talk about the different elements that are available within different graphics. This little diagram here is the ribbon that's at the top here, and these different elements are things we can use in different situations, depending, again, on the type of data that we're putting into JMP. I say that pointedly, because JMP is going to give you the best options based on the modeling types of your columns, so when you bring in data, just make sure those modeling types are set correctly. Again, if you're brand new to JMP, it's as simple as clicking on the icon in the columns list and setting the modeling type. For the most part JMP will get it right as soon as you import data: if there are numbers it'll say it's continuous, and if there are characters it'll say it's nominal. But occasionally I see data such as gender that's coded as 1, 2, and 3, so 1 and 2 for male or female, and 3 for did not respond. Well, those aren't really numbers. JMP will think they are unless you tell it differently, so it's good to go in and set those to be nominal. So be aware that JMP is going to pay attention to this when it produces these graphics.

Now, you may have noticed that when I drag variables in, JMP will often preassign a particular element. When I had two continuous variables, the preassigned elements were the points and the little smoother line here. To change an element, I'm just going to click on some, just to
show you what happens. I'll click on the line of fit, and JMP changes out the smoother but keeps the points. That's actually what's shown in this little icon: notice the points are still on it, so JMP is telling you that when I switch to that, the points stay. If I click the ellipse, JMP says OK, you still want the points, but we're going to show the density ellipse. But notice that for the rest of these, there are no points shown. So when I click on the heat map here, or the contour, it doesn't continue to show the points. We have control options, of course. I can say give me more levels for the contour, and that cleans things up for me a little bit. But let's say I wanted to get the points back. We're talking about adding and removing elements, and you've already seen how to add. How about if we wanted to layer an element, so I want not just the contour but also the points? There are two ways to do that. Either drag the points on top, and notice that this appends the points to the graphic, or hold down the Shift key and click on the points, and then JMP says, hey, you don't want to remove whatever element you just added, you just want to add this new element on top of it. And you can do this multiple times. I'm going to hold down the Shift key again and click on the contour, and let me just drag another one in there as well to show you the other option. So we can add many elements, and we're able to control these elements in specific ways to make more complicated visuals. These are things we can use one at a time or several at a time, if that's what we want.

I'll just make a point as I go further: JMP always lets you plot things that maybe don't make any sense. I can click on box plots here, and notice JMP doesn't even know what to do with it, since we have two continuous variables. I can turn on bars, and JMP will draw bars instead of points for each of the data points it has, and
there are some that just don't make any sense at all. If I click on a pie chart, what is JMP trying to do with that? Well, it's the frequency of observations for one of these variables. So be aware that JMP will let you do things that maybe don't make sense, so it's incumbent on you a little bit to think about what you're trying to plot in order to make sensible graphs. All right, so those are the element controls.

Let's talk about some basic examples of plotting data. Continuous by continuous is really the first thing we were talking about, the one that a lot of graphs start with: the scatter plot. Again, scatter plots are as easy as dragging in two variables that are continuous, and JMP will add them to your axes. You saw before that I was dragging one to one axis and one to the other. It's nice that you can simply drag both in at once and JMP will just assign them to the two axes. So that's the basic scatter plot, and as you saw, we can embellish scatter plots with multiple types of elements. If we want the line of fit, that gives us the regression of Y on X. We'll talk a little more about controls later on, but you'll notice that on the left-hand side we have controls for what we want to show. This is showing the Working-Hotelling confidence band. Maybe we don't want the confidence band for the fit and just want the line itself, or we want a different type of confidence band, a confidence band for predictions, showing us where we predict new observations should fall some proportion of the time; 95% is the standard here. We can also add certain statistics. Graph Builder's not really an analysis platform, but maybe you still want to show the root mean square error, or your R-squared, or even the equation of the line when you're graphing a line of fit. So we can add in certain elements here that make sense.

Now, there are some other visuals that really do make sense when you have lots
of data points, and I want to show you one example that I find useful. Actually, the last time I gave this webinar, somebody asked a really important question: what do I do when I have so much data that it's hard to see where the points are, that is, they're just a cloud covering each other? So I'll pull up some semiconductor data here. We have many processes we measured. There are only 1,400 or so here, and the problem I'm going to show you gets even worse when you have millions or billions of observations. Let me go to Graph Builder and show you what I mean. I'm going to drag in two of these variables, and it's not terribly important what they are right now; we're really looking at wafer problems. Notice the smoother comes on first. I'm going to turn that off. We get a sense that there is a big cloud in the center, that is, most of the data points are clustering around the center, and we see fewer out at the extremes. But it's not entirely clear how different the density is right here where my mouse is versus the dead center.

One graph that I like, and something you may find value in, is the heat map. Heat maps are very useful for categorical by categorical variables, and we're going to see a number of them in a minute, but let me show you what happens when I do it for continuous by continuous data. What it's going to do is show me the count based on the locations in X and Y. What we're seeing is the hottest density right at the dead center, and we see how the counts go down as we move further away from the center. Let me show you something I really like. I'm going to double-click the axis, and we'll talk about axis customizations later on as well, but we can change the increment here. We don't need the increment to be just 50. I'm going to put in 10 minor ticks, and what that means is that for the widths of those little heat map cells showing the points here, we can actually show
a little bit more granularity. For minor ticks, I'm going to say, let's say, five for this axis. So maybe, especially when we have lots and lots of data, we need this sort of granular heat map to show us the differences in densities. We can see some hot spots here that we never would have been able to see before. There's one right here, and maybe a few that are far from the center. Certainly the dead center has the most, but we're able to pick up on some extra information here that is otherwise hidden when we're just looking at the points, which don't really give us that sense. So explore that heat map, especially when you have very large amounts of data and you're trying to show them in this sort of way.

All right, so let's move on from scatter plots, folks. I want to show you categorical by continuous and continuous by categorical, which it turns out are really the same in JMP's mind: one continuous and one categorical variable mean the same thing either way. Then we'll look at categorical by categorical plots, when we have two things that are really just categories. Let's start with continuous by categorical. To do this, let me go back to my Hollywood Movies data set, go to Graph, Graph Builder, and pull up a new canvas for myself. Let's look at just one continuous variable, say the Rotten Tomatoes score. I'm going to drag that to the y-axis. Let's say I'm interested in how the Rotten Tomatoes scores differ across the different genres, so let me take genre and drag it to the x-axis. Now, before I click anything, let's look at what JMP did. It's still showing me points, and I actually quite like this view. It's showing me the points within each of the genres, and we can see that for some of the genres we only have one or maybe two points. For our purposes, I'm going to grab these. I'll hold down the Shift key, right-click, go to, let's see, Rows here, and do
Row Hide and Exclude, so let's take those out of the graph for now. I really just want the ones with lots of data. So if we're looking across here, we're looking at the different genres and the points within them, and this gives a great sense of the distribution of the variable. The reason I like this view is that when we switch to something like a bar chart, which is what we're going to talk most about, notice that it hides, in a sense, the variability within a category. We can always turn on things like error bars, with options like standard error, range, and standard deviation. I'll turn on standard error here, and that gives a sense of the variability within a particular category. But the points are something I really like. I think that really shows the data, so I encourage you to use a view like this when you can, to really show the observations. You're not showing the center, of course, so if showing the mean is what's valuable, certainly use something like a bar. But if you need to show the variability within a particular category, this is a great view.

Let's switch to bars, because I want to talk about the bar and column chart options, and I'll turn off my standard error bars. What I'm showing here is what we traditionally call a column chart. Column charts have the categories on the x-axis and the continuous variable on the y. Now, if we wanted to switch the roles, that is, genre on the y-axis and Rotten Tomatoes score on the x, I want to show you a really neat trick in JMP, something that I think is going to be incredibly valuable for you when you're trying out different visuals. Simply right-click on a variable and you'll see there's a Swap section, where we can swap that variable with another variable we're currently using. I'll say swap with genre, and what JMP does now is show us the genres along the y-axis and the Rotten Tomatoes score along the x. One thing I'll say about this is
this becomes a very useful way of plotting data, especially when you have lots of different categories. When you have, let's say, 50 genres rather than the number we have here, you'll get a lot more value showing them this way, simply because we don't have to rotate the tick labels. It's a lot easier to show variables with lots of levels when those levels are on the y-axis. That's a really handy thing to do.

Now, another way we can show these data, instead of just using the mean or whatever summary statistic we specify on the left and showing them as bars, is a view I quite like, the box plot, because box plots, just like showing the raw points, give a sense of the distribution of the data. Of course, box plots are not showing the same thing as the bars. The center of the box plot in each of these is the median, not the mean, and we're looking at the 25th and 75th percentiles and then the outer fences. So these show a sense of the distribution of the data but also some sense of the middle, the median in this case. It's a very nice view, especially when you want to convey not just center but also a little bit about the spread within the data.

Now I want to show you another one that you may not have seen before, and these are called violin and contour plots. To show you these, I'm going to switch over to another data set, the one I always like to show with this, which is Fisher's iris data. These are measurements on different species of irises, different measurements within the flower, the sepals and the petals. Let's go back to Graph Builder, and I'm going to put species here on my x-axis, so we just look at the different species, and let's put something like sepal length in as my y. Now, we already saw we can do a bar chart to show the mean, and we can do the box plots to show the median plus a little bit of information about the distribution of the variables,
but let's try this one, the contour. You saw me turn this on for a scatterplot before, but I want you to see what it does when you turn it on for a categorical variable predicting something continuous. What this is showing, just like the points and just like the box plots, is something about the distribution of the data. Just to show you that, let's drag on the points, and notice what it's doing: the place where there are the most points is the fattest part of the contour, and the place where there are the fewest points is the thinnest part. So what you get is really a folded distribution. You're looking at the distribution like you would if you were looking at little histograms, almost, but in a contour. And what's great about this type of plot is it's a very low-effort way to see where the values are, especially when you have lots and lots of data where you wouldn't want to show just a cloud of points. The contours show you something that's very useful, and we'll come back to layering visuals with the contour to make some specialized graphics with it. So I like those; we also call these violin plots. All right, so that's continuous by categorical. What about when we have categorical-by-categorical data? That's something we haven't really looked at yet. To do this I'm going to pull up a different data set. Let me minimize these ones; I'm going to open my Consumer Preferences data set, and I'm also going to open a different one we haven't seen yet either, San Francisco Crime. Let me start with heat maps. Heat maps, which we made for the continuous-by-continuous data before, are really useful when we're looking at categorical against categorical. This data set is nine thousand or so incidents of crime in San Francisco, and suppose my question was: how do the crimes distribute themselves across different days of the week and also across the different police districts? So
we have different districts in our data set, so how often are there crimes in each district, but also across the different days of the week? So let's go to Graph Builder and see how we're going to do this. Day of week: let's pick an axis. I'm going to put it on my y-axis here. Notice what JMP does first is just show us the count, the number of observations, so that gives us some information: Monday has the highest amount of crime, apparently, or at least crime reported. Let's put police district on the x-axis, so I'm going to drag it here. Now JMP is going to do something kind of funny. It has to choose which variable you're trying to show, and it doesn't know how to do that, so it's saying, okay, well, I guess I'm going to show the frequency across the different districts. But as soon as you click on something like a heat map, the data become clear. Let's take a second to look at what this is showing us. The color is again representing the count, the number of observations, so where do we get observations, but now broken up into these cells, the different days of the week and the different police districts. So we can obviously see the Southern district has the highest frequency of crime reported, and something like Park has the least, where it's the bluest. But we can also see hot spots across different days: on Fridays the Mission has a lot of crime, and on Mondays and even Sundays the Mission has a lot of crime. So with the heat map, and this is a great way of showing this, we're able to really look at both dimensions and get a quick pop-out effect for where the incidents are. Now, like any kind of graph we have, we can also layer by different categorical or continuous variables, so if we wanted to employ some of these different roles, we could. For instance, what if we want to break this up by whether it was a traffic incident or not? Let me put that in my Group X, and you can see the vast majority of these are not
traffic incidents. So there are some traffic incidents, but for the most part, in the data reported here, they're not. All right, so that's one way of doing this. Let me click Done, and I want to show you something else you can do with heat maps. I'm going to go back to Graph Builder, and this time let's try something different: I'm going to put time of day on the x-axis, and so that's the median time things are reported by day of week. It's nice, we still get box plots. Now remember we can use heat maps when we actually have continuous variables; you saw me do that for the wafers. So when I click on a heat map now, look what JMP does: it's actually breaking the time into different sections. It shows three-hour bins. Remember we can always change that: I'll double-click on the axis, and let's do one minor tick to get it down to one-and-a-half-hour bins. So we're looking at the time of day by day of week, and now we get another type of graph: not just which district and what day of the week we're on, but actually what time of day these incidents are most prolific. It looks like on Mondays there are certainly some pop-out effects here, and on Fridays; Fridays from 6 to 7:30 there are certainly a lot of them, and the majority are in the Southern district. So I think this is a neat view, and something to remember is that you can use those heat map elements when you're actually working with a continuous variable, something I think is pretty neat. Okay, so let's look at a different type of categorical-by-categorical plot: mosaic plots. Mosaic plots I really like. This is actually the default type of plot you get when you go to Fit Y by X and you employ two categorical variables. So let's go to Graph Builder. In the Consumer Preferences data, these are four hundred or so observations from people measured on a number of things: some basic things like birth year, whether they're single or not, their age group, and salary, and then some utterly
random things like how often do they floss, how often do they brush before dinner, how much does their toothpaste cost, so I worry about the type of people who responded to this kind of random survey. So let's look at these data and ask a question that involves two categorical variables. Perhaps we're wondering about age group as some kind of predictor here. So age group in the data set: let me drag it to the X, and you'll see these are grouped bins of different ages, so 25 to 29, 30 to 34, et cetera, and then we have a catch-all group, the over-54 individuals. And so what if we're wondering how some of these other variables relate to the age group, for instance, let's say, job satisfaction? Let me drag job satisfaction to the Y. You'll see what JMP immediately does is show us the points again, and again this shows, in a very quick way, where you have observations, so very few people between 35 and 39 are not at all satisfied in this data set. But let's use a more visual way of showing this, and this is the mosaic over here. What I love about the mosaic is that it shows a lot of information about the data all at once. Let me orient you first to what's happening on the x-axis, because if you look, the x-axis is not spaced equally for the different age groups; that is, the 25 to 29 section is wider. The reason it's wider is we had more people in the data set who were 25 to 29, and we have the fewest people in the data set who were 45 to 49. So the width of the columns here on the x-axis is showing that; it's a representation of frequency. But what's happening inside each column is really the magic of the mosaic plot. What it's showing us is the conditional probabilities, the proportions within each group. So for instance, let's look just at the 25 to 29 year olds. The not-at-all-satisfied are the lower section, and so if you read off the probability here, let me actually double-click the axis and let's set
the increment to something smaller. I'll do every five percent here, and, let me do actually ten percent, that'll be good, there we go. So if we look at the 25 to 29 year olds, it's a small proportion, about 5% of them, 5.3 exactly if we hover, who are saying they're not at all satisfied. Then there's a group in the center, from that five percent all the way up to 65, so that comes out to be about sixty percent who are somewhat satisfied, and then the top section is the extremely satisfied. So what's neat is that you can see the proportion within each group, that is, the groups along the x-axis, answering in that way. And what you're basically seeing here, if I were to read this plot, is there's not much difference across the different ages in terms of their job satisfaction. They're relatively flat; the shares within each of the sections are pretty equivalent. Let's take another variable just to give you a contrast: what would it look like if there were a big difference across age for whatever Y variable we're specifying? Well, what about whether people are working on their career or not? The majority of people who are younger are answering that they agree they're working on their career, but as we get to the older individuals, fewer of them are saying they're really working on their career, although almost half of those over 54 are still saying that they're working on their career. But notice what's great about the mosaic plot, and this is true across all visualizations: if you can make a plot where, as soon as somebody sees it, the meaning within the plot is obvious, then you've made a good graph. If you make something that's very difficult to understand and is too embellished or too complicated, where you have to explain it for, you know, more than a couple of minutes, there's almost certainly a better way to display whatever you're displaying, and you should work on that graphic, because graphs should be obvious as quickly as possible. The point
of a graph is to really convey information, not to impress people, so make the simplest graphic that conveys your story. All right, so that's a mosaic plot, a really neat type of visual I think, and one that, especially when you have categorical-by-categorical data, you should certainly be turning to. All right, so those are the basic examples. Let's talk about employing additional variables within your graphics. We've done a lot of this just as we were playing around, but let's take a little more formal look into this. The first one we talked about is overlay. Overlay we saw when we had continuous by continuous; that was something we did with the first data set. But let me show you overlay when we're actually working with bars or columns. I'll go back to Consumer Preferences, and let's look at the different age groups again, so that's a categorical variable, and let's look at their salaries: how much money are these people making in the different age groups? I'm going to turn on the bars, and notice what JMP did is pick the summary statistic for us. That's true whenever you plot bars or something that has to summarize the data: we have to choose a summary statistic, and in this case it chose the mean for us. So what if we use the overlay role? What that's going to do for us, just like it did before, is break up whatever visual we have in the middle across the levels of something else. So let's take a variable that would do that: what about the people who are working on their career versus the people who aren't working on their career? Let's go to overlay, and notice what happens is JMP will break up the visual, that is, break up the data in the bars, to show those people who agreed with that statement and those who disagreed with that statement. And what we're generally seeing, just to interpret the graph quickly, is the people who are working on their careers, the agrees, are typically making more money than the people who
say they're not working on their career, although we have one sort of counterexample to this, the 50 to 54 year olds: for whatever reason the people who aren't working on their career seem to be making more money, but that quickly changes for the 54 and over, so who knows what's happening there. But notice that we have additional options for how to control this. The bar style on the left-hand side is important. What we're plotting right now is a side-by-side bar. Another way you'll often see multiple categories presented is stacked bars, and stacked bar charts actually take the bars and sum them together. Now this is a little bit weird for these data: this is now showing the total of the means of those two different groups, which maybe doesn't make a lot of sense. But one thing I'll say about stacked bar charts, just to caution you, is that our ability to resolve the interior elements diminishes as the number of levels increases for overlay. What I mean by this: instead of using, let's say, a two-level categorical variable, what if we take something like employee tenure? Let me take that over here. As soon as we have many levels of the interior components, it becomes really difficult perceptually. I mean, we can do it, obviously, but it takes work on our part to resolve the interior components. For instance, if I ask you about the people who have more than 20 years of employee tenure, how their salaries are changing on average, you can tell there's one big group here, that is, they're making a lot more on average, but it gets difficult to resolve the differences here, whereas if we had these side by side it becomes really easy to see that trajectory, simply because you can compare across them. So I'd just be wary of the stacked version. Let me undo a couple of times to get back to the previous variable I had here. There we go, "I am working on my career". So that's what stacked does. I want to show you some of these other ones, because some are actually really quite
valuable. Remember, side by side is the default. If we were to go to bullet, I want to show you what this does: it actually moves one of the categories inside the other category. This is nice when you have a control: it shows the control sort of in the background, and you're looking at how one differs from the other. Nested works the same way; nested puts one inside the other, and this works with more than a couple of levels. Some of these aren't really appropriate for this case. Range will show the difference between the agrees and disagrees; same thing with interval, which shows the difference between the two. These are often used for stocks or things like that. Single is kind of neat: single shows the second category as a line above or below the original bar. Stock again shows sort of a range. The box plot doesn't really make sense in this one. Needle shows very thin lines, sorry, thin points, and float doesn't show bars at all really but shows the lines for each, and so these are often used when we're embellishing graphs with additional plots, and I'll show you a couple of those in a minute. Let's go back to side by side, because I want to make a point about this. When you're doing side-by-side plots, make sure you're drawing the comparison that you want to draw. For instance, the way we've plotted these data, we're making it very easy to compare how much people make when they agree and don't agree with working on their careers. That is because the bars are right next to each other, so it's very simple to make that comparison perceptually. We get a pop-out, really: we see that the blue bars are almost always above the red. But what if we were to switch these variables? Remember we can right-click a variable and go to Swap. I want you to see that if we swap it with age group, that comparison becomes much more difficult. So if I asked you, well, what's the effect of age, or sorry, of
working on their career versus not, we can sort of say the whole mass over here tends to be a little higher than this one. At least I can sort of see that, knowing what I already know about these data, but that comparison is almost completely lost. We're very good at looking at the age trajectory within each of these, that is, people who are older seem to be making more money, but the direct comparison between working on their career, yes or no, isn't really obvious. So try different visuals. I'm going to swap them back, and notice that that really does convey that story. With Graph Builder it's very simple to change these around, so try different visuals is what I would say. And notice that we can drag these variables to different roles too, so we can always use them within the grouping, the Group X, but overlay really conveys the message, I think, most forcefully. Okay, I want to tell you about two additional roles you may not have known about: the side-by-side role, and then the multiple-Y and multiple-X role. Side by side is really neat. If I bring back Graph Builder, let's again look at salary. We were looking at age group as my x-axis, and I'll put back on the bars. But what if we wanted to, not in the same plot, also look at the people who are working on their careers or not, on average? So not combining those two plots, but just showing another plot alongside. I want to show you something here: there's a drop zone just to the left and just to the right of my original x-axis. Notice what this is doing: it's plotting, not in the same plot, again not breaking up the bars, but in a separate plot, the salary differences for the categories within that variable. This actually turns out to be rather useful when you need to show the effect of maybe multiple variables separately on some outcome variable. Now, this works also on the y-axis.
Now, we don't need to just show salary. What if we wanted to show the years that they were at their current position? Notice there's a drop zone above and a drop zone below salary. I'll do below this time, and now I get another y-axis. So now I have this plot, I'll click Done, where I'm actually looking at a number of things all at once, all in one plot. Now be careful with this: you can very quickly make very complicated plots that tell no story. Maybe this is communicating something important in the data, but let's say we wrap this now by job satisfaction. Now we have something that's pretty complicated: not at all satisfied, somewhat satisfied, and extremely satisfied, looking at each of these against each other. There's just a lot happening here. So my recommendation when you're making visuals that are telling a complicated story is to make separate visuals. Just like no book is told as a single paragraph or a single sentence, no story of data is ever really told with a single graph, unless it's a very simple one. So be careful making overly complicated graphics. Again, the point of a visualization is to convey meaning, not to impress people that you can make trellis plots. All right, so let me show you a different type of role that again you might not have known about: the multiple Ys and the multiple Xs. These are useful when you're plotting multiple things that are really on the same scale and you want to show them on the same plot. I'll go back to my Hollywood Movies data set, and imagine what I'm trying to show is those Rotten Tomatoes and audience scores on the same axis. So I'm going to go back to Graph Builder, and let's ask a slightly different question: how does the world gross of a movie predict audience score and Rotten Tomatoes score? Well, I'll put one on the axis first, so it's world gross predicting Rotten Tomatoes score. What if I wanted to add audience score in there as well? So not above it, which of course
I can do, or below it, of course I can do that; I want to add that as a separate line. There's a drop zone right on the inside, right here, that will actually add audience score as an additional plot. Notice then we have a line here for each of them and points for each of them, so JMP is plotting multiple variables in the same graphic. The same thing is true for the x-axis. So what if I want production budget in addition to world gross? I can add that in here as well. I'm not even going to drop it there, because that gets overly complicated way too quickly. But notice there are times when you really do want to show two things on the same plot, and this is a great instance: what is the relation, and is it about the same for each of the different types of ratings? Now I want to show you something that can happen that's a problem. Oftentimes you're adding multiple things to the same plot like this, but you have something that's on a very different scale. So what if I also wanted to add in, let's say, domestic gross? Notice what happens. I added domestic gross, and that's on a hugely different scale, because remember my ratings are 0 to 100 but domestic gross goes from 0 to 350 in terms of millions of dollars. So I'm not really able to see what's happening for the Rotten Tomatoes scores or the audience scores, just because the scale is so different. We can't have a unified scale that conveys the same thing very effectively. Now there's something important I want to show you, but I caution you about using it: it's this double-axis plotting. What if I wanted a separate axis on the right-hand side for one of these variables, specifically for the domestic gross? We can right-click the list of variables here, the variable names, and notice there's a Move Right option. So if I move domestic gross to the right, what I get now is two axes. I'll hit Done just so we can clean this up and show it bigger. I get two axes: one for domestic gross on the
right, which refers to the observations on that green line, and then I have a unified axis for the Rotten Tomatoes and the audience score. So JMP will do this for you. Again, this is one of those plots where it's more complicated, I think, than it needs to be, and it can often misrepresent things. If we're looking at world gross and we're looking at domestic gross here on this axis, we have to make sure not to read this scale with these variables, and it often gets to the point where you can just make separate graphs and probably convey things a little bit more clearly. You can do that, though: it's simply right-clicking, and you can move variables to the right, and you can always move them to the left again, so we can move it back. All right, so that's multiple Ys and multiple Xs and double-axis plotting. Rather quickly, I want to talk about some special elements. We have looked at the way we can do multiple graph elements at the same time, and I want to show you that with sort of my favorite example, again with the iris data set. If I go back to Graph Builder, I showed you the option of making that contour plot. Again, that was putting species on the x-axis; I put petal length on the Y and I clicked on the contours. One thing I said was the contour showed the distribution really well, but we're not getting a good measure of center. So one thing I want to show you, and this involves the layering of visuals: if I drag on the points and, under the controls here for the summary statistic, I say give me the mean and show me a standard error around that, this is a visual I kind of like. It's showing me a measure of center but also the distribution. Let's drag in a variable where these are a little bit closer, so I'll drag sepal length on top of that. Notice this gives me a sense of the spread of the data but also that measure of center that I think we always want to show. And so I
think that's kind of a neat graphic. So remember, you can always layer multiple graph elements to use those visuals effectively. Geographic mapping: I already showed you how to do that as a shape-based map. I do want to mention that you can also graph latitude and longitude. If we go back to San Francisco Crime, I'll go to Graph Builder. If you have latitude and longitude in your data set, it's as simple as dragging those two variables in. You get the points first and that smoother, and as far as JMP is concerned, you're plotting points. It knows how to treat latitude and longitude, so it's actually plotting them in degrees. But to get the map, right-click the graph, go to the Graph submenu, and go to Background Map, and what this will do is show you the background map options. You can do the regular things like Simple Earth and Detailed Earth, which are images, but the one I want to show you is Street Map Service. What this does is phone out on the internet and pull down the OpenStreetMap data, so now we actually have the background map of San Francisco. Let me go to the Tools menu, I'll go to the magnifier, and just show you we can zoom in, and so we get down to street-level detail really easily here. So it's worth playing with this if you have geographic maps. And if anyone uses Qualtrics, Qualtrics will grab the latitude and longitude from the survey location as best it can, so this is a great thing to use to look at where respondents are coming from. Let me skip over some of these; we're running a little bit short on time. I want to mention, under graph controls, I showed you how to change statistics. You saw me do this with the points: I can make the points show the mean, or I can have a bar show the median. Limiting variables is something I want to show you as part of a tip I'm going to give you. There are some times when we want to make custom error bars, and this comes up a lot with scientific data. And so if
I pull up a data table here, this is looking at the salaries by the different age levels again. Imagine we wanted to make a custom error bar based on something that was not just the standard error, so not just sigma over root n. It's not entirely clear, until you do it a couple of times, how you're going to make this custom error bar in Graph Builder. Let me show you a really neat trick that involves a number of things we've learned all at once: changing statistics, creating new variables, which is something I'm going to show you right now, and also using some special elements and layering elements in Graph Builder. So what we're going to do is go to the data table, and we're going to manually make the upper and lower limits for the error bar. If I grab these two variables, you may not have known this, but you can right-click them in the table, go to New Formula Column, Combine, and we can do operations on these two variables. I'm going to do a sum, so that'll give the mean plus the standard error. I'm going to right-click again, go to New Formula Column, Combine, and I'll do a difference, and that's going to be the mean minus the standard error. So now I have the upper and lower limits, and these columns I'm actually going to use in my plot. First I'll drag the mean to the Y, I'll drag the level to the X, and I'll put on the bars. Now, getting the custom error bars in there: remember that we can drag these to the inside of the graph, and that adds them to the graph. Now here's the trick. I'm going to expand the Variables section on the left, which is really JMP's question: what variables do you want to use for the graphic? I'm going to say, okay, don't use those two new columns for this first bar. Instead I'm going to drag on a new bar, and then for the new bar I'm going to say only use the custom errors, so I'm going to turn off the mean. And then remember I showed you a bar style that kind of looked like a standard error bar; that was actually the interval, and what the interval does again, it
shows the high to the low. Since I told it to only use those two columns, the ones where I calculated the high and the low, well, there we go, there's our custom error bar. So I've actually added in multiple elements and limited which variables they get to use in order to make that error bar. One thing I want to show you, and this goes to the next section here, graph customizations: remember to always right-click things in JMP. You can right-click the axis to add reference lines, to change gradients, all that kind of thing. There's one special right-click: right-click the inside of the graph and go to Customize, and this is where you can change the layers in the graph. So these two bars: the first bar was the background bar, and the second bar was the error bar. The ordering here is the order in which they're drawn, so the error bar is drawn last, over the original bar. What if I just wanted to show the top of the bar? Well, that just means moving the error bar up, and so what I now get is just the top of the bar, because the other part is hidden behind it. So notice that you can customize lots of things in JMP to get the specific elements you want. So we're going to be very close to the end. Oh yeah, Mia, yeah, that's a great place to get to right now, so I'll stop everything we're doing now and let's talk about those quickly. Certainly close the control panels before you save anything, but on the Mac it's as simple as File, Export, and what that will do is show you the export options. If you're doing images, my recommendation is the TIFF or the scalable vector graphic; those are nice ways to save it out. On the PC you simply do File, Save As. Another option I like is actually using the selection tool from the tools, and what this does is I can just copy out the elements that I want. I'll just grab the whole thing here, and if I go to Edit, Copy, what this copies out is the vector graphic of this. And so if you go into, I'm on the Mac, you can actually go to
Preview. What's nice about Preview is you can just do File, New from Clipboard, and what this will bring up is the clipboard image. Again, it's a vector, so it's unlimited resolution: you can save this out at 1200 dpi, 2500 dpi, whatever you want. So again, that trick, not really a trick, was just selecting it in JMP and going to Edit, Copy, so that's my recommendation for saving them out. Now, for making a grayscale, certainly the complication here comes when you have data where you're trying to show multiple categories. If I go back to an iris plot, let me just make a new one here, and let's say I'm showing petal length by species and I want these bars to be different colors, so I color them. Now the colors won't come out in grayscale, of course, but you can right-click, and there are fill patterns if you like. Certainly you can make the fill colors all grayscale, so we can make one a slightly different gray from the other. That's not my favorite. If you really want to show differences, the fill pattern is something that you might want to try, and so you can do different hatching. I'll right-click on this one, do Fill Pattern, and do something different. Use these cautiously. If you have to do grayscale, of course, this is something you're probably going to have to do, but certainly be careful about what hatching you're using for that. So that's the recommendation there. Now, if you're saving things out for the web, the recommendation I would have is saving things out as interactive HTML. Again, that's File, Export on the Mac or Save As on the PC, and there's an interactive HTML option. One additional way in JMP, especially in JMP 13, which just came out about a month ago, is this option under Window, sorry, it's under View: it's Create Web Report. What the web report will do is go through whatever open graphs you have, and it'll allow you to specify names for each of them, but the important thing it does is
it'll build you an HTML bundle. It'll take every graph you have and build you a little web interface for it, and these are actually all still interactive: we can click on them and you can still work with the points. Sure, yeah, so there are times where 3D does tell a story. Under Graph, Scatterplot 3D is one that I often point to, and just like everything in JMP, set it up with the columns you want and you'll get the plot. The graphics are interactive. One question we got last time was: could you save this out as something for the web? And that's a very tricky thing to do, so no. But what I normally do, I'm just going to embellish this a little bit with ellipses, what I normally do if I need to save this out for a presentation is I'll set it spinning. So hold down the Shift key and just sort of drag it, and that sets it spinning, although it's slower for you guys on the web. What you can do now is screen record, like I am now, so you screen capture and save out a video file or an animated GIF from your screen capture software. So that'd be my recommendation. Yeah, so that's what I was going to say. Let's say you've made a number of graphs you really like, so you have this one, and you've also made another, and these are ones that maybe you're going to make multiple times, and so you want to make them again or save customizations. Every red triangle in JMP has a Save Script section, and there are multiple options. Save Script to Data Table and to Journal are the ones that I would recommend. Let me show you what these do. I'll save to the data table, and in JMP 13 it'll ask me what I want to name it; in previous versions it just saves it out. In your data table, on the left-hand side, you'll see I have this section where I can just rerun these scripts. You can right-click and run in previous versions of JMP, or just click the play button, and that brings them back. Now, if you save out to a journal, that will actually save a script to a journal, like I have on the left-hand side of my
screen. That's this journal, and so you can actually save them out that way. So if you make lots of different graphics, this is a great thing to do, especially because if the data changes you can simply rerun the graphs that you had and bring them back rather quickly. So, highly recommended for that.

Sure, yeah. So the data filter is available under the red triangle. It was under Script before, but now we've promoted it: it's under Local Data Filter. And the local data filter is nice: you can actually, for different variables, specify ranges you want to show, and the graph will update immediately. So we can cycle through different petal widths here to see how the relationship changes. This isn't the best graph for it. If I go to, say, San Francisco crime, and remember I was showing police district by day of week here, and I go to Local Data Filter, what if I wanted to filter this by type of crime? So if I do Category, and I wanted to show just arson, or just assault, or just drugs and narcotics, the local data filter lets you very quickly go through these different categories, or any variable in your data set, to see how the visual really changes on the basis of the levels we've chosen. And if you want to animate that, there is an animation section. What animation does is just cycle through the different levels, so I find it's more useful when you have a continuous variable to cycle through. So I was doing something in Graph Builder for them.

Great, great question. Yeah, so let's say Species here, and you didn't want setosa first, you didn't want them alphabetical. You're not going to find that under the Graph Builder settings. The reason why is Graph Builder is inheriting the order from the data set. And so in the data table, right-click a column, go to Column Properties, and go to Value Ordering. Value Ordering will bring up a section here, and this is actually within the column properties. You just
change the order, and so I'll put them in the opposite order there. So click OK, and notice what JMP will do is immediately update the graphs. Now, there is one ordering you can change in the graph itself: if you double-click an axis, there is the Reverse Order option. But oftentimes you want to order in a more principled way, days of the week for instance. So again, right-click the column in the data set, go to Column Properties, and go to Value Ordering.
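
As a rough illustration of the value-ordering idea above (the order lives with the column, and any graph built from that column inherits it), here is a small sketch in plain Python. This is not JMP or JSL; the names `DAY_ORDER`, `ordered_levels`, and `counts_in_order` are hypothetical, and placing unlisted levels at the end is an assumption of the sketch, not a statement about how JMP behaves.

```python
from collections import Counter

# Hypothetical stand-in for a column's value-ordering property:
# an explicit, meaningful order for the category levels.
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def ordered_levels(values, value_order=None):
    """Return the distinct levels of a categorical column.

    With no value_order the levels come back alphabetically (the default
    described in the talk). With one, listed levels come first in that
    order; unlisted levels follow alphabetically (sketch assumption).
    """
    levels = sorted(set(values))
    if value_order is None:
        return levels
    rank = {level: i for i, level in enumerate(value_order)}
    return sorted(levels, key=lambda level: (rank.get(level, len(rank)), level))


def counts_in_order(values, value_order=None):
    """Count rows per level, in the order a bar chart would draw them."""
    counts = Counter(values)
    return [(level, counts[level]) for level in ordered_levels(values, value_order)]


if __name__ == "__main__":
    days = ["Wednesday", "Monday", "Friday", "Monday"]
    print(counts_in_order(days))             # alphabetical levels
    print(counts_in_order(days, DAY_ORDER))  # weekday order
```

The point of the design is that the chart code never sorts anything itself; changing the order on the column is enough to change every chart that reads it, which mirrors why the setting is found in the data table rather than in the graph.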
Info
Channel: Julian Parris
Views: 3,957
Rating: 4.9183674 out of 5
Keywords: JMP
Id: 3FQrh1DEvwc
Length: 62min 46sec (3766 seconds)
Published: Fri Nov 04 2016