JMP Basics for Professors and Students (10/11/2016)

Wonderful. Well, thanks, Ruth, and thanks, everyone, for joining us. I'm excited to tell you a bit about JMP if you've never seen it before, and excited to show you some of my favorite things about JMP. Before I get started, let me take you through a little bit of the plan for today. Of course, we'll leave time at the end for any questions that arise, so keep note, and as Ruth mentioned, use the chat and Q&A panels and we'll get to all of those.

I'm going to start by introducing you to what JMP is, especially if you've never heard of JMP or never seen it before. JMP is obviously statistical software, but it's different statistical software than you might be used to, and I always love showing it to people who are seeing it for the first time, because it lets you see statistics and data in a new way. Then I'm going to take you through some JMP basics so that after this webinar you can get started on your own. You could of course grab a trial, and I'll show you where to get that at the end, but I want to make sure you leave this webinar knowing how to do some of the most basic and important things in JMP. Next, I'm going to take you on a tour of what I'll call the essential platforms. No single webinar could possibly cover everything JMP can do, so I want to show you some of my favorites and the ones that will get you started best. At the end, I want to introduce you to the JMP Academic Program. Ruth and I are both on the academic team, and we work hard to make teaching and learning materials that make it simple to learn JMP, but also simple to work JMP into your courses.

So with that, let me start by taking you through what JMP is. I like to talk a little bit about the history, because I think it gives some insight into why JMP works a little differently from other statistical software. JMP started about 26 years ago now, in 1989, and it was started by one of the cofounders of SAS, so JMP has always been a SAS product. One of the
co-founders, around 1989, decided he wanted to make something different for statistical software. He had just bought a Mac Plus, and he really loved that it had a mouse and that he could work with the operating system in an interactive way; before then, we were mostly working on the command line. So he thought: why don't we have statistical software at SAS that also takes advantage of the mouse and of this interactive operating system? The idea was to create statistical software that was visual, interactive, and powerful. In fact, the first version of JMP was a Macintosh-only product; here it is running on a Mac SE. JMP actually stands for John's Macintosh Program. They just couldn't come up with a better name for it, so we all call it JMP.

Let me open some examples here to give you a sense, if you've never seen JMP before, of what I mean by interactive. I'm going to take you through all of what I'm doing a little more slowly later, but I just want to show you some of the power and the interactive nature. Here are some sample data: Hollywood movies. Let's say we want to graph from this. Instead of having to write code, I can simply ask a question, maybe of the audience score against the Rotten Tomatoes score: are they related? We should hope they are; audiences should agree. Maybe we want to know if this is related to the worldwide gross of the movies, so we can size the dots by that and see, hopefully, that the movies that are doing better make more money. Of course, there are some notable outliers here, points that are strange because they have low scores but high gross. This one is Pirates of the Caribbean, and what's this one here? This is Transformers: Dark of the Moon. So graphing in JMP is interactive; we can simply drag and drop variables around.

Let's do something a little more basic. What if we just wanted distributions of some of these? I'll grab some columns and click them into
a role, and click OK. What JMP is going to do is create the histograms, or frequency distribution plots. I'll talk about this more later, but things in JMP are interactive: when I select observations in one histogram, or really in any plot, they're selected everywhere. This gives us a nice way to interact with our data sets. If I click on the action movies, I can see where they are in the number of theaters at opening, and I can even see them in the data table. When I select them here, they're selected everywhere.

JMP also lets you make special kinds of graphs. I'll pull up some San Francisco crime data, and very quickly we can make a street map of San Francisco and look at the geographic density of different crimes. This connects out to OpenStreetMap, and it's interactive too: I can zoom in to a particular region, it downloads a higher-definition street map, and we can explore visually. If we have data such as US demographics, where we have a state variable, we can graph these as well. I'll put State into Map Shape, and we get a canvas for the United States. Let's look at smokers across the country; maybe we see some hot spots. Sometimes there is a geographic story in your data, and a map like this is really what communicates it, what gives you meaning from your data set.

Now, JMP is also great for teaching: not just for analyzing data, not just for visualizing data, but also for teaching concepts. We've developed teaching modules as well. Here's one that I like, which uses simulations about regression. If we take multiple samples from some population, we can see the line of fit we get for each individual sample, and that gives us a really visual explanation of why we have a confidence band around the regression line. Every sample is one instantiation from that population. There is some true effect out there, a true relationship, but each sample only gives us a guess of
that. One thing I also like about this demonstration is fitting a line to our sample data, and I like having students do this: I tell them to grab a point in the middle and move it up and down, and we can see the line change in real time. Let's drag out the axes; remember, everything in JMP is interactive, so we can just move things around. I'll put a point way up on the regression line, and if we move that one, we can feel the tug of that point because of the leverage it exerts on the line of fit. So JMP makes the ideas and concepts in statistics interactive and visual.

There's another one I like for sampling distributions, a little more basic: taking samples from a population and drawing the sampling distribution in real time. We can draw a number of samples at once; I'll pull a thousand all at once, and we can see the normally distributed sampling distribution of the mean. You can also show things like the central limit theorem. Suppose we take samples from a skewed-right population, where with small sample sizes we wouldn't expect a normally distributed sampling distribution. With samples of a sufficient size (let's just do 60), we can see that the sampling distribution is drawn toward that Gaussian shape. So by making things interactive and physical, we can demonstrate concepts that are otherwise a little mysterious for students and put them into a context that makes them easier to understand.

All right, let me step back. I did a bunch of things really quickly there, but the idea that JMP is visual and interactive should be meaningful, because this is different from a lot of other statistical software. I've used JMP for 10 years, and I've also used most of the other statistical software. I think it's worthwhile to learn lots of tools, because you want to use each one in the context where it's most appropriate, and JMP fits in a very special place.
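The central limit theorem demonstration described above (a thousand samples of size 60 from a skewed-right population) can be approximated outside JMP. What follows is a minimal plain-Python sketch, not JMP's own simulator. It assumes an exponential population with mean 1 as the skewed-right population, and the helper names `sample_means`, `skewness`, and `draw_exponential` are hypothetical. It compares the sample means at n = 2 and n = 60.

```python
import random
import statistics  # statistics.fmean requires Python 3.8+


def sample_means(draw, n, reps, seed=0):
    """Hypothetical helper: draw `reps` samples of size `n` with draw(rng); return their means."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Moment skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


def draw_exponential(rng):
    """Assumed skewed-right population: exponential with mean 1."""
    return rng.expovariate(1.0)


if __name__ == "__main__":
    # 1,000 samples at a small size and at the size of 60 used in the demo.
    for n in (2, 60):
        means = sample_means(draw_exponential, n=n, reps=1000)
        print(f"n={n:>2}: mean of means={statistics.fmean(means):.3f}, "
              f"SD of means={statistics.stdev(means):.3f}, "
              f"skewness={skewness(means):.2f}")
```

In theory, the skewness of the mean of n exponential draws is 2/√n (about 1.41 at n = 2 and 0.26 at n = 60), and its standard deviation is 1/√n, so the n = 60 means should sit much closer to the Gaussian shape described in the demonstration.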
When you're doing analyses on your desktop, JMP is an in-memory tool for Mac and Windows. So it's for datasets that aren't big data: you wouldn't want to pull in four billion rows, but JMP is fine with 20 million rows of data, anything that fits in the RAM on your computer.

JMP is also a family of products. I'll just point this out, and we can come back to it at the end. What I'm showing you here is actually JMP Pro, one of several products JMP makes. JMP and JMP Pro are really the analytics and analysis part of JMP. Then we have JMP Clinical and JMP Genomics, specialty products for certain types of data, for when you're working with genetics or with clinical processes and clinical trials. There's also JMP Graph Builder for iPad. The graphing platform I showed you, where I was making graphs, is also on the iPad, and you can download it for free from the App Store. It's a great way to graph or play with data on your own, on the road. We'll come back and talk about some of those later.

Again, JMP runs on Mac and Windows. Both versions are co-released, which means they're developed together; they look identical, and except for minor differences due to the operating system, they work the same way. I'll point out differences where there are any, but you can get JMP on whichever operating system you want. Today I'm showing JMP on the Mac, as you can see, and I'm using JMP 13, our version that was just released a few weeks ago, so if you were to download a trial right now, this is what you would get. If you're at a university that already has JMP (and most universities do), check with your IT people; they should be getting JMP 13 soon, they just have to deploy it. All right, with that, let me step
through some of the JMP basics: how to get JMP to do the things you want it to do, and how JMP is a little different from other software. We'll start with the most basic question: how are you going to get data into JMP? Because JMP is modern statistical software, developed at a time when we had a mouse and an interactive operating system, you don't have to write lines of code to bring data in. You can simply do File > Open, and JMP will open all sorts of file types: CSV, other statistics packages' files, Excel, SAS, and Minitab. Most file types are accessible to JMP. You can also open data directly from a website: if you do File > Internet Open and pass JMP a URL, JMP is able to parse the tables on that page. So importing data is very straightforward. Making a new data table is as simple as File > New, and you can enter your own data.

If you want to work with sample data, JMP comes with a whole lot of it. On the Help menu there's a Sample Data section. Let me show you this quickly, because today I'm going to work extensively with sample data. If you want to follow along later (there will be a recording of this, so don't feel you have to do it right now), go to the Sample Data Index. I like to just open the sample data directory; that way you can go directly to the data set you want. But if you're just exploring, note that the sample data are organized by type of analysis on the left and by industry on the right, so if you're looking for data sets about food and nutrition, go right to that section. The teaching simulations I mentioned are under the Teaching Resources section at the bottom. There are data sets that are especially good examples for teaching, teaching scripts, and calculators: if you still teach basic inference from summary statistics, we have calculators that let students do it that way, and simulators that
allow you to simulate different situations. So check out the sample data directory.

For this first part, I'm going to open a sample data set I like called Restaurant Tips. I like it because it's pretty basic: 157 rows and only seven columns, with some basic information about the tips and bills for different tables at a restaurant over a few weeks. Before we get to how I think JMP thinks (this is my favorite section, on the mind of JMP), let's talk about some basic navigation. If you've used other statistical software, or even Excel, some of this will be pretty familiar.

First, the basics of the interface. You may notice at the bottom right-hand side I have the JMP Home Window, which shows the windows we have open so far. I have the Restaurant Tips file, and I also have this JMP Basics for Professors journal; I'm actually using a piece of JMP to give my presentation. Journals are great: under File > New > Journal, they let you make sections like the ones I'm showing here and add links, and you can even put analyses in them. So the basics of JMP are really a home window, a data table, and any other windows you have open, such as journals or script windows.

When you first install JMP, you get a window called the JMP Starter; I'll pull it open from my Window menu. It's nice when you're just getting started, because for each category of analysis it shows where in JMP you do those analyses, in the platforms, as we'll call them. For instance, if you're working with clustering and want to do a hierarchical cluster, you can go directly to it here. The JMP Starter in essence re-creates the menu structure: under the Analyze menu, Clustering is where we find Hierarchical Cluster. But when you're first getting used to JMP, the JMP Starter is a nice way to scroll through a
list of analyses and jump right to the platforms in which those analyses live. We'll come back to the JMP Starter a little later, but when you're first starting with JMP, I recommend diving right into the menus. I'm going to give you some tips on finding things in JMP and on how JMP thinks about data, so that you can guess where JMP puts the analyses you need.

So that's the basic interface. Now let's talk about a distinction that's especially useful in JMP: data versus metadata. The data, of course, is what's in our data table, the actual observations. But there's also metadata that's important to JMP. Here's a really important one: if I right-click on a column and go to Column Info, I see the information, or characteristics, of that column. This is the metadata JMP stores about the column. There's the data type (these data are numeric) and the modeling type, which I'll come back to in a second because it's really important in JMP. In essence, the modeling type is where you tell JMP what the observations in this column actually mean: are they continuous numbers that mean something on a real scale, or are they simply categories? Notice that the Tip Amount column shows dollars, because I've told JMP that its format is currency, with a number of decimal places and the type of currency. Under Column Properties, there's a whole lot of other metadata you can apply to a column, and we'll talk about some of it later. Basically, if you need to modify what JMP thinks about a column, you do it in Column Info.

That covers the columns. You'll also notice that, structurally, the columns are listed along the left-hand side of the table, and when I click on any of these, the corresponding column is selected. This is nice when you have very large data sets; I have some with hundreds of thousands of columns, and it's
actually nice to be able to organize and group your columns. We can grab some columns on the left, right-click, and group them, so we can keep even a long columns list organized. That's a handy thing.

Now let's talk about the rows. Rows can also carry metadata, and the two important kinds are hiding and excluding. I'll grab some rows, say rows 4, 5, and 6, and imagine that for whatever reason these observations shouldn't be counted in an analysis and shouldn't be shown in any graphs. They're bad data for some reason, or data we want to set aside. You don't need to delete them from your data set; instead, we apply some metadata to the rows. I just right-click (right-clicking in JMP is always worth trying, because there's usually some option behind it), and we have the option to hide and exclude at the same time. You also have the option to only exclude or only hide. Let me choose Hide and Exclude, and notice that JMP adds little symbols to the rows showing that they are excluded and hidden. Excluded means the rows won't be counted when we do an analysis, and hidden means they won't be shown when we make a graph. There are times you may want to only hide or only exclude. For instance, you might want to show some outliers in a graph without counting them in the analyses, so you would exclude those rows but not hide them. But since we usually do both, there's an option to do them at once. Let me bring those rows back into the data table.

All right, one other thing about JMP navigation, which you saw me do earlier: JMP is interactive. Earlier I brought up the Distribution platform, something I'm going to talk about a lot, which is where we got histograms. Notice that when I select observations (I'll select the lowest bar for tip amounts), everything gets selected. So I see
those observations in the other plots, and I also see them in the data table. This confers some pretty substantial benefits. What if I grab the points that are outlying in tip amount? I can right-click and exclude them right from there. So I can select anywhere and apply operations back to the table from anywhere, which becomes really useful, especially in data exploration.

There's another side to interactivity: these red triangles you see everywhere. I have one here, one next to Credit Card, and one next to Tip Amount. These are where we produce more analyses. JMP is built in a deeply hierarchical way, which is to say that as you produce output, JMP gives you new options that make sense at that point.

This is actually an important point, and it brings me to the next section: how JMP thinks about analysis and about data. Here's my pithy statement for it: JMP thinks about the situation you're in, not the statistic you're looking for. Let me back up and talk about the mind of JMP. Obviously I don't think JMP has a real mind, but JMP has had developers who have been with it for the 26 years it's been developed. In fact, John Sall, who first started JMP, is still the principal architect and chief developer of JMP. All of these developers, over all this time, have had their own conceptions of how analysis and data should work, and they've put those into JMP, so in a sense we're using JMP through their minds. So I think about the mind of JMP as how you communicate with the software to get it to do what you want, but also as how it's organized and how it thinks about data. And for JMP, again, it's about the situation, which, as I define it, is the statistical context you find yourself in. What does that mean? Remember, I said the modeling type was
important. Context, as far as JMP is concerned, is the type of data you're using for an analysis, and JMP really distinguishes three types of data. Qualitative data can be nominal or ordinal. Nominal means we just have observed categories, like whether somebody used a credit card or not. Ordinal qualitative data are things like T-shirt sizes (small, medium, and large): they're strictly ordered, but we don't assume that small is exactly as different from medium as medium is from large. Then we have quantitative data, data that are actually measured numerically on a uniform scale. We would call them interval-scaled or ratio-scaled, but for our purposes they're numbers that actually mean numbers, like Tip Amount and Bill Amount here.

I stop to say this because numbers don't always mean something numeric. For instance, I've seen surveys where people store gender as 1, 2, and 3: 1 and 2 for male and female, and 3 for did not respond. There's nothing actually numeric or scaled about those numbers; they just represent categories. So for us to work with JMP, JMP needs to know what the data in each of our columns actually reflect in the world, and we tell it through the modeling type. We can get to the modeling type a number of ways. Remember, I right-clicked a column and went to Column Info; the modeling type is right there, and we can select it. Notice that Credit Card Use is nominal right now. On the left-hand side, in the columns list, we also have an indication of the modeling type: if I click the little red histogram icon here, notice that the column is set to nominal.

JMP wants to know the modeling type for an important reason: the modeling type of a variable constrains the types of analyses that are appropriate for it. For instance, if I want JMP to show
me some representation of frequencies, then for the Credit Card Use column I should get a frequency distribution plot and maybe a table of how often the yeses and noes occurred, that is, how often credit cards were used. But if I ask the same question about Tip Amount (I want some representation of the observations I got and some measure of center), I don't want the frequency of each individual number. I would rather get something like the mean or the standard deviation. JMP pays attention to the modeling type when I ask that question, because it's really the same question about both columns, but the output should be different.

So let's do this all at once. I'll go to the Distribution platform under the Analyze menu. I'll talk more about platforms in a second, but this platform is all about asking the one-variable-at-a-time question: what observations did I get in this column? I'll open the Distribution platform and do an operation called casting a column into a role, which is how I tell JMP which columns I want it to consider. I can do this a couple of ways: I can drag Tip Amount into the role box (notice I can just drop it there), or I can select a column on the left and click it into the Y role. Both cast the columns into the Distribution platform. When I click OK, this is the output we got before. Notice that JMP did something a little different for each column. For the numeric, continuous Tip Amount column, it shows a histogram, quantiles, and summary statistics like the mean, which make sense for this type of data. For Credit Card Use, I got a frequency table and a frequency distribution plot. So JMP has contextualized the output: even though I asked the same question, what observations did I get in this column, it's given me the right
kind of output, and it does that because of the modeling type.

Now, there's a second aspect to JMP. Earlier I called it the fact that JMP is deeply hierarchical; the way we usually put it is that JMP has a progressive interface. As you produce analyses, like I did here with the Distribution platform, JMP gives you new options, new things you can do that make sense given the context, or situation, you find yourself in. What do I mean by this? Take Tip Amount, a column of numbers for which we calculated a mean and standard deviation. What might we want to do next in a one-variable space with a continuous column? Maybe we want to test a mean. Maybe historically we've observed an average tip of four dollars, but over the last couple of weeks it looks a little different. So we want to know: did we change something in the restaurant? Did our servers do a worse job? Is this 3.84 statistically significantly different from the mean of four we've observed historically? It's really a question of whether this is sampling error, or whether it's different enough to be concerned about our servers.

To move forward in this analysis, we don't go backwards, and in JMP this is always true: you don't go back up the chain and relaunch the platform. Instead, you always go forward, and the way to go forward is these red triangles. You see them next to every output, plus one for the main platform. Let me click the red triangle next to Tip Amount, and you'll see what I mean: JMP offers options that make sense in this context, a numeric column we're asking one-variable questions about. We can get a cumulative distribution plot; there's our Test Mean option, which we'll use in a second; we can get confidence intervals at a different level; and we can do continuous fits, so maybe we want to see whether we have evidence that
these data are drawn from some particular distribution, or we want a measure of fit against one. These options all make sense given the context we're in. So let's do the one we asked about, testing a mean: under a null hypothesis that the mean has historically been four, is 3.84 so different that we should be concerned? I'll click OK, and notice that JMP doesn't produce a new window; instead, we're embellishing the current output. We've tested a mean, so we get our two-tailed p-value, our degrees of freedom, and all the summary and inferential statistics we should expect.

You'll notice we also got a little plot here, and this is true all across JMP. It's one of the principles that has been with us for about 26 years: always show a graph when you show a statistic. The statistic of interest here is probably the two-tailed p-value: if the population mean were four, how unlikely would it be to draw a mean of 3.84 or something more extreme? That's really a question about the sampling distribution, and that's what this plot represents: the sampling distribution of the mean under the assumption that the population mean is four. The shaded regions sum to the two-tailed p-value, that is, how unlikely it would be to get 3.84 or less, or a value at least as far out on the other side.

Notice we get another red triangle; again, JMP is built in this hierarchical way. Maybe we want a p-value animation, which lets us explore what would happen had we specified a different null hypothesis mean, or had we observed 40 people instead of as many as we did, so we can see how the sampling distribution of the mean changes. These are great for teaching as well. When you talk about power in your class, you can use the power animation to show how different specifications of the null and alternative distributions lead to your false alarms and your misses. All
right, let's back up. We're talking about the progressive interface. Notice that JMP contextualized both the output and the new options we got as we produced analyses. When I click the red triangle here, it's the same sort of one-variable-at-a-time question, but a different statistical context: we're working with a categorical column. So we shouldn't expect to see Test Mean, but we might expect to see Test Probabilities: under the null, do we expect half of our tables to use a credit card and half not? We can specify that. I'll enter 0.5 and 0.5 as the probabilities under the null hypothesis, and we get our chi-square goodness-of-fit tests, including Pearson.

So JMP is doing something rather important here: it contextualizes the output based on the modeling type of the data in the data table, the metadata we set, and it surfaces analyses in a special way. Most other software requires you to go to a menu and select the exact analysis you want. But notice that we never saw a one-sample t-test in our menus. The one-sample t-test we ran on the Tip Amount column surfaced once we were in a platform, a location in JMP, a big room that basically contains all the analyses for a one-variable question. JMP surfaced that analysis in a particular context: in the Distribution platform, and specifically with a continuous column. So JMP helps you choose and make good decisions about what types of analyses to run.

That's what I mean by the mind of JMP: it's concerned with the statistical situation, not with the particular analysis or statistic you want to run. JMP is organized so that you find the room that does the kinds of things you want to do, and JMP shows you the options when you're using the types of columns that allow you to compute
those operations. All right, that's the modeling type and the progressive interface. Now, a couple of tips for new users. Check out all the red triangles; click all of them as you're learning JMP, and certainly right-click. JMP was built in 1989, at a time when we had a mouse and a graphical operating system, so it's built that way. Older, legacy statistical software based only on the command line didn't have the affordances of a mouse, so those were never built in, but JMP works like modern software because it was built in a modern era.

Also, customize your JMP preferences. You'll find Preferences under the JMP menu on the Mac and under the File menu on Windows. The nice thing about customizing your preferences is that every time you launch a platform, say every time you run a distribution on a categorical column, JMP shows you what you've told it to show; you can embellish the output with everything you want. Maybe you want your axes on the left; I've actually added a count axis to my distribution output, which isn't on by default. One thing I'll mention: if you download JMP and use it yourself, the first time you run a distribution your histograms will probably be oriented differently. If I go to the top red triangle (these are options for the whole platform), notice that I have Stack checked. By default, JMP lays the output out in a vertical orientation, which is great on a typical laptop screen that's wider than it is tall, because you can see more of the output at once. But most people are used to seeing histograms horizontally, which is why I like to give webinars with Stack enabled. If you want to set that as a preference, go to Preferences under your JMP or File menu, go to the
Platforms section on the left, and then find the platform you want to customize. Here I'll find the Distribution platform, and notice I have Stack checked, but maybe you want other things to happen by default: maybe you always want to test a mean against a certain value, or show a stem-and-leaf plot or a CDF plot. Since JMP is built around these little interfaces (as we produce output, these are little interfaces to our data set), customize them to work exactly the way you want. That way you're never wasting time producing output; it's always there right when you open it. Also remember that you can save your work out very easily. Let me produce some simple output here and show you a couple of these. Let's say you wanted to take all of this output and put it into a different format. On the Mac it's File > Export; on the PC it'll be File > Save As. Again, it works just like you would expect modern software to work. When you go to Export, you see a number of different options. We can take out the images only, as many different file types; if you're going to work with publication graphics, I recommend scalable vector graphics (SVG). We can also export this as interactive HTML or as PowerPoint. PowerPoint will create a slide for each different output for you. Interactive HTML is a really great one, especially when you're sharing your output with people who don't have JMP yet: you save it out, and these open in a web browser. What's neat about these is that, unlike basic HTML, they're still interactive. They retain the interactive nature of the histograms and the graphics, so if you share these on the web, people can still interact with your data. A note of caution on that: because people are interacting with the data, the data is stored in the HTML file, so don't do this with private data; instead, export those as images. One final way to get output out of JMP: under the
Tools menu there's something called the Selection tool, and this is really great. Simply select what you want (I'll grab that histogram), and it's as simple as Edit > Copy. Now if you go into any other program, you can simply paste it in. This works for the tables as well: if you grab a summary table, quantiles, or frequencies, these will copy into Excel or Word as actual tables. So you can get output really quickly and move it to other software. All right, under the Help menu there's something I want to point you to that's very useful when you're first learning JMP: the Statistics Index. Let me go there now, Help > Statistics Index. It lists all the analyses that JMP can do, and there are quite a few of them. What's of value here is that if I go to, let's say, multiple regression, JMP gives us Topic Help, which goes right to the help file for this section, but it can also launch the platform that does that type of analysis. That platform is called Fit Model in JMP, and it's under the main Analyze menu. For multiple regression, again, because JMP pays attention to the type of data you put in, as soon as you specify a Y variable that's continuous it sets a personality, and we can build the model effects below; if we wanted a multiple regression with two variables, we'd add them there. So the Statistics Index is a great way to find where analyses live in JMP, much like the JMP Starter. From your original window, the JMP Starter, you can also go under Fit Model or under Multivariate; these sections let you find the particular platforms that do the analyses you want. As you're first getting to learn JMP those are handy, the Statistics Index especially, but once you get familiar with how JMP thinks about data, you'll have no problem finding the platforms in which analyses live. Finally, a word for those who like reading documentation. I personally don't,
but we actually have really great documentation. If you're somebody who likes to read the manual, go to Help > Books, where all the manuals are built in. They're really nicely written; we have a whole writing team that designs the documentation, with examples that use sample data, so you can learn JMP that way. At the end I'm going to point you to some other webinars and a lot of things we built on the academic team that I think will get you up to speed a little quicker. The documentation is especially good when you're working with something a little more advanced and you want to see exactly what JMP is doing and learn about all the options; for some of the predictive and specialized modeling, I highly recommend looking through the documentation. All right, with that, I want to take you on a little tour of the essential platforms in JMP. We've already been playing with one of them, the Distribution platform, but I want to show you them in two big groups: platforms used for summarizing and graphing, and platforms used for basic analysis. For summarizing and graphing, the first one is the one I showed you when I was demoing JMP: Graph Builder. We can keep the same data set open here. Graph Builder is really the place to go when you're creating a visual or exploring your data visually, and it's under the Graph menu. Because it's part of JMP's ethos to always show a graph when it shows you a statistic, any platform you go to in JMP is going to give you a graph. Graph Builder, by contrast, is about composing a graph, or exploring your data purposefully and graphically. You will get some statistics from Graph Builder if you ask for them, but really Graph Builder is about designing a graphic. We have other webinars that go into detail about how Graph Builder works, but let me just show you the basics.
Graph Builder is built around drop zones. We have drop zones for data, the Y and the X; drop zones that break up the interior based on the levels of another variable, which are Group X, Group Y, and Wrap; and drop zones that embellish the graph: Overlay, to add multiple layers to the graph, and Color and Size, which affect the points. For instance, if we're looking at tip percentage, I'll just drag it to the y-axis, and as soon as I drop it there, JMP creates a visual. Right now it's just the points, and since there's no X variable, they're simply jittered to give you a sense of the distribution. So even with just one variable, we can make a graph. We can do something like a box plot, which is something we'd also get from the Distribution platform. We can get a histogram; it will be on its side, since we put tip percentage on the y-axis. Or we can get something called a contour. I kind of like this view, which shows the distribution as a folded shape. If I drag the points back on top, you see what JMP is doing: where the points are most numerous, the contour expands out, which is why these are often called violin plots. Let's turn that off and add another variable to the x-axis, something categorical; I'll use day of week. Notice that as I'm dragging, the zones where I can drop this column are all highlighted. The X zone is where I want it, but I could drop it in many different places. As soon as I do, the dots get aligned into their groups. Again, the palette at the top lets us select different visuals. Box plots are great for looking at the distribution and seeing whether any points exceed our criterion for outliers, but let's just use bars, a very simple representation, so we can look at the average tip percentage across the different days of the
week. I know it's the mean both because the legend says so and because, in the bar controls (the thing that controls what these bars are doing), the summary statistic is currently the mean. If we wanted the median tip percentage instead, we could select that; I'm going to keep it on the mean. If we want to add variables to this plot, we have those other roles I mentioned. For instance, maybe we want to know whether tip percentage varies as a function of credit card use in addition to day of week. I'm going to drag credit card, but not drop it yet; I'll hover it over some different roles. There it is for Group X, and there it is for Group Y; notice those split up the axis so that we show the groups separately. Or Overlay, which I like: Overlay puts the bars side by side. Next to the bar style, notice what we have selected here; there are different ways to do this, such as nested bars, where one bar is nested inside the other. I'm going to keep it on side by side. One great thing about Graph Builder is that it lets you explore visuals and make sure the graph tells the story you want to tell. This visual lets us really tell whether tip percentage differs by credit card use; we've made that perceptually very accessible, because the bars for credit card use or not are right next to each other. But suppose we instead wanted to look at day-of-week effects, the trend across days of the week, and we didn't really care about credit card use; we just wanted to separate it out. That would involve switching credit card with day of week, actually moving those roles. Here's a nice tip: right-click a variable that's already in the graph, go to Swap, and swap it with day of week. Suddenly we're looking at credit card use, No and Yes, on the axis, with the days of the week as the different bars. So we can swap variables to see what tells the story best. You
know, this does tell us something: on Tuesdays with no credit card use, we have some high tippers. Maybe that's just this week; probably just a few rows played into that. So being able to swap your variables lets you easily try out different visuals and see what tells your story most forcefully. When you're done, just click Done, and that closes the control panel. Remember your red triangles: you can always go to the red triangle and bring back the control panel if you want to change things. We have a webinar on data visualization that is all about Graph Builder, so I'm not going to show you much more of this, although we may play with it a bit for some mapping later. If you're interested in Graph Builder, certainly check out that webinar; I'll show you at the end how to get there. So for summarizing and graphing, Graph Builder is really the place to go for graphs. But sometimes you just need a table that reads off statistics from your data set. You may not even want a histogram; you simply want to tabulate the data, and the place to go for that is the Tabulate platform, under Analyze > Tabulate. Tabulate is really nice: just like Graph Builder, it's built around drop zones, here for columns and for rows, and you see the cells that result once you specify those columns and rows. Let me show you what I mean. Imagine that, because we manage this restaurant, we want to find how much each server has brought in this week in terms of total bills. What we're eventually interested in is the bill amount column, but let's set up our rows first. I'm imagining a table with one row for each of my servers, so let's drag the server column into the rows. Notice that when I hover over the drop zone for rows, that section highlights. When I drop it here, JMP quickly remakes the table, and the first thing it shows is just the number of
observations: the counts of A's, B's, and C's. But we want something more: bill amount, not the count, so we need to involve the bill amount column. Let me drag it, and notice that when I drag it over the sample sizes for the servers, they highlight. When I drop it there, JMP chooses a summary statistic. The default summary statistic is the sum, so it's showing us the total of bills for A, B, and C, which is actually what we wanted: how much each person brought in. We see that C brought in quite a bit less, but we did notice from the sample sizes that C worked fewer days, so perhaps that makes sense. Perhaps we don't want the sum, then; we want to take out the effect of how many days, or how many tables, they had. So let's show the mean instead. To get the mean, let me drag it on top of Sum. This is an important note in JMP: dragging on top of another item is the replacement action, so when I drop on top of Sum, it replaces the sum for each server with the mean. In this case we see that C, on average, brought in more per table. So C may have worked fewer days but actually brought in more business for each table they had. Now, what if we want more than the mean, say the minimum and maximum bill amounts for each server? Let me show you a drop zone that's not on top of the mean (remember, that would replace it) but just to the right of it. Notice there's a little drop zone that highlights just to the right of Mean. If I drop there, that's the appending drop zone, which tells JMP to add those statistics to the table, so we can see the min and max for each server across the tables they had. That little appending drop zone works for categories as well. Maybe we want to look at day-of-week effects. For instance, if I drop day of week on top of
server (remember, that's the replacement action), it will just replace the different servers. Let me click Undo and use the little appending drop zone instead: just to the right of server there's that little drop zone, so I'm going to drop it there. What we get now is a table with those categories nested: first server A, the different days of the week that server worked, and all the statistics we asked for, and then the same for B and for C. There's also a prepending drop zone. If I grab day of week and drag it to the left of server, notice there's a little section there; I can nest them either way. This goes back to what I said about Graph Builder: try out different tables to see which one makes clear the story you want to tell. With this hierarchy, days of the week with servers nested inside, we can compare the servers pretty directly for each day of the week; maybe that's the comparison we want. In the other version, what we can compare easily is each server's day-of-week differences. So the type of table you make depends on the type of question you have. Of course, they show the same data; it's just a matter of which shows it most forcefully and effectively. Just like Graph Builder, when you're done you can click the Done button, and this makes the final table. And just like everything in JMP, these are still interactive: as I select statistics in this table, the rows that went into that calculation are also highlighted, so you can use a tabulation to interact with your data table as well. If you want to run analyses on this table, you may want to turn the tabulation into a JMP data table. Click your red triangle (remember, click all of these), and notice you have the option to make this into a data table of its own, so maybe this is something we want to analyze further, because
sometimes tabulations aren't just for getting summary statistics but for getting the data into a format that lets you analyze it. All right, so that's Tabulate, a really powerful way to do your basic summary statistics. Now, for text and unstructured data, we have a new platform that was added in JMP 13, and I want to show it to you really quickly. This is some sample data on aircraft incidents, so accidents that happened, and we have this big long narrative column: for each incident we have a description, and going through these manually would be a lot of work. But under Analyze > Text Explorer, just below the Tabulate option, we can explore the text for common phrases and words. I'm just going to put my narrative cause column in as my text column and click OK. There are a lot of other options we could specify, but notice what JMP does first: it pulls out the tokens, the words and phrases that are most common across the incidents in our data set. We see that the most common phrase is "pilot failure," mentioned 484 times, followed by phrases like "failure to maintain" or "engine power." So we can very quickly see what types of incidents we had and maybe what the causes were. With this sort of text exploration, a nice visual is the word cloud, which you'll find under the red triangle in Display Options. The word cloud is a nice way of exploring what was happening, and it's still interactive as well. If we grab, let's say, "pilot," I can right-click and select the rows in which that was mentioned; or maybe "landing" was mentioned sometimes, so let me select the rows where landing was mentioned. So this is a way both to explore and summarize the text data in your data set and to interact with it. Once you've selected those rows, maybe you want to analyze just those for some particular thing. Now
Text Explorer has a lot of other options. If you're working with JMP Pro, you get some advanced analytics with it too: with singular value decomposition you can do latent semantic analysis and topic analysis, and you can cluster the terms and the documents to see which things tend to co-occur in your data set. These are really nice options, especially when you want to explore a little more deeply and potentially run some predictive analytics based on the dimensions extracted from the text. So Text Explorer is very much worth playing with, and again, it's a new feature in JMP 13. All right, so those are the basics for summarizing and graphing: Graph Builder for graphing, Tabulate for numeric tabulations, and Text Explorer for open-ended or unstructured types of data. Now let's look at some basic analysis platforms. We've already looked at the Distribution platform, and I pointed out some highlights. Distribution is all about one-variable questions. Like every platform in JMP, you just cast your columns into a role when you start, and like everything in JMP, the output is interactive. One really nice thing about Distribution, especially when you're first getting to know a data set, is screening for outliers. Not every extreme point is an outlier, but there may be points that are very extreme. This one is not terribly extreme, but what if somebody tipped 150 dollars? That could be a data-entry issue, or it could be a real tip. But if you include it in analyses, being so far from the mean, it may distort results, and it may be such an exception that you don't want to consider it. Remember that you can right-click points and hide or exclude them, so if you detect something a little suspect, you don't need to go searching for it in the data table; you can just hide or mark it right there. Remember that Distribution organizes itself, like everything in JMP,
hierarchically, so as you produce analyses like testing the mean or testing for a continuous distribution, you get new options as you go along. So the Distribution platform is all about those one-variable questions. Now I want to show you a generalization of the Distribution platform, and that's the Multivariate platform. For this I'm going to bring up some body measurements: measurements of about 22 individuals on a bunch of different physical characteristics. We can look at the univariate, or one-variable, distributions of each of them, but what if we want to know how they all relate to each other? This isn't a question of whether one predicts another (although we can get that information here); it's really a question of how they all correlate, or covary. Under the Multivariate Methods section there's the Multivariate platform, and I really see it as the generalization of Distribution for numeric variables. I'm going to grab all my columns, cast them all into the Y role, and click OK. Multivariate starts out with the correlation matrix: each variable and how correlated it is with every other variable. I'm going to minimize that, because I actually like looking at the scatterplot matrix, which is a nice way to see how strongly each pair of variables covaries. We have a 95% density ellipse, which gives us a sense of their covariance or correlation: a narrower ellipse, one with a principal axis much longer than the other, means the points are more tightly grouped around that relationship, and ellipses that are more circular indicate less correlation. Again, click every red triangle. Sometimes you want to show the correlations on the scatterplot matrix itself, but that gets a little busy with this many variables, so I don't tend to like it. One thing I do like to do is fit lines, so for each of these we can get the little regression lines with the
confidence bands, a nice way to explore these data and get a sense of what correlates with what. Again, the red triangles always have more options, and we can do lots of things from Multivariate: we can jump directly to principal components; we can do some really advanced outlier detection (I really like the Mahalanobis distances); or we can get other things like a 3D plot, if we want to look at three of these variables and see how they co-occur in three dimensions. Multivariate is a great way of exploring numeric data, and I think you'll get a lot of value from it. All right, if we're trying to make predictions, or to analyze the effect of one variable on another, and we have only two variables to consider, that's the domain of Fit Y by X. Fit Y by X is the next option under Distribution here, and it's just what it sounds like: we're fitting some Y variable against some X variable. If you think about it, there are several different ways that can happen, and again, it's constrained by the modeling type of the data. Whether the variable on the x-axis is continuous or categorical, and whether the one on the y-axis is continuous or categorical, determines the type of analysis or output we should expect. Continuous predicted by continuous, numbers predicting numbers, is the domain of regression (and a great number of other things, but principally regression). Categories predicting something numeric, or continuous, is the domain of ANOVA: analysis of variance, or a comparison of means. Continuous things predicting categorical things is the domain of logistic regression. And categorical variables predicting categorical variables is the domain of contingency analysis and mosaic plots. Let's take a look at some of these. Notice we don't have to click anything here; this is just a table to help you know what JMP is going to do.
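To make that two-by-two table concrete, here is a minimal Python sketch of the lookup it describes. It is not JMP code and not a JMP API: the `ModelingType` enum, the `fit_y_by_x_analysis` function, and the column metadata in the usage section are hypothetical names chosen for illustration. Treating JMP's ordinal and nominal modeling types together as "categorical" is an assumption about how the webinar's continuous/categorical split maps onto JMP's three modeling types.

```python
from enum import Enum


class ModelingType(Enum):
    """Hypothetical stand-in for JMP's three modeling types."""
    CONTINUOUS = "continuous"
    ORDINAL = "ordinal"
    NOMINAL = "nominal"


def is_categorical(t: ModelingType) -> bool:
    # Assumption: ordinal and nominal both count as "categorical" for this table.
    return t in (ModelingType.ORDINAL, ModelingType.NOMINAL)


# Keyed by (Y is categorical, X is categorical), mirroring the table described above.
_FIT_Y_BY_X = {
    (False, False): "regression (fit a line)",
    (False, True): "ANOVA / comparison of means",
    (True, False): "logistic regression",
    (True, True): "contingency analysis / mosaic plot",
}


def fit_y_by_x_analysis(y: ModelingType, x: ModelingType) -> str:
    """Return the family of analysis suggested for a Y-by-X pair of columns."""
    return _FIT_Y_BY_X[(is_categorical(y), is_categorical(x))]


if __name__ == "__main__":
    # Hypothetical column metadata for the restaurant-tips example.
    columns = {
        "tip percentage": ModelingType.CONTINUOUS,
        "number of guests": ModelingType.CONTINUOUS,
        "credit card": ModelingType.NOMINAL,
        "server": ModelingType.NOMINAL,
    }
    pairs = [
        ("tip percentage", "number of guests"),
        ("tip percentage", "credit card"),
        ("credit card", "number of guests"),
        ("credit card", "server"),
    ]
    for y, x in pairs:
        print(f"{y} by {x}: {fit_y_by_x_analysis(columns[y], columns[x])}")
```

Running it prints one line per pair, for example `tip percentage by credit card: ANOVA / comparison of means`. Keying the table on a pair of booleans keeps all four cases explicit; the point it illustrates is that the column metadata, not a menu choice, decides which family of analysis is offered.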
Let's predict something like tip percentage on the basis of, let's say, the number of guests at the table. What do we think we'll find from this relationship? Maybe small tables feel more personal connection to their servers, so they tip a higher percentage. Or maybe big tables tip a higher percentage because everybody puts in money, and so everyone overtips. Let's find out. I'll click OK, and this gives us the output. At first, these dots just show tip percentage in Y space for each number of guests. Notice that JMP didn't force us to do a regression; it's actually waiting for us to select what we want to do next. It just starts us off with that graph, and the red triangle is where we select what type of analysis we want. I'm going to fit a line; that's probably the most straightforward thing to do. From this we get our regression line of fit and the output that tells us about the statistics. Probably most interesting, if we're looking for a relationship between these variables, is the parameter estimate for number of guests, that is, the slope: for each additional guest, how much do we expect tip percentage to increase or decrease, on average? We find it's not statistically significant, but let's interpret it anyway: for every additional guest, the tip percentage goes up about 0.36 percentage points, not a very big effect. We're basically seeing that people at tables of any size tip about the same. Okay, let's keep this open for a second; I'll move it to the right. Let's imagine a different question, still using the Fit Y by X platform, the platform for bivariate relationships: tip percentage again, but instead of number of guests, let's predict it from credit card use, that is, whether somebody did or did not use a credit card. This comes up with similar output at the start: we still have the dots, and we have
the No and Yes categories, but the drop-down here has different options. Unlike the bivariate case of continuous predicted by continuous, we're not going to fit a line; that wouldn't make sense for categories. Instead we'll fit something like means/ANOVA or a t-test. The difference here is that Means/ANOVA, the pooled t-test, pools the estimates of the variance, while this t-test option allows the variances to be different. We also have other options. Maybe we want to check whether the variances differ, so we can produce tests of unequal variances. Or, the opposite of a usual null hypothesis test, an equivalence test: if we have some difference that we know is not practically relevant, we can test whether the difference in the population is smaller than that equivalence margin. Our nonparametric options are here too. So JMP isn't forcing you into a particular type of test; it gives you the options and runs those tests when you select them. I'm going to turn on the t-test. Like we saw with Distribution when we got a t-test, we get a representation of the sampling distribution, this time of the difference. So Fit Y by X is all about those bivariate questions, and it surfaces the types of analyses that make sense given what you've entered. I'll just show you another two of these really quickly. If I go to Fit Y by X, this time let's put in something categorical as Y, maybe credit card, and let's use two X variables at once. I'm going to use number of guests, since maybe we think the choice to use a credit card varies with the number of people at the table, and something categorical, maybe server. JMP isn't going to put both of these variables into a multiple regression; instead we get two separate outputs, which JMP simply puts in the same window. On the left, what we're looking at is really the probability of falling into a category, so using a
credit card or not. This is our logistic regression, where we get things like an odds ratio. On the right-hand side we get a categorical-by-categorical mosaic plot. The right-hand axis shows the proportion of tables that did and didn't use a credit card, and along the x-axis we see the proportion of tables that fell into each server category. So the width of column A reflects the number of tables A served, the width of B the number of tables B served, and there's C: that column is narrower, so we had fewer C tables. The real magic of the mosaic plot is in the interior: for the tables where A was the server, what proportion didn't use a credit card and what proportion did? Notice that the dividing line here isn't straight across, which is to say that the conditional proportions differ: B had more tables where the customers didn't use a credit card, and C had a lot of tables where they did. So we can see whether there is some contingent relationship here: did the probability of using a credit card depend on which server was serving the table? Maybe we'd want to run this analysis if we thought one server wasn't trustworthy and people weren't giving that person their credit card. All right, so that's Fit Y by X for all bivariate questions. When you move beyond bivariate questions, that is, when you have more than one variable predicting an outcome, or more than one outcome variable, you get into the domain of Fit Model. Fit Model is good for multiple regression and a great deal more. I'll just show you very quickly what it looks like, because then we'll have to wrap up, and I want to show you some of the basics of the academic program. Fit Model is built around your Y roles and model effects. If we were predicting tip percentage but wanted to use two variables at once, credit card and number of guests, and we add
those to the Model Effects section, then instead of getting two separate outputs, one for credit card predicting tip percentage and one for number of guests predicting tip percentage, both are used in the same model, so we get partial regression coefficients. Fit Model is very general. The personality we're using here is the most basic, Standard Least Squares, but we can also do stepwise regression, mixed models, MANOVA, and loglinear variance models. Generalized Regression has a whole wealth of penalized regression techniques, as well as others like quantile regression, and we have some very specialized models: partial least squares, response screening, and generalized linear models. There's quite a bit more to Fit Model, so I invite you to look at some of our advanced webinars to see how it works. All right, that's basic analysis. Before I take questions (and I certainly want to), let me tell you a little about the academic program in JMP, so you can see a bit about what JMP does and why. More than seventy percent of the top US institutions teach with JMP. There's a great blog post called "Why Teach with JMP," and I invite you to read it; my journal, which will accompany this webinar on the website, has the links, so you can click them or just search for the titles. When you're first getting started, I want to show you some places to go, and jmp.com/teach is where I really recommend you start. That takes you to our academic site and auto-scrolls you down to the resources. We have resources for learning JMP; you've already found a webcast, and we have a lot more. We have a Learning Library, and I'll take you there quickly. The Learning Library is built around types of analyses, so say you're teaching or trying to learn correlation and regression: if you go to that section, we list the types of analyses that fit under that category, and for each of them we have a video two to
five minutes long and a one-page guide. If you've never seen one of our one-page guides, these are pretty neat: on a single page, using sample data, they give all the steps you need to do that type of analysis. I love these. When I was teaching, I used them as the technology notes in my class: I would assign them and put them up on our course management software as a bundle, so students had the exact steps and I didn't have to write them out. They're free, and you can use them any way you like. So that's the Learning Library. Beyond it, we also have a whole group of case studies: worked-out examples using real-world data, many of which end with questions, so they can be assigned as homework. Plenty of books use JMP. We also have e-learning courses: with campus licenses, you have whole courses that can be assigned to students, who get a little certificate at the end. We also have interactive learning tools (I showed you one of those before, which shows you the sampling distribution of the mean), AP stats resources, and interactive JMP questions in WebAssign. So there are plenty of materials for you to use. I showed you the Learning Library under jmp.com/teach, but you can also get directly to it, along with the case studies and our webcasts, by going to jmp.com/learn. We also have an academic community at jmp.com/jac, the JAC; let me take you there now. The JMP Academic Community is where I'll be posting this webinar, under the collection of recent academic webinar recordings. Just go there, and I'll post it right at the top, so you'll see a link for JMP Basics. If you want to look more into Graph Builder, I mentioned that we just did a webinar on it: there's the webinar
visualization and graphics that's an hour on just graph builder there's so much you can do with it and really is an amazing way to work with data finally when you're just getting started jump calm so I should get started and the new user Welcome kit these are all available there these are great ways when you're first getting started with jump and finally if you're really just looking for a feature index the jump ComStat index will show you that all right so just a couple highlights so the jump academic suite which many of your campuses have if you're at an institution that's really the campus and Department licenses that's free for everyone to use once they campus gets a license so there's no per person cost once the campuses license everybody can use it at home or at work if you want to know how to get jumped yourself this jump comm academic page I showed you before has a get jump section and we have lots of different ways you can get jump the academic site licenses the genomic site licenses there's also individual licenses for jump in jump pro and also six month and 12-month licenses for students so on the hub comm / JMP so for about a dollar a month or I should say a dollar a week well it's $49 for a whole year for a student and so quite quite inexpensive $4 a month there's also jump student edition something that gets bundled with textbooks and so it does most of the starting pieces of jump and very great for a first or second course and jump again jump I think and one of the reasons I love at most most is that it's really about visualizing data and understanding data certainly as a wealth of analysis but it brings statistics to life and I think that's a great thing when teaching because you can teach those concepts and not just software you can really engage students and we have a lot of resources out there when you're first getting started for textbooks incorporate jump we have the learning library and case studies and e courses and so a lot of things to help 
And actually, all of us on the academic team, Ruth and I and our other colleagues Mia and Volker, are here to help as well, so you can always email academic@jmp.com or reach out to any of us individually. All right, with that, we're just out of time, but I'll stick around as long as anyone has questions. Thank you all for coming.
Info
Channel: Julian Parris
Views: 31,109
Rating: 4.9224806 out of 5
Keywords: jmp, analysis, jmp (software)
Id: AIwDFdFk94g
Length: 59min 3sec (3543 seconds)
Published: Tue Oct 11 2016