Getting Started With JMP 12: Basic Statistics

Video Statistics and Information

Captions
In this video we'll see how to perform basic analyses in JMP statistical discovery software from SAS. We'll talk about using the Distribution platform to analyze one variable at a time, Fit Y by X for analyses involving two variables, and Fit Model for analyses involving more than two variables. We'll have a quick review of tools for summarizing and graphing data in JMP, and we'll also revisit resources for learning JMP.

For analyzing one variable at a time, we use the Distribution platform. This is the first option under the Analyze menu. We'll use the Car Physical Data, which is also available under the Help menu in the sample data directory. This data set has 116 observations with 8 variables. The icon indicates the modeling type for the variable, so here we have three nominal variables and five continuous variables. JMP will use this information to determine the correct analysis.

To launch the Distribution platform, we select Distribution from Analyze and select the variables of interest. Here I'll select Country, Type, and [inaudible]. To select more than one variable at a time, click and drag or use the Shift or Control key. JMP produces bar charts and frequency distributions for categorical variables, and histograms, box plots, and summary statistics for continuous variables. The default view is vertical. To convert this to a horizontal view, click on the red triangle next to Distributions and select Stack. This is also a preference that can be set under File, Preferences.

We see a bar chart for Country and the frequency distribution for this variable, and note that each of the variables is linked to every other variable. By default we see basic summary statistics for continuous variables. Click on the red triangle next to Summary Statistics to add additional statistics. In every analysis there will be a red triangle. Click on the red triangle to add additional analysis options that make sense for the type of variable that we're looking at. So, for example, we can ask for different display options, change the look
and feel of the histogram, or ask for a normal quantile plot, and each of these is a toggle switch.

For basic hypothesis testing, under the red triangle we see Test Mean and Test Standard Deviation. Test Mean will perform a one-sample t-test or a one-sample z-test. Note that we can also ask for a nonparametric test. In JMP, the nonparametric methods are available wherever you find the corresponding parametric methods. I'll plug in a value for the hypothesized mean and click OK. The additional output is added to the end of the analysis. We see the hypothesized value and the observed sample mean. The curve at the bottom indicates the distribution of sample means we would see under the null hypothesis, centered at the hypothesized value. The red line indicates our observed sample mean, and the blue area in the tails represents the p-value. In JMP, any time we see "Prob >" something, this indicates a p-value. "Prob > |t|" is the p-value corresponding to the two-tailed test, and in this case we have a p-value of 0.018, which is significant at the 0.05 level.

If you're new to p-values, there's a nice tool under the red triangle next to Test Mean for helping us to understand p-values, and this is the P-Value Animation. The P-Value Animation plots the observed sample mean as a solid line. The curve represents the distribution of sample means we would observe under the null hypothesis, and the grabber at the top lets us increase or decrease the distance between the observed mean and the hypothesized value. Note that as we do this, both the t ratio and the p-value change.

We also observe, under Summary Statistics, Lower 95% Mean and Upper 95% Mean. This is a 95% confidence interval for the mean.

Recall that there are two forms of help directly available from within a platform. Click on the question mark on the toolbar and place this question mark anywhere we'd like additional information, and JMP will launch the interactive help, or hover on top of a statistic
and you'll see a box appear. We call this hover help.

Now, to help during this video, under File we're going to set one preference. There are several preferences that can be specified, and these allow you to customize the look and feel of JMP. There's an option to add a laser pointer, and I'm going to turn on a laser pointer to allow me to identify key points to focus on during this video. If you're on a Windows machine, under Windows Specific is an option, Auto-hide menus and toolbars. If you change this to Never, then you'll see a menu appear on every window. I'll click OK to accept these changes. So now, since I've added a laser pointer, I can highlight observations or values that I'd like to focus on.

So this is a one-sample t-test and confidence interval for the mean. Likewise, if we have categorical data, we have Test Probabilities to perform a chi-square test, and we can also request confidence intervals.

Let's look at this data now two variables at a time. In this case we'll use Analyze, Fit Y by X. The key in the bottom corner tells us what types of analyses are available based on the modeling types of the variables that we select. So, for example, if I select Horsepower, which is a continuous variable, as the response and Country, which is a nominal variable, as the X, the key on the side corresponds to the modeling type for my Y variable. So the two options for a continuous response are Bivariate or Oneway. The key for my factor, or my X variable, is across the bottom. So this combination will produce a one-way analysis of variance. If there are two levels of the categorical variable, JMP will provide a two-sample t-test; for three or more levels we'll see ANOVA.

Let's add a second variable. In this case we'll add Displacement. Displacement is a measure of the engine size. Here we'll see a second combination, Horsepower versus Displacement. This combination will take us to the Bivariate platform. From the Bivariate platform we can request regression and correlation. I'll click OK, and note that we have two separate
analyses. The first analysis is a one-way analysis of variance. When I click on the red triangle, we see options for Quantiles, Means/Anova for comparing means and standard deviations, and other options, including the nonparametric methods. I'll select Means/Anova. JMP produces means diamonds, which are 95 percent confidence intervals for the mean for each of the groups, and this provides a nice visual comparison between the means. Below we see a Summary of Fit, a whole-model test for the significance, and a summary of the means for each of the groups. Additional options are under the red triangle. So, for example, if we'd like to conduct post hoc multiple comparison procedures, there are four different procedures available. I'll select the first procedure. Each Pair, Student's t allows us to compare each mean to each other mean. Below we'll see the default statistics, a connecting letters report, and p-values for each comparison, and JMP also provides comparison circles. When I click on one comparison circle, the mean corresponding to the circle is highlighted, and JMP is performing a test between that mean and every other mean. In this case JMP is telling us that Other is not significantly different from Japan, which stays the same color, but is significantly different from USA.

I'll tuck this away, and now let's take a look at the bivariate fit. In this case we see completely different options: Fit Mean, Fit Line. If we select Fit Line, JMP will perform a regression analysis. We also have a variety of other options: Fit Polynomial, Fit Special, which allows us to transform an X or Y, and a number of other options.

While we're here, let's talk a little bit about correlation. In this platform, correlation is found under Density Ellipse. We'll select the 0.95 density ellipse, and you see that JMP draws an ellipse to encompass 95% of our observations, and down below we see a correlation. For anything we fit within this platform, you'll see additional options below the graph. In this case we can shade the contour, select points, and we
can also remove the fit. For multiple pairwise correlations, a different platform can be used, under Analyze, Multivariate Methods, Multivariate.

So let's take a look at regression. Click on the red triangle and select Fit Line. JMP fits a line to the data, provides summary statistics down below, and provides the fitted equation for the line. Again, additional options are available under the red triangle, for example, Plot Residuals and confidence limits. We can save the prediction equation, and we can change the alpha level.

So in both of these examples we used a continuous response. Let's take a look now at a case where we have a categorical response. I'm going to open up a second data set, Titanic Passengers. This is data on all of the passengers on the Titanic, and the variable of interest is Survived. Again I'll use Analyze, Fit Y by X with Survived as the response. If the response is nominal or ordinal, our two options from this platform are logistic regression or contingency and chi-square. I'll select Passenger Class. This combination will take us to contingency output. I'll select Sex. This will also take us to contingency. And I'll select Parents and Children. In this case we'll see output for logistic regression. JMP will perform three simultaneous analyses.

The graph we see in the first analysis is called a mosaic plot. The bar on the side is a legend. I'll click and drag to make this a little wider. The height of the bar corresponds to the frequency or the proportion of No versus Yes, and as I hold my mouse over each bar, notice we see the frequency. Across the bottom we see the breakdown of first-class passengers versus second versus third, so we have substantially more third-class passengers than we do second-class. The breakdown of red versus blue within each bar shows us the breakdown, in this case, of third-class passengers who did not survive versus those who did survive. Summary statistics are provided below in a contingency table, and additional options are available, and by default we see
two hypothesis test results. As always, additional options are available under the red triangle. In the case of Sex, which is a 2x2 table, we see some additional options, including relative risk, odds ratio, and two-sample tests for proportions. We see in this case that there are more males than females, and more of the females survived than did not survive.

The last analysis is logistic regression, and let me tuck away some of these other analyses. In logistic regression we're asking the question: what happens to the probability of not surviving as the number of parents and children increases? Notice that the slope of the line is decreasing. This tells us that as Parents and Children increases, the probability of not surviving decreases. Likewise, the probability of surviving increases. Below we see a whole-model test, some summary statistics, and the terms for our model. To provide some help interpreting the graph, if we right-click in the middle of the graph and select Row Legend, I'll add a legend to this to make it a little bit easier to see what the graph is doing. I'll select Survived and I'll add markers, and you can do this in any graph within JMP. So we see that JMP has placed all of the yeses above the line and all of the noes below the line, and there are only two individuals who traveled with nine parents and children. So this is logistic regression.

Let's take this a step further. In both of these cases we can build a regression model. I'll use Analyze, Fit Model, and let's say I'd like to build a model to predict Horsepower as a function of other variables. When I select Horsepower as the response, the personality, or method, that will be used is standard least squares regression. From this platform there are several different types of analyses that can be conducted. I'll add model effects, or terms, to the model, for example, Country and Type. I'll click and drag to add additional variables, and I'll add Gas Tank Size. Note that Country and Type are both categorical variables, and JMP
will dummy code these for us. In fact, JMP uses a -1, +1 coding scheme to parameterize these terms automatically. To add an interaction to this model, we select one variable and select the other variable, which I'll select here, and then click Cross. I'll go ahead and click Run, and JMP provides a high-level summary of the terms in our model, called Effect Summary, and an Actual by Predicted plot, which gives us an overall picture of the significance of our model. The bands represent confidence intervals, and the red line represents the predicted mean. On one side we see the actual response, and across the bottom is the value that our model predicted. The scatter around this line indicates unexplained variation, or noise, and the less unexplained variation, the more significant our model. Additional summary statistics are provided down below, including the Summary of Fit, the whole-model test for significance, a Parameter Estimates table, and Effect Tests. The effect tests indicate which terms are significant, and we also by default see a residual plot.

Additional options are under the red triangle, including Regression Reports and Estimates. Show Prediction Expression will show us the prediction model that we've just built, and one of my favorite tools is the profiler. The profiler allows us to interact with the model that we've just built. Across the bottom we see starting values for each of our variables. The slopes of the lines indicate how the response will change as we change the value of a particular X, and on the side we see the predicted response and a confidence interval for that response. We can click and drag these vertical lines to change the value of the X that we're plugging into the equation, and notice how the predicted response changes. The steeper the slope, the more the response changes as I change that value of the X. This is also a nice tool for looking at interactions. For example, we see a highly significant two-way interaction between Displacement and Country. If we keep an eye on the
box for Country and change Displacement, notice what happens to the slope of the line for Country. There's a differential effect. Likewise, keep an eye on the slope of Displacement as I change from Japan to Other to USA. The influence of Displacement on the response depends upon the country, and vice versa.

Additional options under the red triangle include a number of different row diagnostics and diagnostics for the model itself. Any one of these options will allow us to save values out to the data table. To reduce this model, we can return to the model dialog window, or we can slowly remove terms one at a time using the Effect Summary table. So this is regression when we have a continuous response.

If we have a categorical response, like Titanic Passengers, we'll use the same platform. So I'll go to Analyze, Fit Model. Survived is my response. Note that JMP updates to nominal logistic regression. I'll select Passenger Class and several of the variables, and I'll add some interactions. So, for example, I'll select Passenger Class, and I'll hold down the Control key to add these other interactions. Degree indicates the degree of the interaction that I'm going to add, and under Macros I'll select Factorial to Degree. This will add all possible two-way interactions between the terms that I selected. I'll select Run. Again we see the Effect Summary table, a whole-model test with some high-level statistics, lack of fit, parameter estimates, and effect tests, and under the red triangle we see options like odds ratios, an ROC curve, a lift curve, and a confusion matrix. And again we see the prediction profiler, and as we saw in the case of ordinary least squares regression, the profiler allows us to change values of the X to see what happens to the predicted response. So here I'm changing the values of Passenger Class and Sex, and notice how the contours in the other factors change as I change the value from female to male. This is due to the significant interactions in our model.

So far we've talked about Distribution for analyzing one
variable at a time, Fit Y by X for analyzing two variables, and Fit Model for building models with multiple X's, where we have a continuous response or a categorical response.

Keep in mind that there are a number of tools in JMP for summarizing data. Under Columns, an option, Columns Viewer, allows us to produce high-level summaries of all of our variables. From here, if I launch the Distribution platform, I'll see the distribution output for those variables that I've selected. Under Analyze is Tabulate, and Tabulate provides drop zones for variables, where I can drag and drop a variable, change the summary statistic by dragging and dropping, and add additional variables. So here I'm looking at the mean and standard deviation for Horsepower for the different countries, and I'll add Type right next to Country. So this is Tabulate, and Tabulate and Columns Viewer are covered in depth in another video. Finally, as a refresher, the Graph Builder platform: this is our versatile graphing platform for dragging and dropping variables that produces dynamic graphics.

We have a variety of different resources for learning JMP, and as a refresher, the Learning Library. The Learning Library answers the question "How do I do X in JMP?" So, for example, under Using JMP you'll see how to import data, import text, and create formulas. For each one of these there's a video and a one-page guide. Another key set of resources I'll point out is the User Community. The User Community includes a File Exchange for sharing scripts and data sets and a discussion forum, and it also provides access to other communities.

In this video we talked about the Distribution platform for analyzing one variable at a time, Fit Y by X for analyzing two variables, and Fit Model for analyzing multiple X's with a continuous or categorical response. We very quickly revisited the Columns Viewer under the Columns menu, Tabulate under the Analyze menu, and the Graph Builder under the Graph menu. As we were going along we pointed out tips and tricks, and we also visited
the Learning Library and the JMP User Community.
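
For readers who want to see the arithmetic behind the Test Mean step described in the captions, here is a minimal sketch in plain Python of a one-sample t-test with a two-tailed "Prob > |t|". It is not JMP's implementation. The function names (`t_pdf`, `two_sided_p`, `one_sample_t`), the numerical integration approach, and the horsepower values are all assumptions made for illustration; the values are not taken from the Car Physical Data.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_ratio, df, steps=2000):
    """Two-tailed p-value: area beyond +/-|t|, using Simpson's rule on [0, |t|]."""
    upper = abs(t_ratio)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)


def one_sample_t(values, mu0):
    """Return (t ratio, degrees of freedom, two-tailed p-value) for H0: mean = mu0."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation
    if sd == 0:
        raise ValueError("all values are identical; the t ratio is undefined")
    t_ratio = (mean(values) - mu0) / (sd / math.sqrt(n))
    return t_ratio, n - 1, two_sided_p(t_ratio, n - 1)


# Hypothetical horsepower readings, for illustration only.
sample = [95, 110, 130, 140, 102, 155, 120, 98]
t_ratio, df, p_value = one_sample_t(sample, mu0=100)
print(f"t ratio = {t_ratio:.3f}, df = {df}, Prob > |t| = {p_value:.4f}")
```

As with the P-Value Animation in the video, moving `mu0` closer to or farther from the sample mean changes both the t ratio and the p-value.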
Info
Channel: JMPSoftwareFromSAS
Views: 18,325
Rating: 4.9487181 out of 5
Keywords: JMP 12, JMP, JMP software, SAS, SAS software, data analysis, data science, statistics, data visualization, basic statistics, distribution, teaching statistics, Learning statistics, learn statistics, probability
Id: QisEzyaf9po
Length: 24min 34sec (1474 seconds)
Published: Thu Jun 02 2016