JMP Academic Series: Engineering and DOE (25Oct2016, JMP 13)

Well, thank you, everybody, for joining us today. Again, my name is Mia Stevens, and I'm with the academic team here at JMP. Today we're going to talk about tools and methods that are commonly used in engineering statistics courses. We'll see how to summarize and graph data in JMP, we'll talk about statistical intervals and hypothesis tests, and we'll see how to build a variety of different types of statistical models. We'll provide an overview of some of the tools that are built in for designing experiments and also for analyzing experiments, and we'll see a highlight of some of our quality tools.

Before we get started, I'd like to point out that for most of what I'm going to talk about today, there are a variety of resources that cover these topics in depth, and one of the most important resources is our Help. If I click on the Help menu, you'll see that Books provides links to all of our documentation. For summarizing and graphing data and for statistical intervals, the books Basic Analysis and Essential Graphing provide all of the background details and information. There's a book dedicated strictly to designing experiments, Fitting Linear Models and Specialized Models cover linear and nonlinear modeling, and there are separate books for Quality and Process and also for Reliability and Survival. We've also provided a number of other resources on our academic community. The quick link is jmp.com/teach, and I'll simply point these out quickly. This is our academic page, and at the bottom of our academic page are links to a variety of different resources: getting started videos and academic webcasts. You'll see a listing of additional webcasts that are coming over the next month or so, and the Learning Library provides "How do I do X in JMP?" information. So if you're just getting started with JMP, you'll see that we provide guides that are broken down into a number of categories. If I click on the Design Experiments outline, this will open the outline, and you'll
see the variety of topics listed. Most of the guides have short videos, one to three minutes long, and the guides all provide enough information to get you started. So if you know you want to do a particular thing, the guides and the videos are a really nice place to start.

Let me take a step back for a moment. We do record all of our videos, so today's webinar will be recorded and posted on our JMP academic community. The link to the academic community is on our home page, and if you go to the community and search for the word "webinar," you'll see a listing; there's a collection of recent academic webinars, and this webinar will be posted there. You'll see that the webinars we recorded this fall have all been posted.

With that, I'll return to JMP and go ahead and get started. I'm using JMP on a Mac, but if you're familiar with JMP, you know that JMP runs natively on both Mac and Windows. If you're brand new to JMP and its basic navigation features, we recommend that you watch one of the getting started videos.

So we'll start with how to summarize and graph data in JMP. I'm using a JMP journal for this webinar; a journal just allows me to easily navigate and provides an outline for what we're going to talk about today. I'm going to launch a data set from the sample data directory: under Help, you'll see that there's a Sample Data Library. This contains something like 500 data sets, and you can also see an organization of this data library under Sample Data. I'll be using data sets from the sample data library, and this first data set, if you're looking for it, is under Control Charts. These are data on diameters of a part, and we've got information on the operator who ran the machine, the machine number, and the phase. So this could have been data collected during an improvement
project where we had two different phases, or something could have changed between these phases. We're going to start looking at tools in JMP for summarizing and graphing these data. A nice starting point, if you're given data and you really don't know much about it, or you just want a nice high-level summary of the data, is the Columns Viewer. Under Cols, towards the bottom, is the option Columns Viewer, and the Columns Viewer allows you to select as many variables as you're interested in. Recall that this little red bar indicates that Day is coded as categorical, or nominal, data; blue indicates that Diameter is coded as continuous; and green is used to indicate ordinal data. I'll click the button that says Show Quartiles and then select Show Summary, and I'm going to click this Clear Select button to deselect all of the variables.

This gives me a good high-level view of the variables in my data set. I can see that I've got 40 categories of Day, so 40 days' worth of information. When we have categorical data, we'll see the number of categories under N Categories. Where we have continuous data, like Diameter, we'll see the minimum, the maximum, and some basic statistics, and if you select the Show Quartiles option, you'll also see the median, the quartiles, and the interquartile range. I like doing this anytime I first start looking at data: it gives me a feel for the range of the data and the shape of the distribution, and a sense of whether I'm missing values (if we're missing values, you'll see a new column called N Missing). It's also just a high-level overview to get me familiar with the information that I have. I know that I've got four operators, three machines, and two phases.

Let me close this. Now, if I want to look at summary information but I'd also like to look at the distributions of the variables, I'll use Analyze and then Distribution. The way the Analyze
menu is broken down: Distribution is used to look at one variable at a time, with univariate graphs and statistics, and as you hold your mouse over a menu item, a description of the features available from that item appears. Fit Y by X allows us to look at the relationship between one variable and another variable, so from here I can ask for two-sample t-tests and ANOVA, simple linear regression, logistic regression, and cross tabs and chi-square tests. Tabulate is our version of a pivot table, so that's where to go if you'd like to produce numeric summaries of your data. We've added a new feature in JMP 13 (I'm using JMP 13 for this webinar), Text Explorer, for when you're dealing with unstructured data. Fit Model is for when I'm dealing with models where I've got multiple X's or multiple Y's.

So let's start with Distribution, and we'll talk about some of the other menu items as we go along. Distribution gives univariate graphs and statistics, and the reason that the icons next to the variables are important is that they help JMP decide the type of analysis that makes sense based on the variables that we've selected. I'll select all of the variables and click Y, Columns (or I can click and drag them), and I'll click OK. For categorical data like Day, Operator, Machine, and Phase, we get a bar chart and a frequency distribution. If we don't like the look or the layout, JMP gives us a vertical layout: under the red triangle, we can select the option Stack, which changes this to a vertical layout. These little gray icons can be used to tuck things away; I'm not overly interested in Day at this point, so I'll simply tuck it away. Now, if you find that there are changes like that you'd like to make every time you use JMP, you can set preferences under the JMP menu on a Mac, or under File on a Windows machine, so you can customize the look and feel of JMP. So this is a distribution for Diameter, Operator, Machine, and Phase, and on the side we see that there are a number of summary statistics that are provided.
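The idea that a column's modeling type decides which analysis makes sense can be sketched outside JMP. The Python below is a minimal illustration under stated assumptions, not JMP's implementation: `distribution_summary` and the labels `"continuous"` and `"nominal"` are hypothetical stand-ins for the blue and red column icons, and missing values are assumed to be stored as `None`. A continuous column gets moments (mean, standard deviation with an n - 1 divisor, standard error of the mean); a nominal column gets a frequency table of counts and proportions.

```python
import math
import statistics
from collections import Counter


def distribution_summary(values, modeling_type):
    """Summarize one column the way its modeling type suggests.

    Hypothetical helper, not JMP code. Missing values are None and are skipped.
    A continuous column needs at least two non-missing values for the std dev.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    if modeling_type == "continuous":
        # Moments: the sample standard deviation uses an n - 1 divisor.
        sd = statistics.stdev(present)
        return {
            "Mean": statistics.mean(present),
            "Std Dev": sd,
            "Std Err Mean": sd / math.sqrt(n),
            "N": n,
        }
    if modeling_type == "nominal":
        # Frequencies: (count, proportion) for each level, in sorted level order.
        counts = Counter(present)
        return {level: (count, count / n) for level, count in sorted(counts.items())}
    # Ordinal and other modeling types are out of scope for this sketch.
    raise ValueError(f"unsupported modeling type: {modeling_type!r}")


# Hypothetical example values, not the Diameter sample data.
print(distribution_summary([4.2, 4.5, 4.4, None, 4.1, 4.3], "continuous"))
print(distribution_summary(["A", "B", "A", None, "C"], "nominal"))
```

The dispatch is keyed on an explicit modeling type rather than on the Python type of the values, mirroring how JMP lets a numeric column be coded as nominal.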
So by default, for continuous data, we see the mean, the standard deviation, the standard error of the mean, and a confidence interval for the mean; this is a 95% confidence interval. If there are additional statistics that you'd like to display, under the red triangle next to Summary Statistics, select Customize Summary Statistics, and there's a full list of additional statistics that we can request. I'll often turn on the option N Missing, and if I'm dealing with data that are skewed, I might turn on one of the robust options. We can also change the alpha level for the confidence interval, so instead of a 95% confidence interval, if I'd like to display a 90% confidence interval, I can simply change it here. This adds these options for this analysis only, and again, if you'd like to change the default summary statistics that display every time you do an analysis, you can go to Preferences and change the preferences for summary statistics.

For continuous data we also see quantiles: the median, the first quartile, and the third quartile, as well as the range of values. We also see a histogram (I'll just move my mouse to the corner to make this a little bit bigger) and a box plot. Box plots are particularly useful for seeing the shape, the centering, and the spread of our data. They also give us an indication of whether we've got potential outliers and whether our distribution is roughly symmetric, and with very, very large data sets, where the information in a histogram can be a little bit difficult to interpret, box plots give us another way of looking at the shape of our distribution. Within the box plot we see some additional information: the little diamond in the center of the box plot represents the mean. The center of the diamond is the mean, and the tips of the diamond are the 95% confidence interval for the mean. Now, anywhere in JMP where you need additional information, you may see something
like this box plot and not be quite certain what you're looking at. I'd like to share with you two built-in help features that help anytime you're in a platform, seeing something, and you'd like some additional information. For the first, I'll just come over here to this confidence interval: if you hover your mouse over a confidence interval, or any statistic, for a moment, you'll see the definition of what you're looking at and some help interpreting the statistic. So we can see that this is the Upper 95% Mean, which is the upper end of a 95% confidence interval for the mean. A second form of help that you can get within any platform is this question mark (if you're on a Windows machine and you don't see this toolbar across the top, simply press the Alt key and the menu will appear). I'm going to click on the question mark, and if I want additional information, for example on this box plot, I'll drop the question mark on the box plot, and this will launch the help. Here I can see the box plot and some definitions of its components, and I can scroll down to see additional information. So anywhere within JMP, you can get additional information using hover help, or by simply selecting the question mark and clicking where you'd like help.

Back to our output here; I'll make this a little bit smaller so we can see everything. Again, for categorical data we see a bar chart and a frequency distribution. A little bit about all of our tables: if you have a table of summary statistics, or any statistical output, and you right-click in that table, there are additional options. You can change the look and feel of the table, you can add additional statistics if they're available, or you can make that table of statistics into a separate data table, or a combined data table if you're interested in all of the statistics. And for both of these types of graphs (a histogram, a box plot, and summary statistics for continuous
data, and a bar chart and frequencies for categorical data), you'll see red triangles right next to the variable name. If I click on the red triangle, you'll see that there are a number of additional options available, and we'll talk about these as we get going. For now, let me close this.

So far we've seen the Columns Viewer, for looking at summary statistics for many variables at a time, and Distribution, for looking at histograms, box plots, bar charts, and summary statistics for categorical or continuous data. What if I'd just like numeric summary statistics, and maybe I'd like to add additional pieces of information? I'll use the option Tabulate. Tabulate gives me a palette where I can drag and drop variables, so if I'm just interested in looking at summary statistics, I can use this platform. For example, Diameter is the variable I'm interested in, and I've got two drop zones: I can drop it in the drop zone for rows or in the drop zone for columns. Notice that as I drag the variable on top of these zones, the zones get a blue border, and the blue border indicates that I can let go here and JMP will do something. I typically like to drop the variable I'm analyzing in the drop zone for columns. By default I see the Sum, but if I'd like to replace the sum with other statistics, for example the mean and standard deviation, I'll drag those statistics and drop them right on top of the word Sum, and the sum is replaced with these two statistics. Now I can break this down by, for example, Operator, to see if there are operator-to-operator differences. The drop zone for rows is now a tiny little box, and I'll drop Operator right inside of that box, and now I see a summary for the different operators. I can see that operator RMM is, on average, a little higher than the others and had variability that was a little higher. I can also break this down by other variables, so for
example, if I'd like to break this down by Operator and then Machine, I'll drag Machine right next to Operator, and we see this little blue rectangle indicating that I can let go here. This will summarize first by Operator and then by Machine, and if I click and drag Machine in front of Operator, then it will summarize by Machine and then by Operator. We can close the control panel when we're done, and again there are additional options in the red triangle: if we want to turn that control panel back on, we'd select Show Control Panel, and we can also make this into a data table. To save our work, if we'd like to come back to this again later, under Save Script are a number of options for saving the script, the JSL code, to recreate this output later. If I select Save Script to Data Table and click OK here, it'll save a new option to my data table, and if I click on the green triangle next to this option, it'll regenerate the output exactly as I had it.

Right, so that's Tabulate. How do I summarize the data graphically? If I'd like a really nice, flexible platform for summarizing the data and producing graphics, I'll use the Graph Builder platform. Again, this is a tool that I use a lot when I'm first getting started with an analysis, to get more familiar with my data. There are drop zones here: if I drag and drop Diameter to the Y zone, I see a scatter plot, and I've got different zones where I can release variables. I won't spend too much time on this, since it's covered in depth in some of our other videos, but this is a nice way of looking at potential relationships between Diameter and some of the other variables. If I drag Operator over, I can see a dot plot broken down by Operator, and across the top are a variety of graph elements that I can add to this graph. For example, if I click and drag on Box Plot, I can add box plots to this picture, and if I click and drag on Line, I can add lines that connect the means for the
different box plots. I can also break this down by other variables; for example, I could break it down by Machine, group by Machine, or add Machine as an additional variable. So it's a very nice tool anytime you're going to build a model, or if you've already run an experiment and want to look at the data before you go in and analyze it; it's a nice tool for getting familiar with the data before you take it further.

All right, let's keep going. I'll just briefly mention that if you want to be able to slice and dice your data by values of variables in the data set, there's a nice Data Filter option, and if you're doing an analysis where you've got a lot of data and you want to switch out and look at different X's or different Y's dynamically, there's a tool called the Column Switcher.

So let's move now to statistical intervals and hypothesis testing. In JMP, anytime we want to deal with one variable at a time, where we know we're going to either construct confidence intervals or perform hypothesis tests, we use the Distribution platform. When we're dealing with two variables at a time, we use Fit Y by X, and for more than two variables that we want to fit in a model, we use Fit Model; I'll come back to this in a few moments. I'm going to open up this data set, Cleansing, which is again from our sample data directory. What we're studying is coal particles in a tank: we're trying to clean coal particles out of a tank, and we're exploring pH and polymer, so the lower the value, the better. All right, I want to look at this one variable at a time, and here we're going to focus on hypothesis tests and statistical intervals. We may be interested in a confidence interval to estimate the mean, a prediction interval, or a tolerance interval, so I'll quickly show you how to construct
these. We may be interested in looking at different distributions, and we may also be interested in an equivalence test, so let's take a quick look at these. For dealing with one variable at a time, when I know I'd like to perform statistical inference, I'll use Analyze > Distribution, and here I'll simply focus on Coal Particles. Again, I'll stack the distribution. Now, we already see the confidence interval for the mean in two different ways: we see the Upper 95% Mean and Lower 95% Mean, and we see it represented as a diamond in the box plot. Under the red triangle next to Coal Particles, we can ask for a different confidence interval, for example a 90 percent confidence interval, and this will produce a confidence interval for both the mean and the standard deviation. Also under the red triangle, we can ask for a different type of interval, for example a prediction interval. A prediction interval is an interval that is likely to contain a specified number of future observations with a certain confidence. For example, I'd like to produce a prediction interval that will contain the next ten observations; the default is a two-sided interval. I'll click OK, and we get a prediction interval: this is an interval likely to contain the next ten observations, and again we see results for the mean and the standard deviation.

Another type of interval is a tolerance interval, an interval that's likely to contain a specified proportion of future observations with a given confidence. Here we specify two values: the confidence, and the proportion of future observations that we'd like the interval to cover. This is commonly used as an alternative to performing a capability study. I'll click OK here, and JMP produces a tolerance interval. The way we interpret it is that we're 95% confident that 90% of our future observations will be contained within this interval. So what about hypothesis testing? For hypothesis testing, I might first want to look at the shape of
the distribution, so I might ask for a normal quantile plot. With the normal quantile plot, if our data roughly follow a straight line and fall within the bands, then it's reasonable to treat the data as roughly normal. I'll go ahead and turn that off, and let me minimize some of these, there we go, so they don't take up as much real estate. To perform a hypothesis test, I can test a mean or a standard deviation. I'll go ahead and select Test Mean, which will perform either a t-test, a z-test if I enter a known standard deviation, or a nonparametric test. I want a value that's outside the bounds of our confidence interval, and I think our confidence interval was... let me hit Cancel here and open this back up again. Let's say that we'd like to be at 250 particles, so we want to test the hypothesis that these cleaning strategies are allowing us to hit our target of 250 particles. I'll select Test Mean and plug in the hypothesized value. Again, if I'd like to do a nonparametric test, the nonparametric tests are always grouped wherever we see the corresponding parametric test. I'll click OK. Anytime you perform a hypothesis test in JMP, you're going to see some new output, and p-values are represented as Prob > something. In this case our reference distribution is Student's t distribution, so where you see Prob > |t|, this is the p-value for the two-tailed test. Our actual value for the mean was 301 and our hypothesized value is 250, so we see the test statistic, the p-value for the two-tailed test, and the p-values for the corresponding one-tailed tests. The curve that we see at the bottom is there to help us interpret the test results: it's centered at the hypothesized value and represents the distribution of sample means we would observe in repeated sampling for the given sample size. The red line is drawn at our observed sample mean, so it gives an indication of how
far our observed mean is from the hypothesized value, and the area in the tails beyond our observed mean represents the p-value. If you're new to p-values and hypothesis testing, under the red triangle next to Test Mean is a P-Value Animation. It shows us the same information but allows us to increase or decrease the difference between our hypothesized value and our observed value to see what happens to the t ratio and the p-value. So if you're teaching statistics, this is a really nice teaching tool, and if you're a student just getting your head around p-values, it's a really useful tool for exploring these concepts. One other thing you can do from here is change the sample size to do some ad hoc exploration of power: what if I had 100 observations instead of 18, or what if I only had 10? How does this impact my test? I'll go ahead and close this one.

One of the tests that I'd like to show while I'm in the Distribution platform is a relatively new test in JMP called Test Equivalence. Test Equivalence lets you enter your target, but it also lets you enter a difference: if the difference between what you've observed and your hypothesized value is less than that number of units, then JMP will consider them practically equivalent. If I click OK here, JMP actually performs two one-sided tests and produces a 95% confidence interval. The shaded region is our target plus or minus the number of units we specified. If the confidence interval falls entirely inside the shaded region, we can consider the mean practically equivalent to the target; if it extends outside the region, we can't conclude equivalence. Again, that option was Test Equivalence.

Let's move on now. Actually, I'll do one more thing while I'm here, and that is fitting distributions; I closed that a little too quickly. If I'm interested in exploring the shape of this distribution, under the red triangle, down towards the bottom, is Continuous Fit, so I can
fit a normal distribution or a specified distribution. I can also ask JMP to fit different distributions, compare them, and select the best fit. If I select All, JMP fits all of these distributions, compares them, and tells us which fit is best. It seems to have... ah, there it goes. All right, I was a little too impatient there. JMP is telling us the best distribution is the Weibull distribution, and it's using the corrected AIC statistic (AICc) to decide which is the best distribution. In fact, the Weibull and the normal differ by only about 0.75 in AICc, so JMP would consider these two distributions very similar in terms of fit. In this case I would probably select the normal distribution, since it's not far off from the Weibull in terms of fit, and we'll return to tools for comparing distributions in just a few moments.

Now, what if I'd like to look at one variable against another variable? In this case I'm going to use Fit Y by X, a really nice, versatile platform for looking at variables with different modeling types. Say I'm interested in the relationship between Coal Particles and pH. This little key in the corner tells us what type of analysis we'll get based on the modeling types of the variables we've selected: the icons on the side correspond to the modeling type of the Y variable, and the icons across the bottom correspond to the modeling type of the X variable. Here we've got a continuous Y and a continuous X, so JMP will take us to the Bivariate platform, and notice that this little graph icon looks like a line fit through points; that's exactly what we'll do. If I also select Polymer, JMP is going to do both combinations: Coal Particles and
pH, and Coal Particles and Polymer. In the second case I've got a categorical X, so JMP will take us to the Oneway platform, and notice the little graph icon looks like box plots, because we're fundamentally interested in comparing the different groups. We won't talk about this in this workshop, but with a categorical Y, across the bottom here, we can do logistic regression, ordinal logistic regression, or multinomial logistic regression, and we can also do contingency tables and chi-square tests. I'll go ahead and click OK, and as always, there are additional options under the red triangle. Under the red triangle next to Bivariate Fit, we can fit the mean, fit a line, and fit polynomials or other types of fits. Fit Special is for when I want to transform X or Y, Flexible allows us to explore splines, and as we go down through the list, you see a number of other options.

I'll start by asking for Density Ellipse; this is where you get a correlation in JMP if you're just looking at two variables. If I select Density Ellipse at 0.95, this draws an ellipse that gives us a visual representation of the nature of the association between the two variables, and if I open this little outline below the graph, it gives us the calculated correlation and also a p-value for that correlation. If you have a number of variables and you want to explore correlations among them, the option to use is Analyze > Multivariate Methods > Multivariate. For anything that you fit, below the graph you'll see a red triangle with additional options; for example, I might choose to shade the ellipse or select the points that are inside or outside it. In this case I'm simply going to remove this fit, and I'll fit a line. When you fit a line in JMP, below the graph you'll see new output: the fitted equation for the line; a Summary of Fit with RSquare, RSquare Adj, and Root Mean Square Error; and an ANOVA table giving the overall test
of significance of the model. Then, under the Parameter Estimates table, we see the estimates: these are the coefficients in the linear model. Anytime you see a Parameter Estimates table in JMP (I'll tuck some of this away), if you'd like additional information, for example confidence intervals for the coefficients, you can right-click on that table, select Columns, and ask for Lower 95% and Upper 95%. These give the confidence interval for each coefficient, and if you're in a multiple regression situation, you can ask for VIFs to assess multicollinearity.

Now, I fit a line here, and when you fit a line you again get a red triangle below the graph with several different options. One of the first things I typically look at is Plot Residuals, which produces a number of residual plots to help us assess whether a linear model makes sense. We can draw confidence curves for the mean or for individual points, and we can shade those intervals, so again, additional options are always available under the red triangle for whatever we have fit. If we'd like to save the formula out to the data table, Save Predicteds saves the model out to the data table.

Now, for the Oneway case we see completely different options. Here we're looking at different polymers, and we're interested in exploring whether there are differences between them. Under the red triangle we see things like Quantiles, which adds box plots and produces quantiles, and Means/Anova, for formally comparing the groups. Again, anytime you see diamonds in JMP, these are confidence intervals for the mean, and visually we're asking whether these intervals overlap. Down below we see the Summary of Fit, the ANOVA table, and the means for the individual groups. There are additional options from here, and I'll go through these relatively quickly. There's a newer Analysis of Means option, which is a different approach to comparing
means. If you want to isolate which means are significantly different from other means, we use the options in Compare Means. Several different nonparametric procedures are available, as are normal quantile plots and CDF plots, and additional graphical options are available under Display Options. I'll move on from here.

So we saw Distribution for one variable at a time and Fit Y by X for two variables at a time. Let's talk about building statistical models. There are different types of models we can build, linear models and nonlinear models, and JMP also has nice built-in tools for reliability and survival. Let's start with fitting linear models. I'll open a data set called Car Physical Data; again, this is a sample data set, and here I'm fundamentally interested in horsepower as a function of a lot of different variables. This is information on different cars that were sold in the U.S.
at some point in time. I've got Country, Type, the Weight of the car, Turning Circle, Displacement, and Gas Tank Size. Let's say that we'd like to build a linear model for horsepower as a function of some of these other variables. Anytime we fit linear models in JMP where we have multiple X's, we use Analyze and then Fit Model. In Fit Model, we specify our Y (the response, or dependent variable), so here it's Horsepower, and under Model Effects we specify our X's, or factors. I'll add Country. Country actually has three levels, and in JMP you don't have to dummy code or create indicator variables; JMP will do that behind the scenes. I'll add Type, and in fact I'll go ahead and select all of these other variables, so here I've got my main effects. Now, if I want to add interactions, say between these three variables, I can add a full factorial, which adds all possible interactions; in fact, this is the same platform we'll use to analyze designed experiments. Let's say I'd like to add just the two-way interactions between these three variables. Degree is set to two, which means that if I select Factorial to Degree, it will add all possible two-way interactions between those variables. If you know that there are just two variables you'd like to add the interaction for, for example Turning Circle and Gas Tank Size, you can select those variables and then click Cross, and that adds the specified interaction.

This is probably the most versatile and flexible platform in JMP. Under Personality, the default is Standard Least Squares, or ordinary least squares regression, because I specified a continuous response variable. In that list you can also see Stepwise, which is for model selection and model reduction, and Generalized Regression; if you have JMP Pro, this allows you to fit models to non-normal
responses. It also allows you to use penalized regression tools like the lasso, the elastic net, and ridge regression. If you've got a split-plot design, or a design with some sort of nested structure, you can fit it as a mixed model, and again, in JMP Pro you have some additional options for fitting mixed models with different covariance structures. There are also Proportional Hazards and Parametric Survival, which we'll see in a few moments, if you're dealing with reliability or survival data, so a lot of different options are available. I'll simply click Run. At the top we see an Actual by Predicted plot, and this is kind of like a residual plot. We're looking at these confidence bands, and the tighter the points are to the line, the less unexplained variation we have in our model. The blue line is at the overall average of our data, and you see some summary statistics below this picture, so this gives us an indication of the overall fit of the model and how significant our model is. There's an Effect Summary table that sorts the terms in our model by p-value, a residual plot, and several other plots are available. Now, if I'd like to reduce or simplify this model: we've added some interactions to the model, and these little carets indicate that we don't want to remove a term from the model because it's involved in an interaction that's still in the model. So we can use this panel to slowly reduce our model. I'm going to select this interaction and click Remove, and we'll use the same strategy when we're building models for designed experiments: I'll slowly remove terms from the model. At this point it's indicating that everything else is involved in an interaction, so this is as far as I can go. If I want to visualize this model, I'll tuck some of these things away. At the bottom, by default, we see a Prediction Profiler, and this profiler is also available under Factor Profiling, along with
a lot of other visualization tools. If I open up the Parameter Estimates table, we can see that we've just built a linear model, and what the profiler allows us to do is visualize our coefficients and also explore what happens when we change the values of the different predictors. For example, if I look at weight, the slope for weight is positive. This indicates that my response, on average, increases as I increase weight, holding everything else constant. On the side we see the predicted value for my response and a confidence interval for that response, and if I drag weight (I'm just going to drag this vertical line slowly), we see the changes in the predicted horsepower. Now, because I've got interactions in the model, notice that the slopes of the lines in the other panels also change, so there's an interaction between weight and type of car. I'll do the same thing for displacement and keep an eye on the slope for country: as I change displacement from the low level to the high level, it actually doesn't change that much. So this is a really nice way of exploring interactions. This panel also has a built-in optimizer, so I can find settings for my X's that allow me to hit a certain target, and there's also a built-in Monte Carlo simulator. We did that relatively quickly, but in the interest of time I'm going to keep going. Let's talk a little bit about fitting nonlinear models. JMP has a really nice facility for fitting nonlinear models: under Analyze > Specialized Modeling we see two options, Fit Curve and Nonlinear. Fit Curve allows us to fit several predefined nonlinear models to our data and explore these models. Nonlinear gives us this Fit Curve capability too, but it also allows us to select a model from a rich model library or to use a formula that we've created ourselves. So let's take a quick look at this. I'm going to open up US Population, and this is the data set
where we've got this variable, population, looking at population growth over time, and we know that's not linear. This particular data set has a formula that's been created. You can write your own custom formulas just by adding a new column and accessing the Formula Editor, and in this particular case we've written a formula and defined different parameters. If I use this in the Nonlinear platform, it allows me to find optimal values for the parameters, beta naught and beta 1, using these values as the starting values. Let's just take a look at this using Fit Curve for now. I'll go to Specialized Modeling > Fit Curve, select population as my Y and year as my X, and click OK. From here, under Fit Curve I can ask for a number of different types of curves, and as I select a curve it fits the curve and produces a number of statistics down below. It shows me the prediction model and the parameter estimates, and under the red triangle there are a number of different options. If I want to compare this to other fits, under the red triangle I can select another fit. For example, I'll fit Logistic 4P, a four-parameter logistic model, and it overlays that model on top of the existing model. The Model Comparison report allows us to compare the two models that we fit, and the AIC weight is essentially saying: assuming that one of these two models is the correct model, what's the probability that the correct model is the quadratic or the logistic? In this case it's saying that the quadratic is a much better fit. So that's Fit Curve. Now, if I use the Nonlinear platform, I can set it up the exact same way, with population as a function of year, and this will give me the same options. Or I can use the formula that I previously defined; if I do, it uses those starting values and finds optimal values for the model that I've specified. Or I can use the Nonlinear Model Library. There are a
number of different models built in here to give us starting points. If I select a model, for example, we can see its graph and the parameters in the model, and then we can specify starting values for those different parameters. So that's a very quick tour of Nonlinear. And very quickly, reliability and survival, just so you can see what's available here under Analyze > Reliability and Survival. If I'm fitting a model for data that is fundamentally time-to-failure or time-to-event data, I'll use one of the options under Reliability and Survival. Life Distribution is sort of like the Distribution platform, where I'm dealing with one variable at a time; with Life by X I'm adding a second factor. As we scroll through, we see there are several different tools for looking at reliability and survival, and we can also fit different models. I'll show you very quickly. I'll open up a data set for Life Distribution; this is again from the Sample Data directory, and here I'm looking at the time to execute a job. If I'm just interested in looking at this variable, I'll use Life Distribution, but I can also fit a model. I'll start simple and just use Life Distribution. Life Distribution allows me to specify censoring, so if I've got a censor column, I can specify the censor code. I can add a failure cause, and I can also compare groups. I'll simply select execute time as the response and click OK. By default we get the nonparametric Kaplan-Meier curve, and what this allows us to do is explore different fits. For example, if I want to explore a lognormal fit, clicking on the Lognormal scale option will apply that scale to the data, so it's like looking at a normal quantile plot: if I apply a certain distribution, does this plot fall on a straight line? It looks like the lognormal does a pretty good job, and by selecting the lognormal fit I see this distribution profiler. This allows me to see what happens to the probability
as I change the execute time. Below, we see a number of additional options available for exploring the data. If I'd like to use some built-in tools to help me pick the best distribution, in this case I'll select Fit All Non-negative, and, like we saw before, JMP will fit all of the different distributions and give us a comparison of them. In this case it says the best distribution is the Fréchet, and other similar models are the generalized gamma and the loglogistic. So that's Life Distribution, and again, if we want to fit a model or do more complicated work, we can use some of the other options here. Let's get into design of experiments. I'm going to try to spend the bulk of the rest of the time on designed experiments. The DOE menu in JMP allows us to design and evaluate a variety of different types of experiments. You'll see all of the classical designs under the Classical option: screening designs, so fractional factorials and Plackett-Burman designs; response surface designs; full factorial; mixture designs; and Taguchi arrays. JMP also has a nice Custom Design platform, and what this allows you to do is, instead of selecting a textbook design, design an experiment that meets your needs. So if you have constraints, or you're limited in the number of runs, or if there are certain interactions or quadratic terms that you know you need to be able to estimate, the Custom Design platform gives you a really nice, flexible interface for doing that. Augment Design allows you to add additional runs to a screening design you've already run, or, if you're doing sequential experimentation, to start with a small factorial design and then add axial runs or center points to that design. Design Diagnostics allows you to evaluate a previously generated design or compare competing designs. You may have several different designs you're considering, and you want to look at these designs for different
criteria, for example power or prediction variance. There may be different criteria for selecting a design, and this allows you to easily compare those different designs. This is also where you'll find the sample size and power calculators, which give you options to calculate the sample size needed for different types of hypothesis tests. So let's take a look at an example experiment. This is again from the Sample Data directory, and the data set is Reactor 32 Runs. This is an experiment involving five factors, each at two levels, and our response is percent reacted. By default, when you design an experiment in JMP, you'll see a Pattern column show up, and the pattern just tells you whether each variable is set at the high level or the low level. You'll also see a response column, which will be blank when we first generate the design, and you'll see some additional options on the side for going back and coming up with different designs. So this is a two to the fifth full factorial experiment. How would I design it in JMP? If I know that I want a full factorial, I'll use Classical, and then Full Factorial Design is fine. Each of these factors is continuous, so I'll add five of them here under Add N Factors, continuous with two levels, but note that you can also add categorical variables here. If you click on any one of these labels, you can double-click and change the name; I won't do that here. You can set different response goals, and you can also set upper and lower limits, so we may, for example, want the lower limit for percent reacted to be 95%. You can also specify the values for the different variables. I'll click Continue, and JMP tells us this is going to create a two to the fifth full factorial. By default the run order is going to be completely randomized, and we can also add center points and replicates here; a replicate in this case means it's going to replicate the entire design. So when I select Make Table, this generates the two to the fifth full factorial design, with a Y column that
is blank, so we would use this as our design table and enter values of our response variable as we run each of the individual trials. An alternative to using Full Factorial is to use Custom Design. I'll add five factors here, continuous, and I'm doing basically the same thing at this point, so it looks the same so far. I can specify the response goal, I can specify the limits (95 here), and I'll click Continue. But this is where it's different: from here I can define factor constraints. I may have linear constraints, or there may be combinations that just aren't feasible, so I can specify different types of constraints or disallowed combinations, things that I just don't want to be possible in my design. If I select Interactions > 2nd, this will add all possible two-way interactions. From here I can also specify a response surface design, or add in particular cross terms or particular powers. Since I specified 2nd, JMP adds in all possible two-way interactions, and note that at the bottom JMP is keeping track of the number of runs required. I can add in additional individual runs, so for example maybe I want to add two extra runs here, and JMP keeps track of the number of runs. By default, if I select Make Design, JMP is going to generate a 24-run design, which is substantially smaller than the 32-run full factorial. So you can really create efficient designs that take advantage of your limited capacity or limited resources. What else can I do from here? I can generate split-plot designs and mixture designs, so there's a variety of different design types I can generate from here, and under Custom Design, at the top, there are also different optimality criteria that I can set. I'll simply select Make Design here. Anytime you make a design in JMP, and in this case since I'm using the Custom Designer, there's no one correct design; there are all sorts of possible designs,
and JMP tries to find a design that meets your criteria. You'll see these Design Evaluation options, which give you the ability to look at your design from the perspective of power and of prediction variance across the fraction of your design space. So there are a variety of different criteria that you can use for selecting a design. If you'd like to minimize the variance in the estimates of your coefficients, for certain designs you might want to add additional runs, and you'll see all of this information here under Design Evaluation to guide you in picking the best design. Let me close this, and let's take a look at analysis. I've got a 20-run design that I generated earlier. To analyze this design quickly, I'll go under Analyze > Fit Model. This looks a lot like what we did earlier, and if I click Run, notice that this is the exact same platform, and again I'll use the same features for reducing this model. So it's the same options that we saw earlier. Now let's take a look at design evaluation. A new feature in JMP 13 is the ability to compare competing designs. I've got these two designs, a custom design with 20 runs and the full factorial with 32 runs, and we may want to compare them to see whether we lose a lot of information by having a design that's 12 runs smaller than the full factorial. Here I would use Design Diagnostics > Compare Designs. I'm going to look at this Reactor 20-run design versus the 32-run design. I've got the same five factors in both, so I'm going to match on those columns and click OK. If I scroll down, it's basically allowing me to see how these two designs compare. I've obviously got a little bit more power with the 32-run design. We can look at the prediction variance, and there's a little bit more variance on the boundaries of the design space for the smaller design. So there's a
trade-off, but Compare Designs allows us to look at these designs and balance the trade-offs of having a smaller, much more efficient design. The last couple of things I want to talk about are quality tools. There are a lot of tools that are standard quality improvement or quality control tools, so for statistical process control, all of the standard control charts are built in, and there's also a really nice platform for dynamic control charting. All of our quality tools are grouped under Analyze > Quality and Process. If you're looking for a standard Shewhart control chart, you'll see it listed under Control Chart. I generally use Control Chart Builder, which is like Graph Builder: it allows you to drag and drop, and it automatically produces a control chart. On the side we see the summaries for the Individual and Moving Range chart, and you can right-click and change the chart to a different type. The default, if I hit undo, was an IR chart, but if I want an X-bar and R chart, I simply drag Day into the bottom panel, and now it's automatically subgrouping by day, so each one of these points in the top chart is an average of several observations. Now, if I want to break this down by some other variable, like I did in Graph Builder to start with, I can drag that other variable to the Phase zone, and now it's produced control limits for each of the different phases. So it's a super dynamic and interactive platform for creating the standard control charts. From here you can produce Shewhart charts for both variables and attributes data, and you can also produce charts for rare events. I'm doing this rather quickly because we're almost out of time. If you're doing process capability studies in JMP, you can use the Distribution platform, which allows you to do capability studies for continuous data, including non-normal data. If I click on the red triangle, you see an option, Capability Analysis, and under Normal
you'll see that there are different distributions that we can fit here. I'll go ahead and plug in some values so you can see what this looks like; I'll say my target is 15. If I simply select long-term sigma, this is going to give me Pp and Ppk; if I also select short-term sigma, it will give me both the Cp and Pp measures. I'll click OK here. By default it doesn't give us the Ppk capability labeling; that's something you can set in the preferences. But by default we're going to see a summary of the overall capability along with some other measures. Now, if we're dealing with data that's measured over time, and this sample data actually is measured over time, we can use the control chart platforms. And if we're dealing with a process where we've got several measures, and we'd like to monitor all of those measures or produce capability metrics for all of them, there are two different platforms in JMP, under Screening. If I'm interested in using control charts to monitor several variables at a time, I'll use Process Screening, which produces control charts for several different variables. If I'd like to assess the capability of several different variables, I'll use the Process Capability option. The last topic, which we'll only just have a chance to touch on, is measurement studies, or measurement capability studies, and there are two options under Quality and Process. Measurement Systems Analysis uses Wheeler's EMP, or Evaluating the Measurement Process, approach, and if you select Variability / Attribute Gauge Chart, this will do the standard gauge R&R or attribute studies. With that, I think I'm out of time. So we talked about using different tools for summarizing and graphing data, like the Columns Viewer, Distribution, Tabulate, and Graph Builder, for producing graphical summaries of our
data. We talked about different types of statistical intervals and hypothesis tests, from the Distribution platform or from Fit Y by X, and different ways of producing statistical models: Fit Model for linear models and Nonlinear for nonlinear models. We briefly touched on survival and reliability, we briefly talked about how to design experiments in JMP and how to analyze those experiments, and we looked at some quality tools. And again, in the JMP Help, under Help > Books, you'll find a lot of rich detail that goes much further into each of the methods we've introduced. So with that, I'll stop and turn this over to Ruth.
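In the Fit Model demo, dragging one factor in the profiler changed the slopes in the other panels because the model contains interactions. A minimal sketch of why, using made-up coefficients (hypothetical, not the Car Physical Data estimates): for y = b0 + b1·x1 + b2·x2 + b12·x1·x2, the slope shown in the x1 panel is b1 + b12·x2, so it moves whenever x2 moves.

```python
def predict(x1: float, x2: float, b0: float, b1: float, b2: float, b12: float) -> float:
    """Prediction from a two-factor linear model with an interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def slope_x1(x2: float, b1: float, b12: float) -> float:
    """Partial derivative of the prediction with respect to x1, at a fixed x2."""
    return b1 + b12 * x2


# Hypothetical coefficients, chosen only for illustration.
coefs = dict(b0=100.0, b1=20.0, b2=5.0, b12=-8.0)

for x2 in (-1.0, 0.0, 1.0):  # coded low, middle, and high settings of x2
    s = slope_x1(x2, coefs["b1"], coefs["b12"])
    # Finite-difference check: moving x1 by one unit changes y by exactly the slope,
    # because the model is linear in x1 once x2 is held fixed.
    delta = predict(1.0, x2, **coefs) - predict(0.0, x2, **coefs)
    print(f"x2={x2:+.0f}: slope in x1 panel = {s:.1f}, finite difference = {delta:.1f}")
```

Without the b12 term the slope would be b1 at every setting of x2, which is what parallel profiler traces indicate.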
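The AIC weight in the Fit Curve model comparison can be reproduced by hand with the usual Akaike-weight formula, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is model i's AIC (or AICc) minus the smallest value in the candidate set. A minimal sketch; the AICc values below are made up for illustration and are not the US Population results.

```python
import math


def akaike_weights(aic_values: dict[str, float]) -> dict[str, float]:
    """Convert AIC (or AICc) values into Akaike weights.

    Each weight is the relative likelihood exp(-delta/2) of a model, normalized
    so that the weights over the candidate set sum to 1.
    """
    best = min(aic_values.values())
    rel = {name: math.exp(-(aic - best) / 2.0) for name, aic in aic_values.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical AICc values for the two candidate curves.
weights = akaike_weights({"Quadratic": 250.0, "Logistic 4P": 262.0})
for name, w in weights.items():
    print(f"{name}: {w:.4f}")
```

A gap of 12 AICc units already puts nearly all of the weight on the better model, which is the kind of lopsided comparison described in the demo.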
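Life Distribution starts from the nonparametric Kaplan-Meier curve. A minimal sketch of the product-limit estimator for right-censored data, using a tiny made-up data set rather than the execute-time sample data: at each distinct failure time t, the survival estimate is multiplied by (1 - d/n), where d is the number of failures at t and n is the number still at risk just before t. Units censored at time t are counted as still at risk at t, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times: list[float], failed: list[bool]) -> list[tuple[float, float]]:
    """Product-limit (Kaplan-Meier) survival estimate for right-censored data.

    times  : observed time for each unit (failure time or censoring time)
    failed : True if the unit failed at that time, False if it was censored
    Returns (time, S(t)) pairs at each distinct failure time.
    """
    deaths = Counter(t for t, f in zip(times, failed) if f)
    leaving = Counter(times)  # units leaving the risk set at each time, failed or censored
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]  # censored units at t were still at risk at t
    return curve


# Hypothetical execution times; False marks a censored (unfinished) job.
curve = kaplan_meier([2, 3, 3, 5, 6, 8], [True, True, False, True, False, True])
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```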
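The Full Factorial Design steps from the Reactor example (five continuous two-level factors, a Pattern column, a randomized run order, optional center points, and a blank response column) can be mimicked outside JMP. A minimal sketch; the factor names A through E, the '+'/'-'/'0' pattern characters, and the seed are assumptions for illustration, not JMP's own output.

```python
import itertools
import random
from typing import Optional


def full_factorial_2k(k: int, n_center: int = 0, seed: Optional[int] = None) -> list[dict]:
    """Randomized 2^k full factorial in coded units (-1/+1), plus optional center points."""
    names = [chr(ord("A") + i) for i in range(k)]  # hypothetical factor names A, B, C, ...
    runs = [dict(zip(names, levels)) for levels in itertools.product((-1, 1), repeat=k)]
    runs += [dict.fromkeys(names, 0) for _ in range(n_center)]  # center points at coded 0
    random.Random(seed).shuffle(runs)  # randomize the run order
    for run in runs:
        # Pattern string: '+' for high, '-' for low, '0' for center.
        run["Pattern"] = "".join({-1: "-", 0: "0", 1: "+"}[run[n]] for n in names)
        run["Y"] = None  # response left blank until the trial is run
    return runs


design = full_factorial_2k(5, n_center=2, seed=1)
print(len(design))  # 34: 32 factorial runs + 2 center points
for run in design[:4]:
    print(run["Pattern"])
```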
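The run count that Custom Design tracks starts from the number of model terms. For a model with an intercept, k main effects, and all two-way interactions (what Interactions > 2nd adds), that is 1 + k + k(k-1)/2, so with five factors the model has 16 parameters and any design needs at least 16 runs to estimate it; runs beyond that leave degrees of freedom for estimating error. A minimal sketch of that count with generic factor names X1 to X5 (hypothetical); it does not try to reproduce how JMP chooses its default run size.

```python
from itertools import combinations
from math import comb


def model_terms(factors: list[str], degree: int = 2) -> list[tuple[str, ...]]:
    """Intercept, main effects, and all interactions up to `degree`."""
    terms: list[tuple[str, ...]] = [()]  # the empty tuple stands for the intercept
    for order in range(1, degree + 1):
        terms.extend(combinations(factors, order))
    return terms


factors = ["X1", "X2", "X3", "X4", "X5"]  # hypothetical factor names
terms = model_terms(factors, degree=2)
print(len(terms))  # 16: 1 intercept + 5 main effects + 10 two-way interactions
assert len(terms) == 1 + len(factors) + comb(len(factors), 2)
```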
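The prediction-variance comparison in Compare Designs rests on the relative prediction variance f(x)'(X'X)^{-1}f(x), where f(x) is the model expansion of a point x and X is the design's model matrix. For the 2^5 full factorial with main effects and two-way interactions, the columns of X are orthogonal, so X'X = 32I and the variance reduces to a sum of squared terms divided by 32. A minimal sketch that checks the orthogonality and then evaluates the center and a corner of the design space; a non-orthogonal design, such as a 20-run custom design, would need a general matrix inverse instead.

```python
import itertools


def expand(x: tuple[float, ...]) -> list[float]:
    """Model expansion f(x): intercept, main effects, and all two-way products."""
    f = [1.0] + list(x)
    f += [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return f


k = 5
X = [expand(run) for run in itertools.product((-1.0, 1.0), repeat=k)]  # 32 x 16 model matrix
p = len(X[0])

# X'X computed directly; for this design it should be diagonal.
xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
assert all(xtx[a][b] == 0 for a in range(p) for b in range(p) if a != b), "design not orthogonal"


def relative_prediction_variance(x: tuple[float, ...]) -> float:
    """Var(y_hat(x)) / sigma^2 = f(x)' (X'X)^-1 f(x); valid here because X'X is diagonal."""
    return sum(fj * fj / xtx[j][j] for j, fj in enumerate(expand(x)))


print(relative_prediction_variance((0.0,) * k))  # center: 0.03125
print(relative_prediction_variance((1.0,) * k))  # corner: 0.5
```

The variance at the corners is 16 times the variance at the center, which is why the boundaries of the design space are where a smaller design tends to show its extra prediction variance.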
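Control Chart Builder's default Individual and Moving Range chart uses the standard Shewhart limits. A minimal sketch with the textbook constants for moving ranges of span 2 (d2 = 1.128, D4 = 3.267): the individuals chart is centered at the mean with limits at mean ± 3·MRbar/d2, and the moving range chart has LCL = 0 and UCL = D4·MRbar. The measurements are made up, and JMP's own computations may differ in details such as excluded rows.

```python
from statistics import fmean

D2_N2 = 1.128  # d2 constant for moving ranges of span 2
D4_N2 = 3.267  # D4 constant for moving ranges of span 2


def imr_limits(x: list[float]) -> dict[str, tuple[float, float, float]]:
    """Return (LCL, center line, UCL) for the Individuals and Moving Range charts."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    mr = [abs(b - a) for a, b in zip(x, x[1:])]  # moving ranges of span 2
    xbar, mrbar = fmean(x), fmean(mr)
    sigma_hat = mrbar / D2_N2  # short-term (within) sigma estimate
    return {
        "Individuals": (xbar - 3 * sigma_hat, xbar, xbar + 3 * sigma_hat),
        "Moving Range": (0.0, mrbar, D4_N2 * mrbar),
    }


limits = imr_limits([10.0, 12.0, 11.0, 13.0, 12.0])  # hypothetical measurements
for chart, (lcl, cl, ucl) in limits.items():
    print(f"{chart}: LCL={lcl:.3f}  CL={cl:.3f}  UCL={ucl:.3f}")
```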
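The capability indices from the Distribution platform's Capability Analysis follow the usual normal-theory definitions: Cp = (USL - LSL)/(6σ) and Cpk = min(USL - μ, μ - LSL)/(3σ). When σ is the overall (long-term) standard deviation, these are conventionally reported as Pp and Ppk; with a within (short-term) estimate such as MRbar/d2, they are Cp and Cpk. A minimal sketch; the data and the specification limits are hypothetical, since the demo only states a target of 15.

```python
from statistics import fmean, stdev


def capability(x: list[float], lsl: float, usl: float, sigma: float) -> tuple[float, float]:
    """Return the (Cp-type, Cpk-type) indices for a given sigma estimate."""
    mu = fmean(x)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


def moving_range_sigma(x: list[float]) -> float:
    """Short-term sigma estimate: average moving range divided by d2 = 1.128."""
    mr = [abs(b - a) for a, b in zip(x, x[1:])]
    return fmean(mr) / 1.128


data = [14.2, 15.1, 14.8, 15.6, 15.0, 14.5, 15.3, 14.9]  # hypothetical measurements
LSL, USL = 13.0, 17.0  # hypothetical specification limits around a target of 15

pp, ppk = capability(data, LSL, USL, stdev(data))  # long-term (overall) sigma
cp, cpk = capability(data, LSL, USL, moving_range_sigma(data))  # short-term sigma
print(f"Pp={pp:.2f} Ppk={ppk:.2f}  Cp={cp:.2f} Cpk={cpk:.2f}")
```

Cpk-type indices are never larger than the matching Cp-type index; they are equal only when the process mean sits exactly midway between the specification limits.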
Info
Channel: Mia Stephens
Views: 5,400
Rating: 5 out of 5
Keywords: academic, engineering, jmp 13, mia, DOE
Id: VHSObqUtB64
Length: 58min 32sec (3512 seconds)
Published: Tue Oct 25 2016