Fitting with MATLAB: Statistics, Optimization, and Curve Fitting

Captions
Hello, and welcome to Fitting with MATLAB: Statistics, Optimization, and Curve Fitting. My name is Richard Willey, and I'm the product manager for the statistics products here at MathWorks. Before I begin, I'd like to take care of a few logistical issues. If you have any technical difficulties during the presentation, please use the chat tab to correspond with the webinar host. If you have questions about the content, please post them to the Q&A tab and we'll do our best to answer them all at the end of the presentation.

I'm going to start the webinar with a brief overview of regression analysis. More specifically, I'm going to discuss a type of regression known as ordinary least squares. Ordinary least squares is the default case for regression analysis: when you learn regression in school, you start by learning ordinary least squares, and when you use virtually any software package to perform a regression analysis, you're using ordinary least squares. The main reason that I want to discuss this technique is to show you that it's based on a series of assumptions. If these assumptions hold true, ordinary least squares will give you a good result. Conversely, if these assumptions are violated, you might want to use an alternative technique to generate your fit.

Next, I want to show you a series of common real-world curve fitting problems. It turns out that each of these types of problems violates one of the fundamental assumptions that underlies ordinary least squares. Accordingly, you can often generate more accurate results if you use a different algorithm to generate your fit. We're going to see how MATLAB can be used to solve curve fitting problems when your data set contains outliers, when both the X and the Y variables contain measurement errors, or when you're measuring counts or estimating odds. We're also going to look at some examples where the assumptions underlying ordinary least squares aren't being violated, but we still need to broaden the set of techniques that we're using. For example, generating an accurate curve fit when you have a large number of predictors creates an interesting set of challenges. Also, in many cases you might need to apply a set of constraints to your curve fitting problem. I'll conclude the presentation by showing you where you can get additional information, and then we'll wrap up with the Q&A session.

OK, let's get started. At the most basic level, you can think of regression as predictive modeling: we're trying to predict some dependent variable Y as a function of a set of independent variables. More formally, we're often trying to derive the best linear unbiased estimator to predict Y as a function of X. As I mentioned earlier, least squares regression makes a number of assumptions about the data that's being modeled. Understanding these assumptions will help you see when it's appropriate to use this technique and, more importantly, when you might want to consider a different algorithm.

The first assumption is so basic that many people don't even think about it. We're developing a model that describes the relationship between predicted and observed, and ordinary least squares assumes that all of the variance in the model is associated with the Y variable: when we measure the distance between predicted and observed, we do so parallel to the Y axis. We're going to look at some alternative techniques, like orthogonal regression, where the distance between predicted and observed is measured perpendicular to the regression line.
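To make the contrast concrete, here is the standard way to write it (my notation, not from the webinar): ordinary least squares chooses the coefficients to minimize the vertical distances,

$$\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - f(x_i; \beta) \right)^2,$$

whereas orthogonal regression minimizes the sum of squared perpendicular distances from each point (x_i, y_i) to the fitted curve.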
Our second assumption has to do with the distance metric we're applying. When we use ordinary least squares, we're minimizing the sum of squared errors. In theory, there's no reason why we couldn't minimize the sum of the absolute value of the errors instead; in fact, there are occasions when this technique, which is formally called L1 regression, will generate better results.

The last assumption has to do with the nature of the error term. Ordinary least squares assumes that the errors have an expected value of zero and that they exhibit constant variance. There are a lot of cases where this doesn't hold true. One classical example is when you're measuring counts or estimating odds; in cases like this, you might prefer to use techniques like Poisson regression or logistic regression.

Let's start looking at some practical examples. As I mentioned a moment ago, ordinary least squares regression creates a model that minimizes the sum of the squared errors between predicted and actual. There are a lot of cases where this technique is inappropriate; the most important is when your data contains outliers. Squaring the error term works very well when your errors are normally distributed. However, if you have observations that are atypical, that are far removed from the rest of your data points, then squaring the error term will significantly skew your model.

This visual example might help illustrate what's going on. This plot shows a set of data points along with the regression model that I derived using ordinary least squares. The data set contains an obvious outlier, which is skewing the regression model. The red curve shows a second regression model that I generated using a robust regression technique called least absolute residuals: I generated a curve fit that minimizes the sum of the absolute value of the residuals rather than the sum of the squared residuals. As you can see, the red regression line does a much worse job fitting the outlier; the distance between predicted and actual is much larger with the red line than the blue. However, the red regression is much more accurate fitting the remaining data points.

Now that we've had a chance to discuss this, let's jump into MATLAB and look at the various options we have available for robust regression. To start with, I want to show you just how easy it is to use robust regression using a sample data set. I'm going to generate some X data and some Y data that's arranged on a regular grid. Next, I'm going to generate a set of Z data such that X, Y, and Z all fall on a plane. Finally, I'm going to add a noise vector to Z and plot the resulting data so you can see what's going on. It's pretty easy to see that X and Y form a regular grid; you can also see the noisy relationship between X, Y, and Z.

There are a number of different options for performing regression in MATLAB. I normally use the fit command from Curve Fitting Toolbox because I think the syntax is easiest and I like working with the fit object and related outputs. I'll start by using the fit command to generate an ordinary least squares regression: I'm going to model noisy Z as a function of clean X and clean Y, and I'm going to pass in the argument 'poly11', which will fit a plane to the data. The fit command outputs the fit object and information about goodness of fit. Next, I'll use the plot method to plot the fit on top of my scatter plot. Generating a fit using robust regression is just as easy: all I need to do is pass an additional argument to the fit command. In this case, I'm using the argument 'Robust', which instructs fit to use robust fitting, and I'm also specifying 'LAR', which tells fit to minimize the sum of the absolute value of the errors rather than the sum of the squared errors.
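Here's a minimal sketch of the workflow just described, using the fit command; the grid spacing, plane coefficients, and noise level are my own illustrative choices rather than the ones used in the webinar.

```matlab
% Build a regular grid, a plane, and a noisy Z (illustrative values)
[gx, gy] = meshgrid(0:0.5:10);
cleanX = gx(:);  cleanY = gy(:);
noisyZ = 3*cleanX - 2*cleanY + 5 + 4*randn(size(cleanX));

% Ordinary least squares fit of a plane
olsFit = fit([cleanX, cleanY], noisyZ, 'poly11')

% Robust fit that minimizes the sum of absolute residuals (LAR)
robustFit = fit([cleanX, cleanY], noisyZ, 'poly11', 'Robust', 'LAR')

% Plot the OLS fit on top of a scatter of the data points
plot(olsFit, [cleanX, cleanY], noisyZ)
```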
All these same options are available inside of Surface Fitting Tool (sftool). We start by loading clean X, clean Y, and noisy Z. I'm going to switch the tool over to do a regression, and by default it will generate a fit using ordinary least squares. I can check the robust fit option and use either least absolute residuals or an alternative technique called bisquare weighting.

Statistics Toolbox contains a cute little demo that you can use to understand just how robust regression works. If you type robustdemo, all one word, at the command line, you'll bring up the following interactive tool. You can use your mouse to drag different data points around the screen and simulate the presence of outliers inside your data set. As you can see, moving a data point has much more impact on the red curve, the ordinary least squares regression, than on the green curve, which is the output from the robust regression.

Least absolute residuals and bisquare weighting are the two most popular techniques for handling outliers. If you need more fine-grained control, your best option is a function called robustfit inside Statistics Toolbox. As you can see, the syntax for robustfit is quite similar to the fit command. robustfit works by applying a weighting vector to the various data points, and by default robustfit uses the same bisquare weighting that we looked at a moment ago inside Surface Fitting Tool. Where robustfit really shines is all the different options that it provides: robustfit allows you to choose between nine different weighting functions, or even write your own. The logic behind this is that you might want to adapt your weighting based on the specific characteristics of the outliers that you believe are going to be present inside your data set.
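A hedged sketch of the robustfit call, reusing data generated the same way as the previous snippet; the choice of 'huber' as the alternative weight function is mine, just to show how a different weighting scheme is selected.

```matlab
% Same illustrative data as the previous sketch
[gx, gy] = meshgrid(0:0.5:10);
cleanX = gx(:);  cleanY = gy(:);
noisyZ = 3*cleanX - 2*cleanY + 5 + 4*randn(size(cleanX));

% Default robust fit: bisquare weighting (robustfit adds the intercept term itself)
bBisquare = robustfit([cleanX, cleanY], noisyZ);

% Same fit with a different built-in weight function
bHuber = robustfit([cleanX, cleanY], noisyZ, 'huber');

% Coefficients come back as [intercept; slopeX; slopeY]
disp([bBisquare, bHuber])
```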
I'm going to jump back into my slide deck to introduce the next topic. Ordinary least squares assumes that all of the variance in your model is associated with the dependent variable. If you believe that there is variance in both the X and the Y directions, it's more appropriate to use a technique like orthogonal regression, also known as total least squares, also known as errors-in-variables regression, also known as Deming regression. This technique measures the distance between predicted and observed perpendicular to the regression line.

Here's a simple example that might help motivate when to apply this type of technique. Let's assume that I'm measuring a production process X as a function of time. My test bench has a very accurate clock, so I'm confident that there's no measurement error in my independent variable; in this case it's perfectly appropriate for me to use ordinary least squares to measure X as a function of time. Now assume that I'm measuring a second production process Y, also as a function of time. Once again, very accurate clock, perfectly appropriate for me to use ordinary least squares. Now let's make things a little bit more complicated: suppose I want to measure Y = f(X). Both the X and the Y observations contain measurement error. In this case it's inappropriate to use ordinary least squares, and we can significantly improve our accuracy if we use total least squares.

I'm going to jump back into MATLAB and show you a neat little demo that shows how to use Statistics Toolbox to solve this problem. I'm going to start by generating a clean data set, then add noise vectors to both the dependent and the independent variables. Afterwards, I'm going to try to predict the original clean data using both ordinary least squares and total least squares, and as we'll see, total least squares is a much more accurate predictor in this case. One last little note: I'll be posting all of the sample code onto MATLAB Central in the next day or so, so you'll have plenty of chances to play around with this yourself.

Just like before, I'm going to start by generating an X, a Y, and a Z variable where X, Y, and Z all fall onto a plane, and once again I'm going to add a noise vector. In our previous example I added a noise vector to the Z variable; this time I'm adding separate noise vectors to X, Y, and Z. Visually, you can see that X, Y, and Z no longer fall on a regular grid, and theoretically we're violating one of the underlying assumptions for ordinary least squares. I'm going to start by generating an ordinary least squares regression that predicts noisy Z as a function of noisy X and noisy Y. I'm also going to generate a metric that I can use to measure the accuracy of my prediction: the sum of the squared errors between my predicted Z values and the clean Z values that I originally specified.

Next, I'm going to repeat this exact same process, only this time I'm going to use orthogonal regression. A bit of background information about orthogonal regression: Statistics Toolbox doesn't offer a total least squares function out of the box; however, you can use a technique called principal component analysis to perform an orthogonal regression by translating the output of the PCA back to the original coordinate system. For those of you who are unfamiliar with PCA, PCA is nothing more than the application of centering and rotation to a data set, and as simple as this might sound, it's one of the most useful data analysis techniques available. To start with, PCA will center the data on the origin. Next, PCA rotates the data such that the dimension with the greatest amount of variance is parallel to the X axis, the dimension with the second greatest amount of variance becomes the Y axis, which is perpendicular to it, and the dimension with the third greatest amount of variance is the Z axis, which is perpendicular to both X and Y. So why do we care? Well, it turns out that the first two dimensions define the plane for a regression model, and the third dimension defines a set of residuals that's perpendicular to this plane; and, voilà, orthogonal least squares.

So this next bit of code uses princomp to perform the principal component analysis, translates the output back to our original coordinate system, and finally plots the plane predicted by the orthogonal regression. For now, let's focus on the fact that we have two different planes: a red plane, which was the output from the ordinary least squares, and a blue plane, which is the output from the orthogonal regression. It might sound basic, but the two planes are not identical to each other, which means that they're generating different fits. Next, I'm going to calculate my error term for the orthogonal regression and compare it to the ordinary least squares regression. As you can see, the sum of the squared errors is much larger for ordinary least squares, which means that orthogonal regression is generating a much more accurate model.
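Here's a rough sketch of the PCA approach to orthogonal regression described above; the data is simulated with my own illustrative coefficients, and princomp is the function the webinar names (newer MATLAB releases provide pca with equivalent outputs).

```matlab
% Simulated plane with noise in all three variables (illustrative values)
n = 400;
X = [5*randn(n,1), 5*randn(n,1)];
Z = 2*X(:,1) - 3*X(:,2) + 1;
noisyData = [X, Z] + randn(n,3);

% Principal component analysis of the noisy data
[coeff, score] = princomp(noisyData);
meanData = mean(noisyData, 1);

% The first two components span the fitted plane; the third component is the
% direction of the orthogonal residuals
normal = coeff(:, 3);              % normal vector of the fitted plane
d = meanData * normal;             % plane satisfies normal' * [x y z]' = d

% Perpendicular residuals are the scores on the third component
orthResiduals = score(:, 3);
sseOrthogonal = sum(orthResiduals.^2)
```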
Let's jump back to PowerPoint and start discussing how to model counts and estimate odds. These types of models present special challenges because the errors don't exhibit constant variance; as a result, you can often get better results using a technique other than ordinary least squares.

Let's start with a simple graphical example of a Poisson regression problem. The plot that you're looking at is a set of data points showing a count as a function of time. Hypothetically, this might describe the radioactive decay of an atom or the number of requests for page views on a website. From our perspective, what's important is that the count is always non-negative and discrete: it always takes an integer value. More formally, we're assuming that the count can be modeled by a Poisson distribution. Just by eyeballing the curve, you can tell that the errors don't exhibit constant variance. In the lower left, the area where the observations are all clumped together, the errors will have a relatively small variance; the observations in the upper right are spread apart, so here the errors will have a relatively large variance. Moreover, recall that the count is always non-negative, which also impacts the distribution of the error term: the distribution of the errors is bounded in one direction, since the observations can't ever go negative. This boundary condition crops up a lot when the average is close to zero. If the average is significantly larger than zero, then the boundary condition isn't as important and won't take effect nearly as often.

So our goal is to generate a curve that describes the count as a function of time; at any point in time, the curve describes the average of a Poisson process. We're going to solve this problem using a technique known as Poisson regression. Poisson regression is a special case of a technique known as generalized linear models. Logistic regression, which is used to model observations that are drawn from a binomial distribution and is typically used to estimate odds ratios, is another special case of generalized linear models.

OK, back to MATLAB. I'm going to start by generating a sample data set: I'm going to create a hundred X values evenly distributed between 0 and 15, specify that X and Y are related to one another by an exponential curve, and plot X versus Y so you can see the original true relationship between the two variables. My observations will be a set of 100 random draws from a Poisson distribution: at each X value, I'm going to make a single draw from a Poisson distribution with lambda equal to Y. I'm also going to superimpose a scatter plot so we can see the relationship between the average curve and our observations.

Next, I'm going to use standard nonlinear regression to model Y = f(X). I'm going to use the fit command to model noisy Y as a function of X, and I'm going to instruct fit to model the relationship between X and Y as an exponential curve. This red curve shows the estimate produced by the nonlinear regression; visually, it looks like the red curve is doing a pretty good job estimating our true value. Next, we'll solve the same problem using a generalized linear model. Here I'm going to use the glmfit command to model noisy Y as a function of X, and I also need to specify that we're using glmfit to model a series of observations that are drawn from a Poisson distribution. Please note that, unlike the nonlinear regression, we didn't need to specify that the relationship between X and Y should be modeled as an exponential; I'll get back to that in a couple of minutes. The black line shows the model estimated by glmfit. This curve looks almost identical to the output from the nonlinear regression, and the sum of the squared errors is almost identical.
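A minimal sketch of this comparison, assuming an illustrative exponential curve of my own choosing for the "true" relationship.

```matlab
% Simulated Poisson counts around an exponential mean curve
x = linspace(0, 15, 100)';
trueY = 2 * exp(0.25 * x);                 % illustrative "true" curve
noisyY = poissrnd(trueY);                  % one Poisson draw per x, lambda = trueY

% Nonlinear regression: we must tell fit that the curve is exponential
nlFit = fit(x, noisyY, 'exp1');            % y = a*exp(b*x)

% Generalized linear model: we only specify the Poisson distribution
b = glmfit(x, noisyY, 'poisson');
glmY = glmval(b, x, 'log');                % evaluate on the default log link

plot(x, trueY, 'g', x, nlFit(x), 'r', x, glmY, 'k')
legend('true curve', 'nonlinear fit', 'glmfit')
```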
So right now you're probably asking what's going on: why should we bother with glmfit if nonlinear regression is going to generate the exact same answer? To address this, I'm going to run a quick Monte Carlo simulation. I'm going to generate a thousand data sets, use these to run a thousand nonlinear regressions and a thousand Poisson regressions, and then compare the output. This chart shows the original true relationship between X and Y, as well as the mean of the nonlinear regressions and the mean of the Poisson regressions, and once again the curves look identical; I'd be really hard-pressed to claim there is a statistically significant difference between either of the predictions. However, rather than looking at the mean of the two curves, let's compare the standard deviation at each of my X values. I'm going to calculate the standard deviation of the thousand nonlinear regressions and the standard deviation of the thousand Poisson regressions, and create a simple plot that shows the ratio between the two values. As you can see, the standard deviation of the nonlinear regressions is about one and a half times as large as that of the Poisson regressions for most of the X values. On average the two techniques are equivalent; however, the larger standard deviation suggests that there is significantly more variance in the estimates generated by the nonlinear regression. There are regions where the two techniques are equivalent, but on average the output from glmfit is a lot more stable. More formally, generalized linear models are preferred because they have a larger scope: a generalized linear model will provide accurate predictions across a broader range of conditions than ordinary least squares.

At this point there's a natural question that people tend to ask: that exponential curve, where did it come from? How did you know to pass an exponential curve into the nonlinear regression? Conversely, how did glmfit know that specifying a Poisson distribution implies an exponential relationship between X and Y? The answer has to do with what's known as a link function. Any time you're working with a generalized linear model, glmfit assumes a relationship between the distribution that models your observations and the curve that you are trying to estimate; the inverse of the curve that you are fitting is called the link function. Statisticians have developed a set of standard link functions that they typically use when working with generalized linear models. For example, when working with a Poisson regression, they assume a log link function, which means that you're fitting an exponential curve; if you're working with a logistic regression problem, you assume a logit link function, which means that you're fitting a sigmoid curve. The documentation for glmfit provides a complete listing of all the different distributions that glmfit supports, along with the default link function for each of those distributions. You also have the option to override the default settings by specifying your own custom link function if you prefer.
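As a small illustration of the link function idea (using the same style of simulated Poisson data as the earlier sketch), the default link for a Poisson model is the log link, so spelling it out explicitly fits the same model:

```matlab
% Continuing the simulated Poisson data from the earlier sketch
x = linspace(0, 15, 100)';
noisyY = poissrnd(2 * exp(0.25 * x));

% The canonical (default) link for 'poisson' is 'log', so these two calls are equivalent
b1 = glmfit(x, noisyY, 'poisson');
b2 = glmfit(x, noisyY, 'poisson', 'link', 'log');

% glmval applies the inverse link, returning the fitted exponential curve
yhat = glmval(b1, x, 'log');
```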
So far, our discussion has focused on curve fitting challenges that crop up because some of the fundamental assumptions that underlie ordinary least squares regression are being violated. There's another class of problems that occur when the OLS assumptions are valid, but you need to extend the technique. One of the most interesting sets of problems arises when you have a large number of independent variables. This embarrassment of riches presents two different problems.

First, if you build too many predictors into your model, you can run into trouble with overfitting. The data sets that we work with include both trend and noise. Ideally, we want to make sure our models are as accurate as possible at describing the trend, without modeling the noise component as well. A model that is overfitting is describing both the trend and the noise; models that overfit will be extremely accurate in describing the specific data set that you used for training, but they often can't be generalized to other data sets.

Second, in many cases external design considerations limit the number of independent variables that you can use. Here's a prototypical example: suppose that we need to build a control system for an embedded design. There is a fixed amount of memory available to implement that design, and as such there's a hard constraint on the number of predictors that you can build into the model; hypothetically, we might be limited to no more than five predictors.

These types of challenges can be addressed using a combination of feature selection algorithms like sequential feature selection and ReliefF, feature transformation techniques such as partial least squares regression and principal component regression, and utility functions like crossval and cvpartition.

Let's start by looking at a simple example using sequential feature selection. Since we just looked at Poisson regression, let's look at an example that applies feature selection to logistic regression, so we can see another type of generalized linear model. Once again, I'm going to start by creating a sample data set. In this case X is a matrix: it contains ten different predictors. I'm also specifying a weight vector B; this vector describes the influence that the different predictors will have on the final model. Some predictors, like numbers 1, 4, and 10, have a very significant impact (they have a high weight), while others, like numbers 2 and 3, have a weight of zero and don't contribute anything to the model at all. I'm going to specify that the average relationship between X and Y is described by a sigmoid curve, and finally I'll generate noisy Y as a function of X.

Sequential feature selection works by testing whether adding or subtracting variables from the model generates a statistically significant improvement in accuracy. In this case, sequential feature selection will start by adding each of our ten predictors one at a time and determining which variable has the most significant impact on the accuracy of the model. Next, sequentialfs adds each of the remaining nine variables to the model, once again determining which of these variables contributes the most to increasing the accuracy. This process stops when none of the remaining variables has a statistically significant impact on the accuracy of the model. sequentialfs outputs a vector named inmodel that identifies which features have a statistically significant impact. As you can see, sequentialfs was able to correctly identify that only variables 1, 4, 5, and 10 have any real impact on the model; variable 8 had a positive weight, but the weight was so small that it didn't have a statistically significant impact. Finally, I'm going to rerun my model using the four features that were recommended by sequential feature selection. The deviance of the resulting model is identical to my original model with ten features, showing that the reduced feature set generates the same amount of accuracy.

One last note in passing: this example ran quickly enough that I didn't bother to use parallel computing. If you have larger problems that involve very large numbers of predictors, or alternatively if you're using modeling techniques that require lots of CPU cycles to execute, you'll get a really nice performance improvement if you turn parallel computing on.
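A hedged sketch of the sequential feature selection step described above; the data is synthetic, and the criterion function (test-fold squared error of a binomial GLM) is one common choice, not necessarily the exact one used in the webinar.

```matlab
rng(0);                                        % reproducibility (my addition)
X = randn(200, 10);                            % ten candidate predictors
b = [2; 0; 0; -3; 1; 0; 0; 0.1; 0; 2];         % true weights; several are zero
p = 1 ./ (1 + exp(-X*b));                      % sigmoid (logistic) mean response
y = binornd(1, p);                             % noisy 0/1 observations

% Criterion: squared prediction error of a binomial GLM on the held-out fold
critfun = @(XT, yT, Xt, yt) ...
    sum((yt - glmval(glmfit(XT, yT, 'binomial'), Xt, 'logit')).^2);

inmodel = sequentialfs(critfun, X, y)          % logical row of selected features
```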
Next, I'd like to show you a second feature selection technique called ReliefF. One of the weaknesses of sequentialfs is that it considers each feature, each predictor, in isolation; ReliefF is able to detect conditional dependencies between features. I'm going to start by loading a sample data set called ionosphere that ships with MATLAB. X is a matrix which contains 351 observations with 34 independent variables, and Y is our response variable. Next, I'm going to use relieff to rank the various features.

A couple of quick points I want to discuss while the algorithm runs. Sequential feature selection is a function that we wrap around another function; in the previous example, we applied sequential feature selection to glmfit. ReliefF is a standalone feature selection algorithm which uses a k-nearest neighbors search under the hood; the number 10 at the end of the relieff command denotes the number of nearest neighbors that relieff is going to use for its comparison. The output from relieff is a column of ranks and a column of weights. Here I can look at the output and see that feature number 24 is the most important feature, followed by feature number 3 and feature number 8. Unlike sequential feature selection, the ReliefF algorithm doesn't explicitly recommend which of the features should be removed from the model.

In addition to functions for feature selection, Statistics Toolbox also offers capabilities for feature transformation; in fact, we already worked with one earlier today when we used principal component analysis. As the name suggests, feature transformation works by transforming the data set over to a new coordinate system. The two most popular feature transformation techniques are partial least squares regression and principal component regression. Unfortunately, a one-hour webinar really doesn't give enough time to do justice to either partial least squares regression or principal component regression. What I can do is point you at a really stellar demo that we have available on the MathWorks website that compares the two techniques. The demo includes a sample data set as well as code that implements each of these two techniques; it provides a first-rate introduction to feature transformation, and it also includes some excellent applied examples using cross-validation.

This leads us to cross-validation. A few minutes back I discussed problems that arise from overfitting: if you build models that are too complex and contain too many predictors, you risk fitting both the trend and the noise in your data set. Cross-validation is a way to avoid overfitting. At the most basic level, cross-validation works by dividing your data set into a test set and a training set: you fit your model using the training set and evaluate the accuracy of the model using the test set. Your trend should hold constant across the two data sets, while the noise component is independent from one data set to another; therefore, this technique will help ensure that your model is accurately describing the trend rather than the noise. Statistics Toolbox provides a pair of functions for cross-validation: cvpartition generates an object that defines the folds from your original data set, while crossval is used to perform the model evaluation.
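A rough sketch that combines the two ideas just discussed: relieff ranking on the shipped ionosphere data, followed by a cross-validated error estimate for a classifier built on the top-ranked features. The use of classify (discriminant analysis) and the choice of five features are my own illustrative choices.

```matlab
load ionosphere                              % X: 351-by-34 predictors, Y: class labels
yIdx = grp2idx(Y);                           % convert labels to numeric group indices

% Rank features with ReliefF using 10 nearest neighbors
[ranked, weights] = relieff(X, yIdx, 10);
topFeatures = ranked(1:5);                   % keep the five highest-ranked features

% 10-fold cross-validated misclassification rate on the reduced feature set
cvp = cvpartition(yIdx, 'kfold', 10);        % cvpartition defines the folds
predfun = @(Xtrain, ytrain, Xtest) classify(Xtest, Xtrain, ytrain);
mcr = crossval('mcr', X(:, topFeatures), yIdx, 'predfun', predfun, 'partition', cvp)
```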
The last set of challenges that I want to talk about today has to do with constraints. This is another example where the assumptions that underlie ordinary least squares regression are valid; however, we're applying a set of additional restrictions onto the process. Some common examples where constraints get applied to regression problems include constraining the regression line to pass through a specific point, constraining the regression coefficients to take a specific value or fall within a specific range, forcing the regression line to have a positive slope (or, for that matter, the third derivative to be greater than zero), or applying budget or weight constraints. Some of the simpler versions of this problem can be handled by Curve Fitting Toolbox: it's easy enough to use fit options to constrain the coefficients for your regression or force a positive slope. More complicated problems are best handled using Optimization Toolbox, which offers fine-grained control over your objective function; you can use this to specify quite elaborate sets of constraints.

For our final demo, I'd like to show you a real-world example illustrating what is known as a common slope problem. The example was taken from a paper titled "Modeling of tumor growth and anticancer effects of combination therapy" by Koch et al. The paper studied the impact of different drugs on the growth rate of tumors. The equation at the top of the slide describes tumor growth as a function of time. Three of the key parameters that the model estimates are independent of the drugs that are being administered: lambda0 governs the tumor growth rate during its initial exponential growth phase, lambda1 governs the tumor growth rate during a linear growth phase, and k1 describes the transition rate between compartments of non-proliferating cells (in plain English, how quickly cells die). The last parameter, k2, is specific to the drug being administered and describes the potency of that drug.

We're going to use nonlinear regression to compare the tumor growth profiles when two different drugs are being administered. Here's where things get interesting: three of the variables that we're working with are independent of the drug that's being administered, so the coefficients for these variables should be identical regardless of whether we're administering drug A or drug B. How can we constrain the regression such that both regression models output the same set of coefficients? The answer is a function called lsqcurvefit that is included with Optimization Toolbox. lsqcurvefit stands for least squares curve fit; it can be used for the same type of regression analysis we've been looking at throughout this webinar. What distinguishes lsqcurvefit is the fine-grained control that you get over the optimization problem: you get to write your own objective function, and you can customize that function to handle complicated problems like this common slope analysis.

So let's jump into MATLAB and see how to solve this problem. A moment ago I talked about an objective function; the objective function is where we're going to build our constraint. I'll start by specifying the three drug-independent variables, followed by the drug-dependent variables for drug A and drug B. The data that we're using for our fitting problem is being generated by a simulation with some biology under the hood. The next two lines of code set up a pair of simulations for our drugs; notice that the variables L0, L1, and K1 appear in both simulations. This is how we impose our common slope constraint. Now let's move over to our analysis function and put this to work. We start by importing our data. Next, we specify some starting conditions for lsqcurvefit. We pass our data, our objective function, and our starting conditions to lsqcurvefit and start the simulation. Finally, we display our output, which consists of a set of three common drug-independent variables along with drug-specific parameters for k2.
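Here's a simplified sketch of the common slope idea using lsqcurvefit; I've swapped the paper's tumor growth model for a plain exponential so the snippet stays self-contained. One parameter vector carries the shared (drug-independent) rate plus a drug-specific amplitude for each drug, so the shared rate is forced to be identical across both fits.

```matlab
% Illustrative data for two "drugs" that share the same growth rate
tA = (0:10)';   yA = 3*exp(0.4*tA) .* (1 + 0.05*randn(size(tA)));
tB = (0:10)';   yB = 5*exp(0.4*tB) .* (1 + 0.05*randn(size(tB)));

% p = [sharedRate, amplitudeA, amplitudeB]; the model stacks both data sets,
% so the shared rate p(1) is estimated once and applies to both drugs
model = @(p, t) [p(2)*exp(p(1)*t(1:numel(tA))); ...
                 p(3)*exp(p(1)*t(numel(tA)+1:end))];

tAll = [tA; tB];
yAll = [yA; yB];
p0   = [0.1, 1, 1];                              % starting conditions
pFit = lsqcurvefit(model, p0, tAll, yAll)        % pFit(1) is the common rate
```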
I really love this example; I think it does a great job illustrating how powerful the optimization functions are. Not only did we impose a common slope constraint, we were also able to link the curve fitting to a pair of dynamic simulations. Now, obviously this example is specific to biopharma; however, the same basic use case crops up frequently across a variety of disciplines.

So it's time to wrap up and then move on to some Q&A. To start with, I'd like to provide a quick summary of the techniques that we looked at today, along with some of the MATLAB functions that can be used to solve these problems, as well as the toolboxes where you can find the functions. As you can see, most of the algorithms that we looked at are located inside Statistics Toolbox; however, Curve Fitting Toolbox and Optimization Toolbox certainly have their place for solving these types of problems.

Next, the MathWorks website has some great demos that provide much more in-depth treatments of some of the topics that we've discussed today. I want to draw your attention to the following demos. "Partial Least Squares Regression and Principal Component Regression": as I already mentioned, this demo provides a great practical example of feature transformation; I can't recommend it highly enough. "Selecting Features for Classifying High-Dimensional Data" is a great applied example for feature selection; it's based on a classification example rather than regression, but the techniques and the algorithms are all the same. "Fitting Data with Generalized Linear Models", as the name suggests, is a nice demo of glmfit which uses a logistic regression example. Finally, "Fitting an Orthogonal Regression Using Principal Components Analysis" goes into a bit more detail regarding orthogonal regression. Equally significant, all of the product documentation for Statistics Toolbox is available on the product page; if you're trying to get a feel for the capabilities of the product, this is another great place to start. Last but not least, we have a great recorded webinar titled "Tips and Tricks: Getting Started with Optimization Using MATLAB"; this webinar is a great guide to our optimization capabilities.

So give me a minute to look over some of the questions that have come in, and then we'll move on to Q&A.
Info
Channel: MATLAB
Views: 33,016
Rating: 4.9118943 out of 5
Keywords: MATLAB, Simulink, MathWorks
Id: QuAiA1jaee0
Length: 38min 37sec (2317 seconds)
Published: Mon May 22 2017