Fit Distributions to Data in MATLAB

Video Statistics and Information

Captions
Hey guys, in this video I'm going to go over how to use the Distribution Fitter app in MATLAB. This is a tool that MATLAB has to fit distribution functions, like PDFs or CDFs or even other options, to data. If you have data that you want to fit a distribution to, you can do that really easily in MATLAB. I'm going to do a little demo, and the demo is really a walkthrough of a tutorial that's on the official MATLAB website. As always, I'll put a link to this tutorial in the description box.

MATLAB has a few sample datasets that are automatically included once you download MATLAB. One of them is called "carsmall" and has some information about 100 different cars, and the variable that we're going to be looking at and fitting a distribution to is the miles per gallon. You want to load in your data first and then you click "Distribution Fitter". Everything in this GUI you can also run from the command window if you want to, but I think using the GUI is really simple.

We're going to first load in our data, and you can do that either by clicking "Data", or if you click File > Import data it will open the same window. If you've already loaded your data into the workspace, then the data will show up when you click the drop-down menu. Like I said before, the variable we're going to be working with is miles per gallon in this case. I'm not going to change any of these options. I'm going to click "Create dataset". We have not fit any function to our data yet, so the first thing that shows up is really a histogram. It calls it a density, but as you can see, it's really more like a PMF than a PDF.

We're going to fit two different functions to show you guys what a good fit to this distribution would look like. We're going to click "New fit" and this will open up this window. You can create a bunch of different fits and give them your own names.
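Editor's note: the same starting point can be reached from the command window. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed (it provides `carsmall` and the `distributionFitter` command). The variable name `mpg` and the NaN-removal step are choices made for this sketch, not something shown in the video.

```matlab
% Load the sample data; this creates MPG and several other variables.
load carsmall

% MPG can contain missing (NaN) entries, so drop them first (sketch assumption).
mpg = MPG(~isnan(MPG));

% Density-scaled histogram: the bar areas sum to 1, so the bars can be
% compared directly with a fitted PDF later on.
figure
h = histogram(mpg, 'Normalization', 'pdf');
xlabel('Miles per gallon')
ylabel('Density')

% To open the app on the same data instead, uncomment the next line:
% distributionFitter(mpg)
```

With `'Normalization','pdf'`, each bar height is the bin count divided by the number of observations times the bin width, which is what the app labels as "Density".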
We've already loaded in the data that we want to use, and this drop-down menu gives you all the options for the different families of probability distributions that you can fit to your data. It has mostly parametric ones, and if you ever want to do something like a kernel, which I'll show at the end of this video, you click "Non-parametric". If you know about nonparametrics, there's a lot that you can customize, but it's also really easy to use with just the auto feature, for example, and then it will fit the data automatically.

Before I do that, as you can see here, the data is more or less bimodal, so it has kind of a peak here and a peak here. A normal distribution actually won't be the best for this data, but let's plot that first anyway. I click "Normal" as the distribution for the fit, then I click "Apply", and it gives you the parameter estimates in this window. You can also save it, but I'm going to close that without saving. And here it now plots the PDF over the data. As you can see, that doesn't really capture all the information from the distribution.

You can also check that really easily, because you can plot probability plots. This is very similar to the idea of a QQ plot. Here you're also telling it what kind of distribution you want to plot the data against. All the little points are our data, our miles per gallon dataset, and we want them to fall on this red straight line. If the points did fall on the red line, that would be a good sign that the data actually is normally distributed. That's the idea with a probability plot or a QQ plot: the data has this distribution if the dots fall on the straight line. But as you can see here, there are some places, especially at the tails, where the data really is not normally distributed.
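Editor's note: the normal fit and the probability plot described above can be sketched from the command window as well. This is an illustrative sketch, assuming the Statistics and Machine Learning Toolbox; the names `mpg`, `pdN`, and `x` are hypothetical, and the plot styling will not match the app exactly.

```matlab
load carsmall
mpg = MPG(~isnan(MPG));          % drop missing values (sketch assumption)

% Fit a normal distribution; pdN holds the estimated mu and sigma.
pdN = fitdist(mpg, 'Normal');
disp(pdN)

% Overlay the fitted PDF on the density-scaled histogram.
x = linspace(min(mpg), max(mpg), 200);
figure
histogram(mpg, 'Normalization', 'pdf')
hold on
plot(x, pdf(pdN, x), 'LineWidth', 1.5)
hold off
xlabel('Miles per gallon')
ylabel('Density')

% Normal probability plot: points that stay close to the reference line
% are consistent with a normal distribution.
figure
probplot('normal', mpg)
```

Points that bend away from the reference line at either end correspond to the tail behavior pointed out in the video.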
Let's go back to the PDF. I'm going to create another fit, and you can manage which fits you want to show. You can overlay a bunch of different fits, which is nice. Or, if you click "Manage fit" (I'll do it at the end), it will show you the option to view just one at a time.

Now I'm going to show you what it looks like with a kernel. Again, I'm not even going to specify the bandwidth. I'm just going to let it decide the bandwidth automatically, and this will put a little dent in the distribution because it will look at the data. I'm going to click "Apply". Again, that gives me the fit statistics, and now the fit, as you can see, better captures the distribution of the sample data. And . . . let me remove the first one . . . so this is the fit to our data. Just looking at this new distribution, we see that the probability function we fit to the data better captures the underlying sample distribution.

This has a lot of functionality. I think you can play around with it a lot, and it's also very easy to use. So yeah, that's it for this video, and thank you guys for watching! *chiptune music plays*
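Editor's note: a command-window counterpart to the kernel fit might look like the sketch below. It assumes the Statistics and Machine Learning Toolbox and relies on the `fitdist` defaults for the kernel type and bandwidth, mirroring the "auto" choice in the video. The names `mpg`, `pdN`, `pdK`, and `x` are hypothetical.

```matlab
load carsmall
mpg = MPG(~isnan(MPG));          % drop missing values (sketch assumption)

pdN = fitdist(mpg, 'Normal');    % parametric fit, for comparison
pdK = fitdist(mpg, 'Kernel');    % nonparametric fit, bandwidth chosen automatically

% Plot both fitted densities over the density-scaled histogram.
x = linspace(min(mpg), max(mpg), 200);
figure
histogram(mpg, 'Normalization', 'pdf')
hold on
plot(x, pdf(pdN, x), '--', 'LineWidth', 1.5)
plot(x, pdf(pdK, x), '-', 'LineWidth', 1.5)
hold off
legend('Data', 'Normal fit', 'Kernel fit')
xlabel('Miles per gallon')
ylabel('Density')

% The automatically selected bandwidth is stored on the fitted object.
fprintf('Kernel bandwidth: %.3f\n', pdK.BandWidth);
```

Because the kernel estimate follows the data locally, it can show the dip between the two peaks that a single normal curve smooths over.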
Info
Channel: math et al
Views: 55,958
Rating: 4.8454938 out of 5
Keywords: fit distribution to data, fit distribution to data in matlab, distribution fitter, distributtion fitter matlab, probability distribution to data, fit probability distributions, matlab, matlab data fitter, mathetal, math et al
Id: aXy9mqQUKzQ
Length: 5min 36sec (336 seconds)
Published: Thu Feb 01 2018