What The Heck is Stationarity

Captions
Welcome to What the Heck Is Stationarity. This is the first of a set of video tutorials on the subject of computing directional sample variograms. Now, you're probably wondering what the heck stationarity has to do with computing sample variograms. Well, the answer is that many data sets are initially not suitable for variogram calculations, and in that case the calculated sample variograms are unreliable. So if you know a bit about models and their stationarity properties, you'll be in a better position to work with problem data sets and extract meaningful sample variograms.

The problem in general, then, is to infer population statistics from a sample. In variography we are typically given a sample of spatial data, generally from some earthy population, and we wish to infer the patterns of spatial continuity of that population. In order for us to be able to infer the population statistics from our sample, we're going to have to rely on the use of models of one kind or another, and as you will see in the following slides, models are absolutely necessary.

So let me show you a few examples that illustrate the necessity of modeling. Suppose you're given the following sample data from some unknown population and you wish to complete the profile between the green sample data points. Now suppose you're told that these data points were taken from the peak heights of the trajectory of a bouncing soccer ball. Given this model, you could complete the profile of the bouncing soccer ball as follows.

Alternatively, let's say you're told that the green sample data points are distance and velocity measurements of a decelerating, or slowing, automobile, and that these measurements were taken at regular time intervals. Given this model, you could complete the profile this way. Velocity is plotted on the y axis, while distance traveled is plotted on the x axis. Note that the velocity and the distance traveled between data points decrease as the automobile slows down, because the data was taken at regular time
intervals.

Here is another example showing the necessity of models. Suppose we are given the east-west and north-south variogram ranges from some particular data set, and we plot those ranges as shown here. Note that the east-west range, or the east range I should say, is much longer than the north range. This is a common pattern known as anisotropic spatial continuity, and from this phrase we get the term anisotropy, or anisotropic.

Now suppose that, given these two directions, we wish to map the variogram ranges in all possible directions. We can begin to complete the map of ranges in all directions by remembering that the south range is simply the reverse of the north range, and similarly the west range is simply the reverse of the east range. However, in order to map the ranges in other directions, we're going to have to resort to using a model of some kind or another.

For example, this is a rectangular model of the variogram ranges in all directions. It is a simple model and easy to work with. For example, we can map the northeast variogram range, or the south-southwest variogram range, or the west-northwest variogram range with this model. In fact, we can map the variogram range for any direction using this model. However, the model appears to provide some unexpected results. For example, the west-northwest range, that's this one here, appears to be, or actually is, longer than the west range. So that is kind of unexpected.

So let's try another model. Here is the diamond model, and again we see that the diamond model provides us with variogram ranges in any direction. The model is a bit better than the rectangular model in that at least there are no model ranges longer than the east or west data ranges. However, we note that the model ranges in directions just a few degrees off of the west direction here are significantly shorter than the east-west data range here. So that may be something that we can't actually see when we compute real sample variogram ranges in directions that are just off of west, for example.
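The behavior of the rectangular and diamond models described above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names, the east range of 2 and the north range of 1 are hypothetical values chosen for illustration, and directions are measured in degrees counter-clockwise from east (so west is 180 and west-northwest is 157.5), which may differ from the azimuth convention used on the slides.

```python
import math


def rectangle_range(theta_deg, east_range, north_range):
    """Distance from the center to the edge of a rectangle with the given
    half-widths, along direction theta (degrees counter-clockwise from east)."""
    t = math.radians(theta_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    # Distance at which the ray crosses the east/west side and the north/south side.
    to_east_side = east_range / c if c > 0 else math.inf
    to_north_side = north_range / s if s > 0 else math.inf
    return min(to_east_side, to_north_side)


def diamond_range(theta_deg, east_range, north_range):
    """Distance from the center to the edge of a diamond whose corners sit at
    the east/west and north/south data ranges."""
    t = math.radians(theta_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    # The diamond edge satisfies |x| / east_range + |y| / north_range = 1.
    return 1.0 / (c / east_range + s / north_range)


# Hypothetical data ranges: east (and west) = 2, north (and south) = 1.
EAST, NORTH = 2.0, 1.0
for label, angle in [("west", 180.0), ("5 degrees off west", 175.0),
                     ("west-northwest", 157.5), ("north", 90.0)]:
    rect = rectangle_range(angle, EAST, NORTH)
    diam = diamond_range(angle, EAST, NORTH)
    print(f"{label:>20}: rectangle {rect:.3f}, diamond {diam:.3f}")
```

With these assumed ranges, the rectangle gives a west-northwest range longer than the west data range, and the diamond gives a range just off west that is noticeably shorter than the west data range, which are the two oddities pointed out on the slides.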
Our third model is an ellipse in two dimensions and an ellipsoid in three dimensions. You should note that all three models, that is, the earlier rectangular model, the diamond model, and the ellipse model, can easily provide models of variogram ranges in three dimensions. Of course, most of you will recognize the ellipse and ellipsoid as the model that is used in geostatistics. You might ask, why is the ellipse the chosen model? Well, the answer is that the ellipse, or ellipsoid, is known to fit ranges calculated in various directions from many different data sets over time. In other words, the ellipse appears to be a model that satisfactorily fits calculated ranges from real-world data.

Hopefully by now you are convinced that models are necessary if we wish to infer population statistics. Next we will look at two different classes of models: deterministic and probabilistic models.

I'd like to begin by looking at deterministic models. When talking about models, I like to draw a line down the center of the page, and then I label the left side of the page deterministic models and the right side the real world. I find that this helps clarify whether the statistic or property under discussion, or that we're talking about when in a group, for example, pertains to the real world or to a model of the real world. Sometimes in these groups it's just not clear exactly which one folks are referring to.

Well, let's continue by giving ourselves a real-world example, in this case a chicken egg. So this is a chicken egg from the real world. My first deterministic model of the chicken egg is a simple circle, x squared plus y squared equals 1. The fit is not so good, and so you might ask, can we do better? Well, how about an ellipse? The ellipse is a better fit, but we can still do better by modifying the equation of an ellipse to give us something that looks pretty much like an egg. These are all examples of deterministic models. Unfortunately, however,
deterministic models generally don't play much of a role in geostatistics. In fact, it's the probabilistic models, and random variables in particular, that are the cornerstone of geostatistics. So the next few slides are all about random variables, just in case they are something you're not too familiar with.

So you can ask, what is a random variable? Remember your basic algebra, where you had variables such as x and y, and where each represented a single numerical value. Well, random variables are similar, except that each random variable represents a distribution of possible values. It's called random because its specific outcome can be any value from its distribution of possible values, according to some probability law.

Let me illustrate this with this slide here. This slide illustrates random variable models of sample data. For example, suppose we have a blue sample data point at a specific location, and here blue represents the actual sample value at that location. We can model the blue sample value as an outcome of a random variable. Although in general a random variable consists of a number of possible values, at this specific location we say the outcome of the random variable is a single value, which is blue, and that corresponds to the sample value. A second sample, located more or less in the center of the sample domain, yielded a red sample value, and so similarly our model of this sample is a red outcome. We have a third sample here, and it turned out to be green. Again, our model of the green sample is a green outcome of a random variable.

Now, the random variable model can also be applied to any or all unsampled locations. For example, the uncolored circle indicates a location which has not been sampled yet. The random variable model at this location is simply the histogram of possible values, without an outcome. In other words, the actual but unknown sample value could be any one of the random variable values in that particular model. Here's a second example of an unsampled
location and its random variable model.

Let's go back to the model versus real-world format. On the right we have the real world, as represented by a stack of maps showing various colored spatial sample data sets from maps of the USA. In geostatistics we're generally concerned with inferring population averages, variability, and spatial continuity from sample data, and to do this we make use of probabilistic models, namely the random variables.

Random variables are a function of their location, that is, a function of the location x, y, z. Sometimes we get a bit lazy, and rather than writing out the full random variable as a function of x, y, and z, we simply write it as a function of x, understanding that x represents all three coordinates. Then we get even lazier and just write V, understanding that it's a function of x, y, and z. Since the random variable consists of a distribution, or histogram, of all possible values, we can calculate the mean, the variance, and the semivariogram value given by these equations, where lowercase v are outcomes of the random variable.

There are three properties that random variables must have to enable statistical inference of population statistics. One, the mean must be stationary. Two, the variance must be stationary. And three, the variance of the differences between random variable pairs must also be stationary for all h.

What do we mean by the term stationary? Well, here is the official definition of stationarity. One, the local average of the random variable does not depend on location; that is, there is no spatial trend in the random variable outcomes. Two, the local variance of the random variable also does not depend on location; that is, there is no trend in the local variance of random variable outcomes. And three, for all vectors h, the variance of the differences, that is, V(x) minus V(x + h), does not depend on location; that is, there is no trend in the local variance of differences.

I put this slide up here to illustrate the stationarity
concepts. Note that the distributions, or histograms, of all of these random variables have the same shape, and thus they have the same mean and the same variance. Although it is not apparent, the variance of differences between pairs is also constant, and thus the variables shown here are said to be stationary. If some of the variables here had different means or variances, they actually would be useless for inferring any population statistic, and that would include spatial continuity, or the variogram.

Next I'm going to show you an example of a stationary process. And if you ask, what is a stationary process? Well, a stationary process is simply a series of random variable outcomes. For example, each white dot in this zoom window here is the outcome of a random variable, and I have simply joined the outcomes with a green line to sort of complete the graph.

This is the same stationary process shown in the previous slide, with the addition of its statistics. We are interested in the mean, which is 1.67 here, the variance, which is 0.66 here, and the sample variogram here on the right. We note that there are 1,000 outcomes, or sample data points, in this profile. This is a standardized sample variogram, and I will discuss standardized sample variograms in the next video.

Okay, because this is a stationary process, we can look at subsets of the process and compare their statistics to one another and also to the global process. So, for example, here we have divided the process into two subsets, shown by the blue and yellow colors, and then we compute the statistics of each subset. For the blue subset, the mean is 1.63, there are only 500 observations, the variance is 0.62, and the sample variogram is shown here. Next we do the same thing for the yellow subset. We note that its mean is 1.72, there are 500 samples, the variance is 0.70, and this is the sample variogram. Next let's compare the statistics of the subsets to the global statistics all
in one slide. So here we are. The green is the global statistics: note there are 1,000 observations, mean 1.67, variance 0.66. We compare that with 1.63 and 0.61, and with 1.72 and 0.70, and here are all three variograms plotted on the same figure. So you can see from this that the process statistics for the subsets and globally compare quite well. That includes the mean, the variance, and the sample variogram.

Well, what about a non-stationary example? We can interpret this slide as follows. We're looking at a graph of sample values, which can also be seen as a series of outcomes from two different but stationary random variables, as shown by the blue and yellow graphs. For example, the mean of the blue graph is obviously less than the mean of the yellow graph. Similarly, the variance of the blue graph is also less than the variance of the yellow graph. Obviously, the global mean and variance computed from the combined colors would not be representative of the global statistics. The global sample variogram doesn't appear to reach a plateau, and it certainly is not a good model of the spatial continuity of either the blue or the yellow profile, as shown by the blue and yellow sample variograms on the right.

Here is a second example, showing a data profile with a trend in the mean as well as a trend in the variance. Note that as the local mean increases, so does the local variance. This is a very common pattern seen with earth science data, particularly in mining, where base metal assay samples almost always show similar trends. In fact, an increasing variance with increasing mean is known as the proportional effect. Note the parabolic behavior of the sample variogram, which increases at an accelerating rate as the lag distance increases, and note that it never appears to reach a plateau, or sill. This sample variogram pattern is well known amongst geostatisticians and is convincing evidence of a trend in your data. Such a sample variogram is useless for all subsequent geostatistical calculations.

So let's
conclude this video, then, with a couple of points. One, in order to calculate meaningful sample variograms from your sample data, the local means and local variances must not contain a global trend. And two, any mixture of statistical populations with unique population means or variances must be dealt with separately. I will show how to deal with trends in a subsequent video called Working with Difficult Data. That's it for this video. Thank you.
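The subset comparison and the trend example from the video can be reproduced in a rough way with the Python sketch below. Everything here is an assumption made for illustration and is not the data or the software used in the video: the process is a simple moving average of Gaussian noise, the trend is a hypothetical linear drift of 0.01 per sample, the function names are invented, and the sample semivariogram is taken as half the mean squared difference between pairs of values separated by a given lag.

```python
import random
import statistics


def sample_semivariogram(values, max_lag):
    """Half the mean squared difference of pairs separated by lags 1..max_lag."""
    gammas = []
    for lag in range(1, max_lag + 1):
        sq_diffs = [(values[i + lag] - values[i]) ** 2
                    for i in range(len(values) - lag)]
        gammas.append(0.5 * sum(sq_diffs) / len(sq_diffs))
    return gammas


def moving_average_process(n, window, rng):
    """A stationary 1D series: a moving average of independent Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    return [sum(noise[i:i + window]) / window for i in range(n)]


def summarize(values, max_lag=100):
    """Count, mean, variance and sample semivariogram of a series."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "gamma": sample_semivariogram(values, max_lag),
    }


rng = random.Random(0)  # fixed seed so repeated runs give the same series
stationary = moving_average_process(1000, window=5, rng=rng)
# Hypothetical non-stationary series: the same outcomes plus a linear trend.
trended = [v + 0.01 * i for i, v in enumerate(stationary)]

series = {
    "global": stationary,
    "first half": stationary[:500],
    "second half": stationary[500:],
    "with trend": trended,
}
for name, values in series.items():
    s = summarize(values)
    print(f"{name:>12}: n={s['n']}, mean={s['mean']:.2f}, "
          f"variance={s['variance']:.2f}, "
          f"gamma(10)={s['gamma'][9]:.2f}, gamma(100)={s['gamma'][99]:.2f}")
```

For the stationary series, the two halves and the global series should give similar means, variances, and semivariogram values. For the trended series, the semivariogram at long lags should keep growing instead of leveling off, which is the pattern described in the video as evidence of a trend.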
Info
Channel: Edward Isaaks
Views: 17,674
Rating: 5 out of 5
Keywords: Stationary Process, Random variables, Proportional Effect, Geostatistics
Id: K3eUIDu4NWw
Length: 19min 32sec (1172 seconds)
Published: Tue Apr 21 2015