Mod-01 Lec-01 Introduction to multivariate statistical modeling

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Good morning, welcome to the first lecture of applied multivariate statistical modeling. Let me tell you the content of this today’s presentation. So, we will start with introduction, then variables, data types, data sources, models and modeling followed by principles of modeling, statistical approaches to model building, multivariate models, some illustrative examples, three cases followed by references. The entire content will be covered in two hours. Today, I will try to finish up to principles of modeling, let us start with defining what is applied multivariate statistical modeling? Let us define whatever you want, first is applied. Now, what do you mean by applied in science, there is pure science and applied science. Pure science we generally understand which is basic science, which it basically talks about laws theories and their development, and their development, definitely it links with the phenomena, which we usually observe in different aspects of our life. Now, applied science which will use the knowledge of the pure science and develops something for the benefit of the mankind, so applied science one of the benefit we can say then when you talk about engineering, it is basically applied. Now, when I talk about applied statistics what do we mean? I am assuming that you have knowledge on preliminary basic statistics for example, normal distribution. If you know normal distribution then also you know the probability density function f x, which is 1 by root over 2 pi sigma square e to the power of minus of x minus mu by sigma square, where x varies from minus infinity to plus infinity. This is the so called this bell shaped curve which is developed by Carl Friedrich Gauss. So, theoretical development so that mean in development of this type of distributions it is coming under basics. Now, suppose if I want to apply this knowledge to a real life situation, I can find out a situation like this for example, let us there are three processes, process A B and C take certain inputs, convert into value added outputs, value added outputs all cases. Let there are basically three identical machines which is producing steel washers, steel washers will be shape like this where there is inner diameter ID. There is outer diameter OD as usual as there will be certain thickness of this washer so I can say T h. Now, if you produce a large amount of steel washers that means the number of items produced is large, n is large then the quality characteristic or the characteristics of the steel washer which is important to the people, the customer ID. If you plot you may get this type of distribution, which is normally distributed and where you will be getting mean here. There will be definitely standard deviation for ID. Similarly, for OD similarly, for thickness now then what you are doing by what is applied here? The production process A for example, in this case which is producing steel washers each is converted into a statistical process. In the sense in terms of a distribution like normal distribution, where we are saying that the production process can be interpreted, the behavior of this process can be interpreted like this now in order to get it further clarified. If we do like this suppose, this one is for a production process A and if I say this is for production process B and third one is this one for production process C, then using these things you will be able to compare. Compare A B and see their performance in terms of mean and standard deviation. There is possibility also to see that whether the mean ID produced by C is equal to that of B or A, this type of comparisons and things possible. So, when we actually when we develop something which will be useful to the society for the mankind, then we say it is applied. Now, come to the second word which is basically multivariate. Now, in order to understand multivariate we have to understand what is variable. I think it is known to you that variable is something which takes different values that since, I can say takes different values for example, if I say I D, x is I D inner diameter. Then if I produce one item, I stands for the item suppose, first item and the I D value it may take value X 1. When we go for second version it may take X 2. So, if I such way if I go for n washers produced, let X n will come into consideration. So, these are the values so I D takes different values as a result I D is a variable here. Now, in statistics we basically talk about two types of variables, one is fixed variable and the other one is deterministic, sorry random variable. So, fixed other way we can say deterministic and random we can say probabilistic for example, if I create another variable which is month it varies probably here but we know all the months. Suppose, what will be the next month is this month is your December next month will be January, it is known with certainty that is a deterministic model, but in this case when you are going to produce a second lot. Suppose, in the second lot even in one lot what is the value of I D for the second item, or second version it is not known with certainty, it is governed through probabilistic distribution. So, that sense that it is random one, we do not know the value exactly and this value is coming based on certain random experiment. In this case the process which is producing this item so if I go on saying like this then other variable here is O D. Similarly, other one is our thickness, now in order to accumulate more than one variable, we will write this X 1 is I D, X 2 is O D and X 3 is X 1, X 2 and X 3 is thickness the for the first of first item was produced. Then this will be x 11, second one x 2 1 and n one. Similarly, for O D x 1 2, x 2 2 like x n 2, and if I go for the X 3 variable that is observed for first observation, it is x 1 3, second one x 2 3. So, like this x n 3. So, what we are trying to say here that we are considering three variables X 1, X 2, X 3 which are nothing but the characteristics of the steel washers in this example which has inner diameter, which has outer diameter, which has thickness. Now, if you produce n number of washers then what will happen? Every washers will be having different values for I D, O D and thickness. So, this is my observation, first one is observation number 1, second one observation number 2, like that there is observation number n and you see in observation number 1 if I consider only I D that value is x 1 1, if I consider all three together, observation 1 takes value x 1 1, x 1 2, x 1 3. So, similarly if you go on increasing the number of variables up to X p then here it will be X 1 p, X 2 p like this X n p. Now, each of these as well as this, these are observations on multiple variables. What do you want to define here? We want to define here multivariate. So, in order to do so we know variable, deterministic variable, probabilistic, that is random variable and this is one example where every observation is measured on several variables. Then when multiple variables come into picture then each observation is a variable vector example, if I take the ith observation here then x i will be x i 1, x i 2 like this x i p. So, it is a variable vector that is ith observation on p variables. So, when we deal with this type of situation where our observations or each of the observations have multiple values in the sense values on multiple number of variables more than 1 then the situation is multivariate situation. Now, we define variable, we define multivariate situation, let us understand what is variate getting me? Instead of saying that x i is like this, if I create something different based on all those observations that there is linear combination of variables. For example, here in this in our example there are three variable X 1, X 2 and X 3. If I create a combination linear combination L C which is beta 1 X 1 plus beta 2 X 2 plus beta 3 X 3. So, this combiningly will give a quantity or a value or other way we can also say variable which is we are saying linear combination of variable which is variate, this is variate and then what is the definition of variate? Linear combination of variables with empirically written mean weights, that means beta 1, beta 2 and beta 3 will be determined based on observations. There are n observations so we will be able to determine all those variables. So, linear combination of or weighted linear combination of the variables where the weights are determined empirically that is variate. Now, in this case you can go for one variables, simple one variable that means if I say there are 3 variables, we are going variable p equal to 1 then that will be uni-variate, when we go for p equal to greater than equal to 2, that is multivariate. That is what multivariate usually in statistics books you will be finding univariate statistics. For example, in terms of normal univariate, normal distribution bivariate, normal distribution multivariate, normal distribution, so all the bivariate is a part of multivariate, we basically talk about when univariate means p equal to 1, bivariate means p equal to 2, multivariate is p greater than equal to 2. So, this is what is multivariate, by word multivariate we definitely talk something about linear combination of variables where more than one variable is there, and there are multiple observations, not a single observation, n number of observations and weights. We determined empirically based on the X observations n observations that will be collected from the population for which we want to infer something. All those inference we will discuss later. So, third one, the third issue is statistical. Now, what is statistical? By statistical we want to say that it is basically using statistics that is what we want to infer. So, whatever you are developing something using the statistical tools and taking it, then this development is statistical development. Now, what is statistics? If I say statistics is nothing but collecting, organizing, analyzing, then representing and interpreting. What I mean to say collecting data, organizing data, analyzing data, representing the results and interpreting the results for the population for which the statistical model, or the statistics is used for some purpose, some purposeful work will be served. So, when we talk about statistical that means we talk about the population, then a sample consist of data from the population and we have some purpose in our mind, objective in our mind. We want to infer something from of the population and we collect data accordingly we organize the data, we analyze the data, then we find the result and the result we summarize, and based on this summarization these findings we infer about the population so that is what is the word statistical is used. Now, last two are but very important one is the modeling, if you want to understand then first you understand this model. A model there are many types of model actually very simple one is in our school days. I can remember we talk about the spring balance like this, so what happened this is a spring, a elastic one, a load is attached with this is P and it behave in some way, that behavior if you increase the load, the elongation will be more. If you reduce it will be less. So, when this is the behavior, this is the spring balance model so to show the behavior of the spring this type so physical model are developed. So, this is one model which is basically a physical model, which is a physical model. Now, same thing when I came to my engineering studies, I found that there is one important concept called or development or theory called Hooke’s law, where that sigma he stress developed on the spring. And the elongation strain developed on it they are modeled in such a manner that there is a relationship like this. This is the range of elasticity, there is another concept called elasticity. So, what I have seen there or we all have seen there that sigma epsilon. So, sigma is E epsilon, where E is young modulus or modulus of elasticity. So, this is what is the theory behind the for elasticity, the area of elastic body when the load is so developed that each will not go to the yield point or beyond yield point, that is elastic zone. So, for so long the body is stressed within the within the elastic limit, what will happen to the property that if you remove the load then it will recover back to the original position. So, this development is possible because the physics of this particular spring was known and I can say if we, if I known the modulus of elasticity, I will be able to tell the relationship between sigma and epsilon. And that time in engineering mechanics and strength of materials subject we learn on these things, basic mathematical model. So, in reality you will get different types of mathematical model so that means, what I mean to say here that a physical model, a mathematical model. Now, what you mean by statistical model in this case for example, you take a case I think the inner beginning of this particular study for example, the how did we develop all these things. So, to experiment I have no idea but suppose you do not know the modulus of elasticity, but you know that say elastic body and you want to find the relationship in that case you can do experiment with P, varying P from P 1 to P n. So, that means you will create n different combinations then you will be getting 0 to n observations and sigma, epsilon values you will be getting sigma 1, sigma 2, sigma n; three epsilon, epsilon 1, epsilon 2 and then epsilon n. Now, if you plot this what will happen you may get a plot like this, here it is sigma essentially what is the difference between this and this here what I am saying, I am straight way without I when you showed you have shown me this spring balance. Then I immediately say that in elastic body this is the diagram, because the this Hooke’s law is known to me. So, mathematics is known to me but in case it is not known I have done several experiments here. And based on this I am trying, I will do plot like this need not the perfect straight line, you will get when you go for the empirical relationship. So, this is what is the empirical 1 model? So, this empirical model when we talk about empirical model like this experiment based or data based models like this, these are basically the statistics, these are all statistical. So, for me this is for all of us, this is our statistical model. Now, what is modeling? Then modeling is basically you want to get this type of results, it is not that a immediately you will get all this there is a process. The steps I have to understand what is my purpose? I have to understand in one or two full the purpose what are the variables that are affecting there. I have to identify all the important variables, then I have to see that how the data on the variable will be collected. For example, here I shown you the experiment but it may so happen that you cannot do the experiment. So, in that case is there any other way of collecting data for example, observation someone is interested to see the behavior of a particular animal. So, he cannot do the experiment may be but there are large number of wild animals of that particular species. So, we can observe that we are just going and observing field based so field based observation our this one is our experiment, sometimes what happened we will go for some naturalistic observations, naturalistic observations which we talk about the wild animal case field based observation. In the production we go suppose, the steel washer case go to the production shop, and see that what is happening there and collect data and accordingly you do some modeling, some naturalistic observations. So, all those type of data collection mechanism comes under empirical modeling and you have to understand all these things. So, this is a process, the process of modeling, the process of model building is called modeling, the process of model building is called of model building is modeling. So, let us see some of the slides now that I told that what is multivariate and then what is discussed, why should I use it and it is a base question and that was should I go for multivariate things? If I can do by some other way, why multivariate? So, they are some key issues which basically will be known to you later on that when we talk about multivariate. We talk about multiple variables that is p cross 1, if p the number of variables then X 1, X 2 like your X p. Now, there is possibility that these variables are interrelated, there is correlation, one of the easiest way is correlation in between the variables. So, that means you may be get a correlation matrix or other way it is basically the covariance between the variables or covariance. By covariance what I mean to say, if one variable varies there is a possibility that in particular way that some other variable also varies, then there will be covariance and standardized covariance is correlation. This is and in the subsequent lectures so covariance that will be p cross p matrix will come. So, all those things so similarly, the mean values for all those variables mu 1, mu 2 like mu p, this things will be there. Now, so my answer to your question is that why should I use it because no physical process or as such any other systems also, which is characterized by multiple variables. They should be analyzed other like their behavior should be analyzed by taking into consideration of all the variables characterizing it. When these variables consider very, very important for the design development or improvement of the system, for which it is developed. And as none of as it is obvious there will be covariance or correlation between the variables. If I go for univariate analysis we will lose substantially the information about the behavior, because of non-inclusion of the covariance structure. So, we require to control this covariance structure and in multivariate statistics covariance is a very big issue and which will be found in multivariate distribution. We will be discussing all this covariance things so it is required. For example, for this case like our this one steel washer, this case the steel washer, three variables are visibly controlling its quality, inner diameter, outer diameter and thickness. There is chance that inner and outer diameter will be related, also the thickness in that case the customer will not be able to apply it or fit it to its own situation if there is huge mismatch. Now, if we control inner diameter or outer diameter or thickness then what will happen? Then correlation structure will not be considered and ultimately we will not be able to satisfy the customer. So, we will be using multivariate statistics or multivariate modeling. When your system is complex in terms of number of variable it may be in conditions like this, the correlation structure is intact in order to extract a those correlation information, you want to extract the pattern from this data that is why you will be using. So, how do I do it? It is through the third models, so these models will be discussed a little later. Now, what is next? Next one example, here we are saying that a particular company operating may be in a city market and we want to see the organizational health of this company, with respect to profit in rupees million with respect to sales volume in rupees hundred, absenteeism, machine breakdown and M ratio. Actually, this is schemated intentionally first one is profit and sales volume, these are the organizational issue that health if you sell more your profit may be more. And if your profit is more you are healthy in financially, and another issue is absenteeism, if you are paying substantially and if you are taking care the well being of the employee’s absenteeism will be less.If you are maintaining the health of the process here we are saying machine, your machine breakdown will be less. And if you are if you are able to coordinate with customer as well as your supplier and your M ratio, that much M ratio particularly I say marketing ratio will relate to the customer and that will be high. So, if this is the case and then we are basically observing from April, May, June, July that 12 months data and in some units we have measured. This is nothing but a case of multivariate situation where each of the row like starting from 1 the first row, these values are talking about multivariate observations for month April. Similarly, for second these are multivariate observations so there are we have multivariate observations. Now, you may be may be interested to know how profit varies over the months, then it will be univariate one if you want to say that how sales volume vary over the month, it will be also univariate. Now, if you want to know absenteeism varies over the year over month that is also univariate like this. But if you are interested to see that how the profit and sales volume covary and they are own variation, then you will have to have to consider two variables. And then should be multivariate situation, sometimes you may be interested to know how the sales volume will be dependent on absenteeism and machine breakdown and marketing ratio. Then there must a dependent model and that is the same multivariate issue. So, this is in that shelf what I am talking about multivariate observations. So, now we have discussed some of the things, some of the variables and we have seen that we have assigned them some values, but how where from those values are coming? For example, if I say steel washer the thickness that mean be the inner or the outer thickness OS, how it is known? So, you have used some measurement scale to measure this, if I want to say that it may be you have used Vernier caliper to measure the outer diameter, may be used Vernier caliper to measure the inner diameter. So, you have used some instrument and as well as you have there is scale of measurement. In this case the scale is basically length which may be in terms of millimeter. So, you have to sue some scale of measurement and based on the scale used whatever data you get those data will be of different types. So, you see this line here, the left side we are talking about random variables and right hand side we are talking about data types. I have explained you this random variable earlier, so I will not spend much time here, but you must please understand one thing that in random variable there will be discrete and continuous random variable. By discrete random variable we mean to say that they will take some counted account values like 0, 1, 2 or something like this or January, February, March something like this, and your continuous case that profit absenteeism breakdown or what is M ratio here? What is that any value is possible? So, please understand one thing here, since volume are coming under your discrete because it is countable one but many countable, such count values can also be considered as continuous in any situations. But any how so there are two types. Now, your data types I told you that what measurement scale you are using. Based on these data types you will be known, means that data will be having certain properties because data is nothing but information. How much information is available with the data getting me, so did it all depends on what scale you have used to measure this data. So, based on that there are four types of data, one is nominal data, ordinal data, interval data and ratio data. Let us discuss something about nominal data. My definition is this provide identity to some items or things is I say the month, the company, small company that is the I should have shown you that they want to do over the different months, what is the status. So, the month is a variable starting from January to December because it changes. So, then it is January and February all those things nothing but they are the identity of the period of time identity of the particular series. Suppose, you just think of you are trying to know that some performance or status of the different department of a for example, IIT so then if I say the department of chemistry, department of physics, department of mathematics, department of computer science, department of industrial engineering and management. So, all those things and they are basically providing identity but we sometimes require this type of data in our to include in our analysis. So, this is nothing but nominal data. Now, what is the problem with nominal data? Problem with nominal data is that there is huge computational limitations, because you cannot do any arithmetic limitations, you cannot add department of chemistry plus department of physics like this. We cannot say department of chemistry is 1 and department of physics is 2 and accordingly we will add, we cannot subtract, we cannot multiply, we cannot make division also this is the problem. Next data type is your ordinal data type. What is ordinal data type? Suppose, you just see that you have you have travelled in flight several times may be, or train or some other places or you have gone to the students, and when you have taken food and you might have seen that you are giving a feedback form. They are seeing that they are pleased, they have read the in case of hotel food quality, service quality, room quality all those things in terms of not satisfied. We are totally unsatisfied to extremely satisfied, this type of scale we have used for example, for the food case it is taste wise this very good, good or something like this. So, this type of ordering when order thing is there this is called ordinal data. So, what it does provide some order or rank to some items or things examples, service quality, it is low medium or good and computational limitations. Here also we cannot do any arithmetic operations like your addition, subtraction, multiplication and division. You cannot do then what way it is better than our nominal data? It is better than nominal data because here you are getting a order, a rank you are getting. If I say the performance that my student performance is low, average and very good excellent like this, the person who is getting excellent is definitely better than the person or the student who got very good. So, I have a ranking skill here ranking ability with this data. So, ordinal data is rich compared to nominal data. Next data type I said that interval data, what is interval data? It is basically well understood if we take this example, here temperature we are measuring using two scales, one is celsius, another only Fahrenheit. In developing these two scales, Fahrenheit scale as well as your Celsius scale, the reference point is taken at two different points, means locations. It is not the same you getting me so and if you see the horizontal lines here you see that 0 degree centigrade, 20 degree centigrade and 100 degree centigrade. Then the corresponding Fahrenheit will be 32, 70 and 212 Fahrenheit, understanding? So, there is a range that if I say the difference from 100 to 0 degree you are getting this range, here are also 212 to 32 the corrseponding range is this. So, whether we measure in using celsius scale or Fahrenheit scales we will be getting the equal range. Now, what will happen suppose, I measured temperature today? Today day temperature is 20 degree centigrade to 30 degree and may be day after tomorrow 21 degree, then if I want to do the averaging I can add them and then divided by 3, that 3 days average I will get if I do the same thing in Fahrenheit. Also it is possible I can do that similar thing, I can do but what will happen? Suppose, I want to say that what is the how many times temperature of today is compared to the tomorrows, yesterday’s temperature. Then if I use Celsius scale and if I divide 22 by 20 and then here it will be it may be 70 and other things, then we will find out they are not matching. So, that means interval scale is some scale where you will get a interval data range data and they are all having al, type of continuous properties except and they can do 3 arithmetic operations very easily, addition, subtraction and multiplication. But when you do division, you will find out that when you change, it changes the scale. Ultimately what will happen? You will find that they, so in interval data you cannot go for division. Interval data division is not possible, all other arithmetic operations are possible. Let us go to the next slide. Our slide that is we are talking about ratio, data ratio. Data is something where there is absolute 0 in the scale of measurement. This is 0, if I move towards right suppose x amount and towards left also x amount then the difference, this difference is same. If I go for y also, this side also y also that is the same. So, that means if you go in to the left it is that is the same. So, that means if you go in the to the left it is negative, this side it is positive, but there is absolute 0 in between. So, this 0 is the reference point not in terms of the Fahrenheit and centigrade scale that where is the two different definition, it contains absolute 0, highest form of data, sorry. So, ratio data is it contains absolute 0 highest form of data example absenteeism breakdown hours as shown earlier and computational, all arithmetic operations are possible here. Now, if I go by the order of information available then definitely your first one is if it is nominal then followed by ordinal, then your interval, then your ratio. Then definitely in order of increasing information this will the, this is the case my best data is this, next best is this, next best is this, next and this is the lowest of information data. So, you know that different data types. Now, you know that as you will be applying multivariate statistical modeling, you must require full-fledged data. So, you need to know the data source, primary data collected from the source where it is generated for example, in the case of a steel washer example, if you collect data from the production shop and just going there collecting data or that is what is known as primary data. Suppose, you want to see the behavior of the animals in the jungle go and observe and then accordingly note down and that will be your primary data. So, for the production that and that example the profit and sales volume case that is also primary data. So, long you are collecting from the source, what is secondary data? Secondary data stored in repository or collected by someone else, you are getting me? So, you are not collecting, it is already there. We have different sources for example, you may get the financial data from some sources. And suppose company is maintaining records of their production and suppose their maintenance or the health of machines and many things. So, you have not collected so company has stored and you have gone there and collected these things, or it is better that in a literature you studying something in your own area. You found that a paper is there where some data is given. So, this type of data is secondary but secondary data must have must be authentic, in the sense that reference of the data is available, author references are there, this is that author literature data but this is definitely as it is done by somebody else. It is not primary, there you have to rely on the authenticity of the data collected by somebody else. The tertiary data which is basically a common knowledge type of things. Suppose, you know you will find many things are there actually when in terms of modeling, modeling when you start with a subject area you start with this that when your knowledge is not very clear, you will start with the tertiary sources. And then slowly you go to the secondary source. Finally, when you do actual work you may go for the primary data sources. I told you earlier for this model, let me repeat this again that model mimics reality that when you develop a model that without considering the reality, the real thing you are not doing the justice. So, the model reality so it should be a, it should have real applications that is what is the meaning. For example, suppose you think of a car which is got with by suppose any they develop, they develop these things. What I mean to say they develop a model simulation model in computer first before going for a developing the car, one after the other manufacturing. The car in the manufacturing shop or there must be some simulation model and means how the car will work. So, that type of things are known as that means it is a in terms of the reality the car is the real thing. So, your modeling can be so that is simplest example and that the mathematics is related to the elastic behavior of it that is the reality. In statistical sense when we talk about the how sales volume is dependent on other things that is your absenteeism, M ratio and all those things that also is going to talk about the ways, which show actually in statistical sense, a model talks about explain the regularity of a phenomena. In Hooke’s law the regularity is so long, it is within the elastic limit. When the load is released, it will come back to the original shape that is the regularity. In case of our statistical model building we talk about data and data is nothing but equal to this is pattern plus error, this pattern is the regularity pattern or systematic component. So, we must know what is our problem? And accordingly all data if you collect it and you want to extract pattern from this data. In case of prediction model suppose you want to predict some y value then with respect to some x values. And then you will find out there is some linear combination variable that is X beta, then plus l will be there. This is my regularity or my data. So, when you repeat similar that similar development under different situations then what will happen? Then if it performs well under the different situation for which it is developed, the one day we may say it is a law or a theory like Hooke’s law or Hooke’s or this Hooke’s law is this, I left there that elasticity thing. So, we all know that Newton’s laws of motion and we all know that Dalton’s atomic theory and many other things that these are not one day everything is developed and people accept it. It basically developed at test stage verified, validated after several years and then other scientist other that is the researcher, they accepted the fact and then it was applied to different situations and found that it is working. I told you modeling, also process of building a process, physical, mathematical and statistical, this is I have already explained to you. I hope that you got the glimpse of actually the purpose of applied multivariate statistical modeling. Actually, we want to develop empirical model, those empirical models is these are all data based, data based in the sense that they contain you have data. And you are going for building models and you are building models, and to find out the regularity of the data, or the pattern of the data. And show that you will be able to describe the relationships of the population or the behavior of the population or system in consideration. You will be able to establish the strength of the relationship, you will be able to predict something, you may be able to prescribe something also, but when you talk about a statistical modeling. Usually this is the description and prediction part is description explanation and prediction this three things come into consideration. So, slowly you will be knowing different types of statistical all together and you will be tempted to develop different models, also based on the data whatever available to you but before model, going for modeling or applying any statistical techniques what is happening? What is we want to say that you have to have some principles in your mind before going for this here. I have just jotted down some of the principles which I have taken from text book by operation research by see what is they said that do not build a complicated model when a simple one will suffice. For example, suppose if I know the mean value of the different lots of steel over mid value of a particular characteristics; for example, the inner diameter of a different your lots produced. And if that suffice my purpose go for mean, or at max you may require the standard deviation of the inner diameter produced by the different processes A B C as I told you. So, there you may you do not go for may be that covariance structure, many other thing. So, you do not go for if it is needed you go for you are modeling of the problem to fit the technique, many a time I have seen it my case that there is one model which we will be discussing later on known as structural equation modeling. The people are using structural model everywhere where a simple regression model can be. But people are interested to fit the structural equation model. So, please be little bit of cautious on those things, that model is for problem solving and model comes from the problem, not to fit a statistical technique. Design phase of modeling must be conducted rigorously and it will discussed later. What do we mean by design phase? Coming under study design model should be verified prior to validation. Verification means suppose you when you collect data you split the data into two halves, one for your training other for test. What other way I can say? One set for model building, other set for verification and validation basically talks about when you take some new data again and you find it is working, that is validation. A model should never take in too literally but many a times what I have found that model there are more many variables, statistics is taken very in very loose end. So, if there are many variables let us find the relationship is there or not this type of or whatever variable is there. Let us find that relationship without considering the purpose. A model should neither be pressed to do nor criticized for failing to do that for which it was never intended for example, you are interested to see the relationship between variable of a particular population. Now, later on you want to see that how I want to predict something, see you developed a model to see the pattern strength of relationship not to predict. So, why how can your model will predict which was not intended for, so that is another issue. So, if it fails to do prediction when it was just to understand the covariance structure, then we should not criticize for this nor we should not press the model to do it, beware of overselling a model many a times. We basically make sure of I can say recommendation based on the model and many of the things basically from common sense, and so that type of selling I prohibited some of primary benefits of modeling are associated with the process of developing the model. So, the see as all we of you are busy in learning multivariate statistics, multivariate modeling. So, do not think that always you will be doing something great with these type of modeling you are learning. So, the learning process when you develop something you know the physics of the problem, may be you know the process through which data is generated, you know how the data to be captured, how the data to be analyzed, what techniques is applicable. So, this is a entire gamut, so this gamut of process is very, very important. So, very many fits very, very fits you acquire out of it a model cannot be any better than information that goes into it. So, you cannot say that you are using nominal data and you will be basically talking about a model of regression where y variable is nominal. So, you have to have go for some other type of model for that may be your regression. So, the information what you are the quality of information what is fed into that model, that is more important because it if input is not good then output also, you should not expect good. So, model cannot replace decision maker, getting me? You cannot think that you are your model is superior than you the decision maker, the analyst who has having the system knowledge they will act smart. So, they are more important people, so whatever you develop, whatever you do for, what purpose you are developing, all these things. So, that is in your brain, in it is the root what work there so better than any model. So, in this case what I want to say that you please take all those issues what I have discussed, the principles particularly in this series and accordingly develop the model and today it is up to these. Next class, we will be studying the statistical approach to problem. Thank you for your patience.
Info
Channel: nptelhrd
Views: 93,019
Rating: undefined out of 5
Keywords: Introduction to multivariate statistical modeling
Id: eNQTnPXNIFA
Channel Id: undefined
Length: 62min 54sec (3774 seconds)
Published: Fri May 09 2014
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.