Mod-01 Lec-01 Introduction and Motivation

Friends, in this course we will cover important topics of statistical inference. This is a second major course in the subject of probability and statistics: we have one basic course on probability theory, where we talk about the concepts of probability and distributions, and in this course we broadly discuss the methods of statistics as applied to day-to-day problems.

We will mainly be covering point estimation. Under point estimation we will cover fundamental concepts such as unbiasedness, consistency, and efficiency, and then we will discuss methods of finding estimators, such as the method of moments and maximum likelihood estimators. Then we will discuss uniformly minimum variance unbiased estimation, the method of lower bounds for determining such estimators, and the related concepts of sufficiency and completeness. Another major area of inference we will cover is testing of hypotheses: we will discuss how to find various kinds of tests, the types of errors, the fundamental concepts of hypothesis testing, and the determination of tests. In the determination of tests we will discuss most powerful and uniformly most powerful tests, and then the related concept of likelihood ratio tests. We will also discuss the problem of interval estimation, where we will discuss methods of finding confidence intervals for useful one-sample and two-sample normal distribution problems.

The important textbooks that can be used for this course are An Introduction to Probability and Statistics by V. K. Rohatgi and A. K. Md. E. Saleh; Statistical Inference by G. Casella and R. L. Berger; A First Course on Parametric Inference by B. K. Kale; Modern Mathematical Statistics by E. J. Dudewicz and S. N. Mishra; and Introduction to the Theory of Statistics by A. M. Mood, F. A. Graybill, and D. C. Boes. Those who are interested in advanced knowledge of statistical inference may further look at Theory of Point Estimation by E. L. Lehmann and G. Casella and Testing Statistical Hypotheses by E. L. Lehmann and J. P. Romano. These books cover almost all the topics that will be taught in this particular course, which I am going to start today.

So let me first introduce what the problem of statistical inference is and why we should study it; let me talk about the introduction and motivation of statistical inference. We notice that many phenomena in the various sciences are governed by laws which are not deterministic in nature, so we can call them stochastic, or probabilistic. For example, look at the amount of rainfall in various parts of the country during a monsoon season. It is not certain how much rainfall there is going to be next year, how much it will differ from the previous year, or whether, over the entire geographical region in which we are interested, the rainfall will be uniformly distributed, or whether some portions will receive tremendous rainfall while other places face a condition of drought. No matter what theory atmospheric scientists are able to develop, they can never be sure of the exact amount, the timing of the rainfall, and so on in different geographical regions; so we can say that these are governed by probabilistic laws.

Similarly, consider the time taken by patients to get cured of a disease while undergoing a particular treatment. Quite often, among patients who are given a certain treatment for a certain disease, we observe that patient A gets cured within two days, whereas patient B takes ten days, and there may be a patient C who does not get cured by that particular medicine at all and has to be given another type of medicine. The effect of a medicine on different patients is quite subjective in nature; it depends upon various conditions, and therefore these times are stochastic in nature. Similar examples are the number of persons in a service queue at a
ticket counter or at a petrol pump. For example, if we go to a railway counter at a given time of day, we may find a large number of people standing in the queue; the next time we need a booking we go at another time, thinking there will be fewer people, and we find that this is not true, while at yet another time the number of persons standing in the queue is much smaller. So the need of persons to book tickets on different trains is also non-deterministic in nature. The number of children in each family is another example. When we look at these kinds of phenomena, they are all stochastic in nature, and they are nicely modeled using probability distributions. The application of probability models to these kinds of experiments has led to the development of methods which are commonly called the methods of statistical inference.

Let us look at some typical problems where one uses statistical methods to arrive at a solution. One is deciding, on the basis of sample inspection of lots in a manufacturing process, whether the quality of a product is satisfactory. Consider a factory where a certain kind of nuts and bolts is being produced. The factory owner, or the production manager, is interested in knowing the quality of the production: if the quality is all right, the product will be sent to the market at an appropriate price; on the other hand, if the quality is not good, it will not be sold at a high price, and it may even be returned. So the quality of the production is very important. Now how does he go about it? He looks at batches of the product, say of a hundred each, and he inspects randomly, say, ten out of each lot of 100, and on that basis he makes a decision about whether there are more defectives or fewer. For example, if he observes that every inspected lot of ten does not produce more than one defective product, then he may conclude that the production of the bolts is satisfactory, or up to the given mark.

Another problem of inference could be that, on the basis of a random sample of persons in employment, we want to estimate the average salary or wages in a country. This kind of inference is important for determining the economic policies of the government. It is also important for, say, consumer-goods companies: if they find that average salaries are on the higher side in a particular place, they may like to sell the appropriate items in that zone, because they may get more customers; on the other hand, if average salaries are much lower, high-cost goods may not sell in the market at those places.

Another problem: nowadays there is a lot of talk about climate change, global warming, and so on. Scientists are interested in knowing how much increase in temperature there will be in the global climate during, say, the next 5 or 10 years, on the basis of data available from the past 50 or 100 years. On this basis we will be able to estimate the average temperatures in different places, or the overall global temperature, and this is going to be useful for determining the policies of various organizations and governments regarding what should be done to reduce global warming or its effects.

Another typical example of statistical inference is deciding, on the basis of clinical trials, whether or not a new drug is to be approved for human use for a particular disease. There is a disease for which a certain drug may or may not be available. Now the doctors, or the persons involved in the development of the medicine, come up with a certain biochemical substance
which they find to be effective against the disease-causing bacteria or virus. How do they go about it? They design an experiment where a medicine is produced using certain amounts of that biochemical substance, and once the amount is determined and the substance is found to be actually useful in curing the disease, it is a question of introducing it in the market. Now the medicine can be introduced in the market only if it is found that, after taking this medicine, there are no side effects, and that it has a high efficacy compared to the previously used medicine; or, if there was no medicine, it should be better than the control: since some people get cured even without taking any medicine, this new medicine should do better than that. It is the job of the statistician, on the basis of a random sample, to decide how to arrive at a conclusion about whether this new medicine is going to be effective or not.

Another problem is estimating the infant mortality rate in a state or a country based on a random sample from that region. People talk about development: we say that a country has a high GDP, or gross domestic product, and is therefore on the path of growth; but from the Human Development Index point of view, to judge whether there is overall development we look at other parameters, such as the infant mortality rate, literacy levels, and other factors. So we want to find out the infant mortality rate in that country. If the infant mortality rate is high despite high GDP and high average salaries, that means there are certain other kinds of conditions or traditions existing there which, despite high GDP growth, are not going to improve the infant mortality rate. So the social scientists and the planners of the country are interested in knowing the estimate of the infant mortality rate.

A frequently encountered problem is that there are various kinds of guns which can be used by the army. The army wants to buy new guns to replenish its stock of arms, so various arms manufacturers give samples from their factories, and field trials of those guns are conducted. The job of the statistician is to determine which among these is the best gun, which is the second best, and so on, in terms of performance. Performance could be in terms of accuracy, the range the gun can cover in hitting its targets, its robustness, in the sense of whether the gun can be equally effective in different terrains (hilly, plain, or desert regions), at different temperatures, and at different times of day, and also the longevity of the gun. One needs statistical methods to determine which is the best, which is the second best, and so on; that means we have to order them in some preference.

These kinds of problems are formulated using a mathematical model, or you can say a statistical model. On the one hand, the field person gives some data to the statistician; the data is in the form of certain numerical or observed values. The statistician treats these values as values of a random variable X: we use the notation capital X to denote the random variable whose observed values are given to the statistician by the end user, and we call the values x1, x2, ..., xn.

As an example, consider the problem of sample inspection of lots. What will x1, x2, ..., xn be here? Suppose 10 lots are inspected, of 10 products each; that is, each lot contains 10 bolts, and we judge each bolt in terms of, say, the diameter of its head, the length of the bolt, or whatever quality-control criterion is used for describing the goodness of that product. Now the data could be in the form 0 1 2 1 1 0 0 1 0 0. These ten values mean that lot number one had no defectives, lot number two had one defective, lot number three had two defectives, lots four and five had one defective each, lots six and seven had none, lot number eight had one, and lots nine and ten again had no defective bolts, according to the criterion fixed by the quality control manager for that particular product. For the statistician these values represent the values of X1, X2, ..., X10, and the random variable X here denotes the number of defectives in a lot of 10. In fact, in this particular case one can find the corresponding probability distribution: if there are X defectives in a lot of 10, one may use a binomial model or a hypergeometric model, depending upon the conditions imposed on the situation. Suppose there is a constant probability p of a bolt being defective; then the probability of x defectives is 10Cx p^x (1 - p)^(10 - x), for x = 0, 1, ..., 10. That means a binomial model may describe the distribution of X.
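As a quick numerical check of the binomial model just described, the lot probabilities can be computed directly. The defect probability p = 0.1 below is purely illustrative, not a value from the lecture:

```python
from math import comb

def binom_pmf(x, n=10, p=0.1):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of 0, 1, 2 defectives in a lot of 10 (illustrative p = 0.1)
for x in range(3):
    print(f"P(X = {x}) = {binom_pmf(x):.4f}")

# Sanity check: the pmf sums to 1 over x = 0, 1, ..., 10
print(sum(binom_pmf(x) for x in range(11)))
```

With such probabilities in hand, one can judge whether the observed lot counts are consistent with a given defect rate p.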
Let us consider another problem: we take a random sample of working persons and we want to estimate the average monthly salary in the country. If, say, 20 employed persons have been considered, the values here would be x1, x2, ..., x20, in rupees: for one person it could be 1200 rupees, for another maybe 20000 rupees, for a third 1700 rupees, for another 800 rupees, and so on. Here x1, x2, ..., x20 denote these values, and we may use this data to fit a model. The model may be given by a certain distribution, say F; it could be a normal distribution, a gamma distribution, and so on, depending upon the actual values obtained. So the probability distribution of X is modeled using standard techniques of fitting a distribution, and we can say that X has a distribution, say F(x, theta), which may of course be discrete, continuous, or mixed; theta denotes some characteristic of the population, which could be a scalar or a vector.

Here X itself can be a scalar or a vector, depending upon the type of problem. In the situations described so far X is a discrete random variable, but there may be other situations. For example, suppose observations are taken on a person regarding his health: he goes to a medical practitioner and wants an estimate of his average health. The value of X here may consist of several components, say X1 relating to his age, X2 to his body weight, and X3 to his blood pressure; since blood pressure consists of two values, we may consider them as X3 and X4; then we may have his sugar level, his pulse rate, and so on. In this case X is a random vector. Similarly, the parameter theta of the distribution F of X may itself be a scalar or a vector: in the case of monthly salary, if we have a distribution such as the normal, the normal distribution is characterized by two parameters, say mu and sigma squared, and in this case my theta consists of two components. If we are considering, say, the number of persons arriving in a queue in a given time period, then we may model it using a Poisson distribution, which has a parameter lambda, the rate of arrival. So for different problems we will use different probability models to describe the setup.

We thus have x1, x2, ..., xn as the random sample, which we treat as the observed values of a random variable X, and X is assumed to have a known distributional form F(x, theta); the quantity theta appearing in the distribution is called the parameter of the distribution. The distribution of X is completely specified when the value of theta is known. Now, in standard situations we may arrive at the form of F, but in most practical situations it is not possible to fix the value of theta in advance. We may find that the distribution is normal-like, or that a Poisson model is more appropriate to describe the number of arrivals during a time period or the number of failures; similarly, in some situations we may conclude that a binomial model or a gamma distribution is more useful; but the appropriate parameters of that distribution one may not know in advance. So the distribution is completely specified if the parameter is known, but in most practical situations it will not be known.

The general problem of statistical inference is to determine, on the basis of the given data, the value of this unknown parameter; more generally, to make suitable statements or assertions about the unknown parameter of the population. We can break this up into two broad areas. The first: some feature of the population, say G(theta), in which an experimenter or inquirer is interested may be completely unknown, and the experimenter would like to make a guess, or estimate, about this feature.
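To make this concrete, here is a minimal sketch under the Poisson queue model mentioned above; the arrival counts are made-up illustrative data, not from the lecture:

```python
# Made-up hourly arrival counts at a ticket counter (illustrative data)
counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]

# Under a Poisson(lambda) model, a natural guess for the unknown rate
# lambda is the sample mean (it is also the maximum likelihood estimate).
lam_hat = sum(counts) / len(counts)
print(lam_hat)  # 4.0 for this sample
```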
This guess is made on the basis of a random sample from the population; this is called the problem of estimation. Here the experimenter may come up with a single value as an estimate: for example, for the average salary he may come up with a figure of, say, 2200 rupees per month. If we give a single, unique value for the unknown parameter of the population, this is called point estimation, because we are giving a single value, that is, a point. On the other hand, we may specify a range: for example, when we talk about the expected temperature in the coming year, we may say that the expected average temperature during the month of June is likely to be between 42 and 44 degrees Celsius in a particular region of the country. Here we are not giving a single value, such as saying the average is 43 degrees Celsius; rather, we are giving a range, and this range has to be qualified using a certain probability statement. This is called the problem of interval estimation, and it is another part of statistical inference. So in estimation we may specify a single value or a range of values; this is one major area of statistical inference.

The second broad categorization of statistical inference problems is this: we may have some information regarding the unknown feature of the population available to the experimenter, and the experimenter would like to check whether this information is appropriate, that is, whether it can be sustained in the light of the random sample drawn from the population. This is called the problem of testing of hypotheses. Let us go back to the example of a new medicine being developed. A biochemist has derived a new substance and tested in a laboratory that it is quite effective against a disease-causing bacterium. The assumption is that the average effectiveness of the medicine prepared using this new substance will be more than the corresponding value for the existing treatment. Now, when testing this effectiveness, you have to identify in what terms you are measuring it: is it the proportion of patients treated successfully, the length of the treatment, the survival rate, and so on? Suppose we fix our measure of the effectiveness of a medicine as the proportion of patients who are successfully treated, and let us call it p. That means: suppose the medicine, or the treatment, is given to, say, 100 patients; out of these, how many get cured? We look at that proportion, and let this proportion be p for the new drug. Now there is an existing drug which had a 60 percent cure rate; that is, 0.6 is the proportion curable using the previous drug. In order to introduce this new medicine in the market, we would like to check whether p, the proportion of patients cured using the new medicine, is greater than 0.6 or not. This is called the problem of testing of hypothesis, and the outcome of this test will be determined by the statistician using an appropriate statistical method, in this particular case an appropriate test based on a random sample of patients who are given the medicine.

There is another distinction I would like to make at this point. There are situations where, due to not having enough data, or due to the volatile nature of the data, it is not possible to model the data according to a known distributional form such as a normal, gamma, or exponential distribution; many times the data is huge and may have a lot of variation, and therefore known probability models are not suitable to fit the distribution. Such situations have been considered by statisticians over the years, and they have developed methods for estimation, testing, and so on for this setting.
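Returning for a moment to the drug example, here is a minimal sketch of such a one-sided test. The counts (70 cures out of 100) are hypothetical, and an exact binomial tail probability is used:

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Hypothetical trial: 70 of 100 patients cured; existing drug cures 60%.
# Under H0: p = 0.6, how unlikely is a count of 70 or more?
n, cured, p0 = 100, 70, 0.6
p_value = binom_tail(cured, n, p0)
print(round(p_value, 4))  # a small value is evidence that p > 0.6
```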
These are popularly called nonparametric methods, parameter-free methods, or distribution-free methods, and this comes under the topic of nonparametric inference. In this particular course we will be spending almost all our time discussing parametric inference. By parametric inference we refer to the problem where the appropriate probability distribution has been specified, and the problem reduces to making inferences about the parameter, or a function of the parameter, in the form of estimation, which could be point estimation or interval estimation, or testing of hypotheses.

These two fundamental aspects, estimation and testing of hypotheses, are used in almost every area of statistical methodology. For example, consider predicting a future response: I mentioned the problem of predicting the temperature for the forthcoming year; we may likewise want to predict the average food production or the average industrial growth in the next year. These are problems where past data and certain other variables are used to predict a future quantity, and this type of inferential problem is treated under the topic of time series analysis. Similarly, there are areas where we determine the relationship among variables: for example, the effect of providing irrigation, modern equipment, good quality seed, and good quality insecticides or pesticides to the farmers, where we look at the response in terms of the increased food production, or increased yield of a particular crop. Here the response variable y is the yield, and the variables determining it, called regressor variables x1, x2, and so on, could be the amount of irrigation, modern equipment, fertilizers, and other inputs; this topic is generally covered under the subject of regression analysis. Design of experiments, which is again used in various industrial, agricultural, and medical experiments, ranking and ordering of populations, and so on: all of these advanced areas of statistical inference use the two fundamental aspects, estimation and testing.

At this stage I will introduce certain terminology and its exact meaning in the context of statistical inference. The first important term is population, which I have been using from the beginning of this lecture. In layman's terms, a population refers to a collection of individuals: human beings, cattle, insects, and so on. Generally it refers to living beings, the entities themselves; for example, the population of a country, the population of sheep in a state, or the population of rats, as when we say there are problems because the population of rats is increasing rapidly in a particular city or state. However, a statistical population is not the collection of individuals or units; it is the collection of the measurements, or you can say the aggregate of numerical or qualitative records of measurements, on certain characteristics of interest. We looked at various problems a while ago; consider the problem of estimating the average salary of employees. What would the population be here? The population is the records of the salaries of the employees. Suppose we are looking at an industrial organization, and we consider all the employees employed in that organization; suppose there are 10,000 employees, and we have them marked according to their employee code or some other identification code, say 1, 2, 3, and so on up to ten thousand. Then the salary of employee number one is x1, the salary of employee number two is x2, and so on up to x
10,000. In this particular case, the population of interest is these ten thousand entries. If we are looking at the weight at birth of children in a certain geographical region, then for all the children born during a particular period in that region we record the weight, in pounds, kilograms, or grams, and the population here is that aggregate. If we are looking at the incidence of deaths due to road accidents in a city on each day of the year, then each day we record the number of accidents taking place and the corresponding deaths in those accidents, and the population here is the number of deaths on each day. Consider responses to a new legislation to control freedom of speech in a country: a new legislation is placed in the parliament, or proposed by the cabinet, and opinion polls are taken on whether it is a popular measure or not. Here the responses of the persons will be in the form of whether they favor it or not, so the answers could be yes or no; the answers are now qualitative, in the form of an attribute, and this collection is my population in this particular problem. On the basis of it we may have to infer whether the measure is going to be popular or not.

Once we have identified the population of our interest, the next key term is parameter. I have been using this term repeatedly, but what is its proper meaning? The specific characteristics of the population in which the experimenter is interested are called the parameters: an average (the arithmetic mean, median, mode, harmonic mean, and so on), a characteristic describing variability (such as the standard deviation or the range), whether the population is symmetric or not, the maximum or minimum value, and so on. Usually, parametric inference assumes a distribution F(x, theta), where theta is the parameter which characterizes the population. Popular examples: when we say a Poisson(lambda) distribution, lambda is the parameter; if I say a normal(mu, sigma squared) distribution, the distributional model is normal and it is characterized by the parameters mu and sigma squared, which are the mean and the variance respectively. In the Poisson distribution, lambda itself is both the mean and the variance of the distribution.

The next term is statistic. A statistic is a function of the sample observations. From the population, the statistician has at his disposal a random sample, on the basis of which he will make the appropriate inferences. The sample is written as observations X1, X2, ..., Xn, and any function of these observations, say T(X), where X denotes the sample (X1, X2, ..., Xn), is called a statistic. In a point estimation problem we usually identify a statistic, say d(X), called an estimator of the parametric function G(theta). For example, suppose we have a Poisson model with rate lambda and we are interested in estimating 1/lambda; then my parametric function is 1/lambda, and it is a moot question whether we can find an estimator for 1/lambda. Or we may be interested in estimating lambda cubed; in a normal distribution we may be interested in estimating mu, or sigma squared, or sigma, or mu + p sigma, which denotes a quantile. Depending upon the interest of the inquirer or the experimenter, one needs to
determine which parameter is to be estimated, or on which parameter inference is to be made, and the corresponding statistic has to be framed from the sample for the purpose. For example, in the normal distribution one may use the sample mean to estimate mu, and one may use the sample variance, say (1/n) times the sum of (Xi minus X bar) squared, or alternatively the version with divisor (n minus 1), to estimate sigma squared. In an interval estimation problem, in place of the single statistic d(X) proposed in point estimation, we need two statistics, the endpoints of the interval in which our parameter of interest is supposed to lie: we specify, say, d1(X) and d2(X), so that we can make a probability statement about the parametric function G(theta) lying in the interval (d1, d2). In testing of hypotheses we use a statistic, say phi(X), for taking a decision to accept or reject a given hypothesis; in this case phi(X) is termed the test function, or test statistic.

These are the basic terminologies to be used in statistical inference. We have a population: that is the first thing, where we identify what we are interested in studying in the given setup. We draw a random sample from the given population. Drawing a random sample is itself a matter for full investigation; it comes under the topic of methods of sample surveys, or sampling techniques, which is another aspect of statistical methodology where various methods of taking a random sample are discussed. In this particular course we assume that a random sample is already available to us, and our job is to use this random sample to draw appropriate inferences, in the form of point estimation, interval estimation, or testing of hypotheses, so as to inform the end user of the appropriate conclusions about the population parameters.
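For the normal case just described, here is a minimal sketch with made-up data, showing both the divisor-n and divisor-(n minus 1) versions of the sample variance:

```python
# Made-up sample, assumed drawn from a normal population (illustrative)
data = [4.2, 5.1, 3.8, 4.9, 5.4, 4.6]
n = len(data)

mu_hat = sum(data) / n                       # sample mean, estimates mu
ss = sum((x - mu_hat) ** 2 for x in data)    # sum of squared deviations
var_mle = ss / n                             # divides by n
var_unbiased = ss / (n - 1)                  # divides by n - 1 (unbiased)

print(mu_hat, var_mle, var_unbiased)
```

Both versions estimate sigma squared; which one is preferable is exactly the kind of question (unbiasedness, efficiency) that this course takes up.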
population in which we are interested. The decision is based on the random sample, and for that purpose we use a function of the sample, which is called a statistic. In a point estimation problem we create a point estimator using a statistic; in interval estimation we create an interval in the form of two statistics giving a range; in a testing of hypotheses problem we specify a test function, or test statistic, using the random sample.

At this point let me briefly give an example. Consider the problem of the average monthly salary of the employees in an organization, and assume that the model for this is described by a Pareto distribution. The Pareto distribution is a continuous distribution whose density function has the form f(x) = αβ^α / x^(α+1) for x > β. So here we have a two-parameter model, where the parameters α and β are of course both positive. Now we may be interested in the average monthly salary, which is the expectation of X under this distribution; this can of course be easily calculated:

E(X) = ∫ from β to ∞ of x · αβ^α / x^(α+1) dx = αβ^α [x^(−α+1) / (−α+1)] evaluated from β to ∞.

When we substitute the value at infinity this term vanishes (provided α > 1), and at β it gives β^(−α+1) / (α−1), so the value turns out to be αβ/(α−1), where of course α has to be greater than 1, otherwise this expression is not valid. Now in this particular problem we want to estimate this parametric function; this is my g(θ), where θ is a vector parameter consisting of the two components (α, β). So now, to estimate this, there may be different
procedures. As a layman, one may say: take the random sample X1, X2, ..., Xn and use X̄, the sample mean, to estimate this. That could be one method, and of course depending upon the situation one may develop different methods, as we will see during this course. On the other hand, one may have to do some sort of testing here: we may like to check whether the average income levels are low or high. For low or high we may identify a cutoff; we may say that if the average monthly income is more than, say, five thousand rupees, then the employees are well off or well paid, and in that case we may devise a test statistic based on X1, X2, ..., Xn to take a decision on whether this hypothesis is tenable, that is, we test the hypothesis that the average monthly salary is more than five thousand rupees.

I will spend a few minutes on the historical development of the subject. The historical development of statistical inference can be traced to the first half of the 18th century, mostly in problems of astronomy and geodesy. In astronomy the interest was in finding interplanetary distances, the positions of the various planets and stars, and their movements. In geodesy one wanted to determine the shape of the earth: it was known that the earth's shape is roughly spherical but flattened near the poles. A standard technique is to take observations, not one but several measurements; for example, measurements are taken of the length of one degree of a certain meridian, and the problem is to determine the parameters α and β which specify the spheroid of the earth. Indirect observations on (α, β) are given by the relation Yi = α + β Xi, where the Xi are given to us and the Yi are observed, so α and β are to be estimated. Nowadays we understand this as a problem of simple linear regression; however, this problem was studied as early as the
18th century by Gauss and Legendre, who came up with the method of least squares to solve it. Even before Gauss and Legendre, about 50 years earlier, Boscovich in 1757 proposed a solution to this problem: he sought to minimize Σ |Yi − α − β Xi|. So in place of the square he initially considered the absolute error, subject to the condition that the sum of the errors must be zero, and he solved this problem using geometrical methods based on five observations. Later Laplace gave a general algebraic solution to this problem; this can be considered the first attempt to solve an optimization problem under constraints. Later on, Gauss and Legendre considered the minimization of the sum of the squares, and that is why the method came to be known as the method of least squares. So you can consider that the problem of statistical inference, even of modern statistical inference, started as early as the 18th century.

Further techniques started to be developed towards the latter half of the 19th century. For example, Francis Galton began to study the relationship between variables, which he called regression: he observed that tall parents have tall children, but less tall than the parents, and shorter parents have children short in height, but taller than the parents. This was called regression towards mediocrity in the heights, and the first model of simple linear regression was built on these studies. Later, Karl Pearson developed the method of moments in the latter part of the 19th century. The modern methods of statistics as we know them today were probably first started by Fisher in 1912, where he developed the method of maximum likelihood. He was probably the first to realize the importance of comparing two different methods of estimation: he considered two
estimates of the standard deviation, found their sampling distributions and therefore their mean squared errors, and showed that one of the estimators has a smaller mean squared error than the other. Probably the fundamental, even path-breaking, paper is the one of 1922, called "On the Mathematical Foundations of Theoretical Statistics", where he listed the basic problems of theoretical statistics: firstly, the problem of specification, that is, defining the distribution of the population; second, the problem of estimation; and third, the problems of distribution, meaning that to judge the goodness of, or evaluate the performance of, the estimators, we need the sampling distributions of the statistics being used. These developments made by Fisher from 1922 onwards had their effects in various areas of statistics, such as estimation, testing of hypotheses, and design of experiments. At the same time Jerzy Neyman and E. S. Pearson developed the theory of testing of hypotheses; Neyman also developed the theory of sample surveys. Later, in the 1940s, Abraham Wald developed the topic called statistical decision theory, which includes various aspects of inference as special cases. He showed, in fact, that estimation, testing, and ranking and selection procedures are all part of the general problem of decision theory, which actually has its origin in the theory of games, developed in the 1920s and 1930s by John von Neumann among others.

So, friends, today we have discussed the basic problem of statistical inference and its main components. In this course we will focus on the problems of estimation and testing of hypotheses, and from the next class onwards I will start the discussion of the problem of point estimation.
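The Pareto example above can be sketched numerically. The following Python snippet is a minimal illustration, not from the lecture: the parameter values alpha = 3 and beta = 2000 are assumptions chosen for the salary example. It draws a random sample by inverse-transform sampling, uses the sample mean X̄ as the layman's estimator of E(X) = αβ/(α−1), and computes the two competing sample variances, 1/n and 1/(n−1), mentioned earlier.

```python
import random

# Illustrative parameters (assumed, not from the lecture);
# alpha > 1 so that E(X) = alpha*beta/(alpha - 1) exists.
alpha, beta = 3.0, 2000.0
n = 10_000

random.seed(0)

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# X = beta / U**(1/alpha) has the Pareto density
#   f(x) = alpha * beta**alpha / x**(alpha + 1),  x > beta.
sample = [beta / random.random() ** (1.0 / alpha) for _ in range(n)]

# Point estimation: the sample mean estimates E(X) = alpha*beta/(alpha - 1).
x_bar = sum(sample) / n
true_mean = alpha * beta / (alpha - 1)
print(f"sample mean = {x_bar:.1f}, true mean = {true_mean:.1f}")

# Two competing estimators of sigma^2:
# the 1/n (biased) and 1/(n - 1) (unbiased) sample variances.
ss = sum((x - x_bar) ** 2 for x in sample)
var_biased = ss / n
var_unbiased = ss / (n - 1)
print(f"1/n variance = {var_biased:.1f}, 1/(n-1) variance = {var_unbiased:.1f}")
```

With a sample this large the sample mean lands close to the true value αβ/(α−1) = 3000; deciding systematically which of such competing estimators is better (via unbiasedness, consistency, and efficiency) is exactly what the course goes on to develop.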
Info
Channel: nptelhrd
Views: 84,305
Keywords: Introduction and Motivation
Id: iin6vthyzsQ
Length: 58min 18sec (3498 seconds)
Published: Tue Jun 11 2013