LEARNING MAHALANOBIS DISTANCE FROM C. R. RAO

Captions
So let me begin by thanking the organisers and the Academy for inviting me. It is a great honour to speak on an occasion like this, when we are celebrating the 100th birthday of the greatest living statistician on earth. I was not a student of C. R. Rao, although my title probably gives you that misleading impression.

I remember the first time I saw C. R. Rao. I was a student in my second year of B.Stat, and Professor Rao came and gave a lecture in the big room on the third floor, which used to be the largest room available in our Institute at that time. We were sitting somewhere in the back rows, and in the front rows all our statistics teachers were sitting. It was quite an experience: when these teachers taught us, we would just stare at them with great respect and fear, and now they were looking at C. R. Rao in exactly the same way. And Rao was really teaching as if they were his students. He would say a few things and then ask, "Do you know how to solve this problem?" or "Do you have a program to compute this?" That was my first exposure to C. R. Rao, and I saw him several more times as a student.

But my first real interaction with Professor Rao was quite surprising and very interesting, and for any young statistician, indeed any young scientist, it is a great honour. That interaction took place when I was elected a Fellow of this Academy: I got a congratulatory note from him. He was the first person to congratulate me. Later on I found out that this is his habit: whenever he sees a statistician or a young mathematician elected a Fellow of one of these academies, he takes the personal initiative to congratulate them. So I was almost stunned to receive that email from Dr. Rao.
The same thing happened when I was elected a Fellow of INSA: he congratulated me then as well. Later on I met him several times and we had many interactions, but I still remember the day I received that congratulatory mail from C. R. Rao on being elected a Fellow of this Academy.

Now, the title of my talk is "Learning Mahalanobis Distance from C. R. Rao". Two years ago we were celebrating the 125th birth anniversary of Prasanta Chandra Mahalanobis, and a few months back I gave a talk in this Academy, at the mid-year meeting, on Prasanta Chandra Mahalanobis. Now we are celebrating the 100th birth anniversary of C. R. Rao, so I thought I would talk about something that connects these two stalwarts of statistics.

Let me begin with a question for the students. In "C. R. Rao", the middle part is Radhakrishna; you know that, right? I see you are nodding. Do you know why he was named Radhakrishna? Anybody have a guess? Rao has an autobiographical essay that appeared in the famous book Glimpses of India's Statistical Heritage; if you are interested in the history of the development of statistics, it is something worth reading, and most of its essays are written by the stalwarts who developed statistics, or contributed to its development, in this country. There Rao explains it. So what would your guess be? There are people from the South here, I suppose. No, that is not the right guess. According to him, he was the eighth child, like Krishna, and that is why: Radhakrishna. That is a custom, probably.

And please ask questions whenever you have difficulty understanding what I am saying, because if you do not ask questions this becomes a monologue, and that is very boring. Feel free to interrupt me at any time. There will be a few technical things that I may not be able to explain in full detail, but that does not matter: ask your questions and I will try my best to explain.

So let me begin by telling you what Mahalanobis distance is and how it started. Mahalanobis met Nelson Annandale at the 1920 Nagpur session of the Indian Science Congress, and Annandale asked Mahalanobis to analyse anthropometric measurements of Anglo-Indians in Calcutta; that led to the discovery, or invention, of the famous Mahalanobis distance. You know India was under British rule for a long time; people came here from England to work and to do business, and some of them settled here. There is a large community, even now, of such people living in several parts of Bengal and elsewhere; they are called Anglo-Indians. Now, who was Nelson Annandale? Annandale was a civil servant: he was the Director of the Zoological Survey of India at that time, and also in charge of the Indian Museum in Calcutta. He was an anthropologist, and he was interested in anthropological data on the Anglo-Indians; he was comparing different groups, and that is how Mahalanobis got drawn in, which led to the invention of Mahalanobis distance. Mahalanobis's first paper on this actually appeared in the Records of the Indian Museum, and the paper with all the statistical and mathematical details appeared in the Journal of the Asiatic Society of Bengal. Why it is in the Journal of the Asiatic Society is another story, and not one I will tell today.
Now, what is Mahalanobis distance? I have to give you the expression; I presume all of you are familiar with basic matrix algebra, and the technicalities are not going to go beyond that. Suppose you have two groups of people, two different communities, and I want to compare the two groups and see how different they are. I have anthropometric measurements on them, like their heights; usually these anthropometrists take a lot of measurements, on the skull, the length of the hand, and so on, so we get a set of variables. Then μ1 is the mean vector for one group, μ2 is the mean vector for the other group, and Σ is the dispersion matrix; here we are assuming that the two groups have a common dispersion. The distance between the two populations is defined as a quadratic form: the difference of the means written once as a row vector (that is what the transpose indicates) and once as a column vector, with Σ⁻¹ in between. So in the end it is a real number, and because a dispersion matrix is positive definite, it is a positive number. That is the distance: if the means are very different, the distance is large and the populations are very different; when the means are statistically close, the distance is small. That was the motivation.

The whole point is the Σ⁻¹, because Mahalanobis wanted to look at the difference in means in relation to the spread, or dispersion, of the variables. Σ⁻¹ standardises things before you calculate the distance, and that is why it appears there. Strictly speaking, this quantity is the square of the distance; the square root is the distance, but we usually call it Mahalanobis D², and D is the distance. Yes, there are p variates, so μ1 and μ2 are p-dimensional. Now suppose the Σ⁻¹ were not there; suppose it were the identity matrix. What would the expression be? Just the squared Euclidean distance between μ1 and μ2 that we all learned in analytic geometry. So you understand why this is a distance. But being a statistician, Mahalanobis was not interested in the straightforward Euclidean distance. He thought, quite logically, that the difference between means should not be taken as such; it should be standardised by the spread. A difference in means may look big or small, but you have to judge it relative to the spread of the variables, so dispersion plays a role here. It is the placement of Σ⁻¹ that makes it Mahalanobis distance. And, as has just been pointed out, a very good point: because of the standardisation, D² is also free of the units of measurement.
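Since the slide itself is not visible in the captions, here is the expression being described, written out in standard notation: for p-dimensional mean vectors and a common p x p dispersion matrix,

    D^2 = (\mu_1 - \mu_2)^\top \, \Sigma^{-1} \, (\mu_1 - \mu_2),

which is a positive real number whenever \mu_1 \neq \mu_2, since \Sigma is positive definite, and which reduces to the squared Euclidean distance \|\mu_1 - \mu_2\|^2 when \Sigma is the identity matrix.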
Now, these old articles are in somewhat obscure places, and it is not easy to get access to them any more. So most of the time the paper we refer to, the one available on the net, is the one that appeared in the Proceedings of the National Institute of Sciences of India, the predecessor of the Indian National Science Academy, which was in Calcutta at that time. That is the 1936 paper, where Mahalanobis gave a special lecture on D², and the title of the article was "On the generalised distance in statistics". [In response to a question] The 1922 paper, yes, that is the one in the Records of the Indian Museum; and the Asiatic Society paper, yes, very good, it is not easy to access, and that is the one with all the statistical and mathematical details that Mahalanobis worked out. The subsequent mathematical developments are due to S. N. Roy, R. C. Bose and others.

I will show you just one page of that paper; it will be a good experience for you, to compare with how we write these things now. You see Mahalanobis writing something like Δ², and in those days they did not use Σ for sums: this S is a summation, S over i from 1 to p. Ignore the other details and look here: he writes α for the dispersion matrix, not Σ, and then he writes α with indices i, j for an entry of the inverse of that matrix, and he writes this expression: cofactor divided by determinant. You understand why the cofactors and determinants come up, right? Look at the expression for D²: there is a Σ⁻¹ in it, and the first formula we all learn for the inverse of a matrix is the adjoint formula, so the cofactors enter. Now imagine: these anthropometricians would have something like a hundred measurements on an individual, so the dispersion matrix is a hundred by hundred. How do you calculate the adjoint of a hundred-by-hundred matrix? How many cofactors do you have to compute? It is an almost impossible task.
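To spell out the formula being pointed at, in modern notation: the adjoint (adjugate) formula for the inverse says

    \Sigma^{-1} = \frac{\operatorname{adj}(\Sigma)}{\det(\Sigma)}, \qquad
    \alpha^{ij} = (\Sigma^{-1})_{ij} = \frac{C_{ij}}{\det(\Sigma)},

where C_{ij} is the corresponding cofactor (for a symmetric \Sigma the cofactor matrix is symmetric, so transposition does not matter). For p = 100 this means 100^2 = 10,000 cofactors, each of them the determinant of a 99 x 99 matrix, on top of \det(\Sigma) itself; hence "almost impossible" on a hand-operated machine.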
And poor C. R. Rao, when he was employed as an apprentice at the Indian Statistical Institute in the early '40s, was given a primitive computing machine, essentially an adding machine, and was asked to analyse data of exactly this kind. Not the Anglo-Indian data; that was earlier. This data came from the United Provinces: there were anthropologists working on anthropometric measurements from the various tribes living in the United Provinces, which is now Uttar Pradesh. So poor C. R. Rao had to do all these computations for his professor. Somebody has already said it today: in the Institute there was one "Professor", Professor Mahalanobis, and there was "Dr. Rao", although Rao was not a doctor at that time, in the early '40s; he became a doctor only in 1948.

So Rao was doing all these computations, and he realised that he was not going to survive if he had to compute all those cofactors. A lot of people do not know this: that problem led to a beautiful result which he invented while working on it, and which is given in his famous book Linear Statistical Inference and Its Applications. The book does not present it as a result invented by him, but it is the only textbook that gives the details. I will not give you the full details, only the main idea. There is a very interesting formula for the determinant of a partitioned matrix: when you partition a matrix, there is a nice identity connecting the determinant of the whole matrix with the determinants of the blocks, and Rao used it very cleverly to get an intelligent way of computing Mahalanobis distance. Essentially you do this. You have p variables, which gives a p × p dispersion matrix. You border it with an extra column and an extra row containing the difference of the two means, and put a zero in the corner, making it a (p+1) × (p+1) matrix. Then you do a sweep-out operation to reduce the matrix to upper triangular (or lower triangular, whichever you like), and the entry you are left with in the bottom-right corner, taken with a negative sign, is the squared Mahalanobis distance. Forget about computing all those cofactors and adjoints; it comes out that nicely. This was actually one of his first inventions: a way of computing Mahalanobis distance that was feasible on those primitive machines. We were told this story by some of our teachers, and I am telling it to you; necessity, as they say, is the mother of invention. So this is one of his very early contributions, to the computational methodology for D². We probably do not teach it any more, but if you compute Mahalanobis distance using statistical software, it is computed by elimination of essentially this kind, not via the adjoint; otherwise we would be dead. [In response to a question] Yes, in practice everything is estimated from the data: you take the mean of the data and the dispersion of the data, the sample variance-covariance matrix, and that is what you use.
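Here is a minimal sketch of that bordered-matrix sweep in Python; the code and the function name are mine, not from the talk, and NumPy is assumed. The pivots of the elimination are positive because the dispersion matrix is positive definite, and the bottom-right entry after the sweep is the negative of the squared distance.

    import numpy as np

    def mahalanobis_sq_by_sweep(sigma, d):
        """Squared Mahalanobis distance d' Sigma^{-1} d via a sweep-out
        on the bordered matrix [[Sigma, d], [d', 0]]; no explicit inverse."""
        p = len(d)
        m = np.zeros((p + 1, p + 1))
        m[:p, :p] = sigma            # p x p dispersion matrix
        m[:p, p] = d                 # border column: mean difference
        m[p, :p] = d                 # border row: mean difference
        for k in range(p):           # forward sweep to upper-triangular form
            m[k + 1:, k:] -= np.outer(m[k + 1:, k] / m[k, k], m[k, k:])
        return -m[p, p]              # bottom-right entry with a negative sign

    # quick check against the direct formula
    rng = np.random.default_rng(0)
    a = rng.standard_normal((5, 5))
    sigma = a @ a.T + 5 * np.eye(5)         # some positive definite dispersion
    d = rng.standard_normal(5)              # plays the role of mu1 - mu2
    print(mahalanobis_sq_by_sweep(sigma, d))
    print(d @ np.linalg.solve(sigma, d))    # same number

The two printed numbers agree because, after the first p columns are eliminated, the corner entry is the Schur complement 0 - d'Σ⁻¹d. A Cholesky or LU solve, which is what software typically uses, is the same elimination idea in different clothing.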
Now, what is Rao's subsequent connection with all this? Shortly after Rao joined the Indian Statistical Institute, in 1941, Calcutta University started its master's programme in statistics, and Rao joined it along with a handful of other students. They were learning statistics from more senior people who taught part-time at Calcutta University, and Prasanta Chandra Mahalanobis was the honorary head of the department.

If you are afraid of asking questions, here is something you should learn from C. R. Rao. This is from Rao's own writing; I will read a few sentences about his experience as a master's student in the first statistics programme in India: "None of the teachers had any experience in teaching statistics, and they were as ignorant as the students in some areas of statistics. As there were no textbooks in statistics, the teachers had to learn by reading original papers and then teach. The courses given in the first two years benefited the faculty as well as the students." And such a programme produced people like C. R. Rao and his classmates, who all learned by asking questions among themselves. So learn to ask questions; if you keep silent, you do not learn.

That was the beginning of his training in statistics. In 1943, when he graduated, he joined the Indian Statistical Institute as a technical apprentice, and he was working on the data I have already mentioned. While working as an apprentice he was also doing some teaching, and you have heard from some of the earlier speakers that it was while teaching that he invented two of his great results: the Rao-Blackwell theorem and the Cramér-Rao lower bound.

C. R. Rao has a lot of interesting anecdotes to tell; you will find many in that essay. Here is one that he writes there, and which he also told me once when I was talking to him during a visit to Hyderabad, about the Cramér-Rao inequality. His name, of course, is C. R. Rao. Once he was travelling in Iran and his luggage got lost, so he complained about it and went to his hotel. After some time he got a call from the airline, and the lady from the airline said, "Professor Cramér Rao?" He said, "No, I am not Cramér, I am C. R. Rao." "No, no, there is a person here looking for you: you are Cramér Rao." So C. R. Rao became Cramér Rao in Iran. Anyway, he invented the Cramér-Rao lower bound and the Rao-Blackwell theorem while teaching, and he did not even have a PhD at the time.

Now, coming to his PhD: how did he end up doing his PhD work with Fisher? This is the story, again from his own writing. In 1946 the Professor received an unusual cable from J. C. Trevor, a Cambridge University anthropologist, asking him to depute a scholar from the Indian Statistical Institute to Cambridge, to apply Mahalanobis D² analysis to some anthropometric data collected by the University Museum. If you watched the film that was shown, C. R. Rao mentions this there. The two people Mahalanobis sent were C. R. Rao and a very famous anthropologist, who also appeared in the film. So these two went and worked on it. Before that, Fisher had visited India a few times, so Rao had already had a chance to meet Ronald Fisher here. About Cambridge he writes: "Shortly after reaching Cambridge I met R. A. Fisher, who was the Balfour Professor of Genetics at that time, and whom I had met in India during his visit in 1944. I told him that I had enrolled myself as a research scholar and asked him whether he would agree to be my supervisor." That was the beginning of his interaction with R. A. Fisher as his PhD supervisor. And then he writes: "He agreed, but suggested that it would be a good experience for me to work also in his laboratory, where he was breeding mice for linkage studies." Fisher was a geneticist; Mahalanobis was a physicist; and poor C. R. Rao was getting sandwiched between these two people and becoming a statistician.
So Rao was originally deputed to Cambridge to work on the anthropological data in the University Museum; then he enrolled as a PhD student, and Fisher asked him to do these linkage studies with mice as well. At one place he writes: "I kept myself busy in Cambridge dividing my time between bones and stones at the University anthropological museum and mice at Whittingehame Lodge, the official residence of the Balfour Professor of Genetics." So he was busy with mice, bones and stones in Cambridge for two years; that is what Rao writes.

Now let us come back to Fisher. Here is Fisher's paper, and remember, it is the same date, 1936. It appeared in the Annals of Eugenics: "The use of multiple measurements in taxonomic problems". Fisher considered the problem of classifying different iris species: he had measurements on the lengths and widths of the sepals and petals of the flowers of Iris setosa, Iris virginica and Iris versicolor, and he wanted to use those measurements to distinguish, or discriminate, among the varieties. He converted a taxonomy problem into a statistical problem. It is as if I gave you measurements on an individual and you had to tell me whether the person is from Punjab or from Tamil Nadu, something like that. That was his problem.

Now let us look at Fisher's discriminant function and Mahalanobis distance together. I have already told you what the Mahalanobis distance between two populations is; here, in exactly the same way, I am defining the Mahalanobis distance of a particular data point from a population with mean μ and dispersion Σ. It is again a distance, or rather a squared distance: given a population and a data point, an individual, you can calculate how far that individual is from the population. And Fisher formed his linear discriminant function; how he found it, I will tell you in a few minutes. You see it is a linear expression, linear in x: it involves the two means of the two populations, and Fisher assumed a common dispersion. Fisher's rule was: calculate the value of this linear function, and if the value is greater than zero, classify the point into one population; if it is smaller than zero, classify it into the other. That was Fisher's idea.

Now, how are these two things linked, and why did Fisher propose that rule? Let us try to understand it step by step. Here is what happens: Fisher's linear discriminant function corresponds to the separating hyperplane between the points that are closer, in Mahalanobis distance, to one population and the points that are closer to the other population. Let me draw a picture. Suppose this is one set of data and this is another set of data. Here is a point that clearly belongs to the first population, here is another point that clearly belongs to the second, and here is Fisher's linear discriminant function, the separating line. If a point is on this side, Fisher classifies it into the green population; if it is on the other side, into the blue population. And if you calculate the Mahalanobis distance of such a point from the blue population, you can see from the picture that it will be closer to the blue population; similarly the other point will be closer to the green population.
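Since the formulas are on slides, here they are written out; the notation is mine, reconstructed from the verbal description. The squared Mahalanobis distance of a point x from a population, and Fisher's linear discriminant function, are

    \delta^2(x, \mu_i) = (x - \mu_i)^\top \Sigma^{-1} (x - \mu_i), \qquad
    L(x) = (\mu_1 - \mu_2)^\top \Sigma^{-1} \Big( x - \frac{\mu_1 + \mu_2}{2} \Big),

and expanding the two quadratics gives the one-line identity behind the separating-hyperplane statement:

    \delta^2(x, \mu_2) - \delta^2(x, \mu_1) = 2\,L(x),

so L(x) > 0 exactly when x is closer, in Mahalanobis distance, to population 1, and the set {x : L(x) = 0} is the separating hyperplane.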
But that was not Fisher's original motivation for the linear discriminant function; the connection is what we understand today. There is an intrinsic connection between Mahalanobis distance and Fisher's linear discriminant function (the discriminant function is really the separating hyperplane between the points closer to one population and the points closer to the other, with closeness measured in Mahalanobis distance), but this is not how Fisher arrived at it.

Now, you are familiar with basic statistics, right? You have learned about the two-sample t-test and such things. So why do you think Fisher chose that particular linear function? I have given you a hint. Suppose you do not have many variables; you have only one variable. Then you can calculate the t-statistic between the two samples and see whether the two populations are well separated or not, and you decide that based on the t-statistic: if it is very large in magnitude, you conclude the two groups are significantly different; if it is not so large, you conclude they are not.

Fisher came up with a linear function of the data, and a linear function turns the entire multivariate data set into univariate data, isn't it? Suppose I have all your measurements and I take a linear function: it converts each multivariate data point into a single real value, as if I combined your height, weight and so on into one number per individual. After that, suppose you do a t-test. How should you choose the linear function? You can form many linear functions, and different ones give different values of the t-statistic. Which linear function is good, if your aim is to distinguish the two populations? Is my question clear to everybody? The larger the t-statistic, the better, because a large t-statistic means more separation. So Fisher looked for the linear function that makes the t-statistic as large as possible. That was Fisher's motivation, and if you read his paper it is right there in the very first paragraph: his choice of the linear discriminant function was motivated by the fact that this linear function maximises the two-sample t-statistic among all linear functions of the data.
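As a concrete companion, here is a minimal plug-in version in Python; this sketch is mine, not code from the talk, and it assumes NumPy and two samples sharing a common dispersion. It estimates the means and the pooled covariance, forms the linear discriminant function, and classifies by its sign.

    import numpy as np

    def fit_lda(x1, x2):
        """Plug-in Fisher discriminant from two samples (rows = observations)."""
        mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
        n1, n2 = len(x1), len(x2)
        # pooled sample dispersion, assuming a common covariance
        s = ((n1 - 1) * np.cov(x1, rowvar=False) +
             (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
        w = np.linalg.solve(s, mu1 - mu2)       # Sigma^{-1} (mu1 - mu2)
        mid = (mu1 + mu2) / 2
        return lambda x: (x - mid) @ w          # L(x); positive sign -> group 1

    rng = np.random.default_rng(1)
    x1 = rng.multivariate_normal([0, 0], np.eye(2), size=100)
    x2 = rng.multivariate_normal([2, 1], np.eye(2), size=100)
    L = fit_lda(x1, x2)
    print((L(x1) > 0).mean(), (L(x2) < 0).mean())   # in-sample accuracy per group

By the identity shown earlier, L(x) > 0 is the same as x having the smaller estimated Mahalanobis distance to group 1, so the sign rule and the nearest-population rule coincide.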
Fisher claimed that this is the best linear function, and that if you use it you will get good results, and he left it there. But a more important question is: if I use that classifier, how many cases will I classify wrongly and how many correctly? It is not going to be perfect: you see in the picture that there are blue points on one side and green points on the other, so there will be misclassification. And now comes the fundamental question: is this classifier really, as Fisher thought, the best one? The answer was not known until C. R. Rao took up this research as Fisher's student.

When he started working with Fisher, Rao was actually the only student doing multivariate analysis; all of Fisher's other students were working on genetics. Later on Rao said that among his own students it was mostly statisticians and probabilists, though he mentions two geneticists who worked under him, Ranajit Chakraborty and D. C. Rao; those two were the only geneticists to work under C. R. Rao's supervision, and all the others were either statisticians or probabilists.

So here is C. R. Rao's 1948 paper, which came out of his PhD thesis; if you have not read it, please do. You see there is a remarkable similarity between the two titles. Rao's is "The utilization of multiple measurements in problems of biological classification", and Fisher's was "The use of multiple measurements in taxonomic problems"; you see the connection. When Rao worked there in the anthropological museum, he considered the problem of classifying human skulls, recovered in archaeological excavations, into the Iron Age and the Bronze Age. That famous paper is actually a discussion paper in the Journal of the Royal Statistical Society, and many famous statisticians discuss it and raise questions there. An amusing thing is that it is the only place where I have seen C. R. Rao addressed as "Mr. Rao": throughout the discussion everybody calls him Mr. Rao, Mr. Rao, Mr. Rao.

At one place Mr. Rao was asked a very interesting question. Let me first show you the optimality result he proved, and then I will tell you the question. Here is Rao's result: the linear discriminant function has Bayes risk optimality for Gaussian class distributions which differ in their locations but have the same dispersion. Essentially, Fisher thought it was the best classifier, but Fisher never proved that it has the best misclassification rate; C. R. Rao proved it. Bayes optimality means that it has the smallest possible misclassification rate when the populations are Gaussian with a common dispersion.
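To unpack "Bayes risk optimality" (a standard reconstruction, not a verbatim slide): with equal priors, the Bayes rule assigns x to population 1 exactly when f_1(x) > f_2(x), and for N_p(\mu_i, \Sigma) densities the normalising constants cancel, leaving

    \log \frac{f_1(x)}{f_2(x)}
      = \frac{1}{2}\big(\delta^2(x,\mu_2) - \delta^2(x,\mu_1)\big)
      = (\mu_1 - \mu_2)^\top \Sigma^{-1} \Big( x - \frac{\mu_1 + \mu_2}{2} \Big) = L(x),

so the likelihood-ratio rule and Fisher's sign rule coincide, and no other classifier, linear or not, can have a smaller misclassification rate in this Gaussian equal-dispersion setting.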
So that finally settled why Fisher's linear discriminant function is the best, under certain conditions. In fact, though it is not articulated in Rao's paper, it is not difficult to see from his work that this Bayes optimality holds for elliptically symmetric, unimodal class distributions which differ only in their locations; they do not have to be Gaussian.

Now, coming back to the question. People asked him: you have computed the linear discriminant function from the data you have, whereas these optimality results concern the distributions, the populations; how optimal is the rule when you compute it from data? Of course Rao did not know the answer at that time; it is a difficult question, not at all easy to answer. Some of these questions were asked during that discussion, and that discussion paper, in my opinion, was really his PhD defence: he had questions fired at him by all these great statisticians, and he was trying his best to respond. The discussion part is much, much longer than the paper, and then there is a rejoinder, which is also quite long.

A few more anecdotes. When he first asked Fisher for a problem, this is what Fisher apparently told him: "Problems must be yours; I shall help if I can." So if you become a PhD student, do not complain about your supervisor; this is how you become a C. R. Rao. That is about his PhD research. He returned to India in 1948; he was 28 years old at the time. And if I go back to my first slide, you see that the other event also happened in 1920: that was the year Mahalanobis met Annandale, the genesis of D², and C. R. Rao was born the same year. So 1920 is really a landmark year in the history of statistics.

Rao was appointed a professor when he returned to India in 1948. Apparently, according to Rao, by that time our Institute had the classification of lecturer, associate professor and professor, and Professor Mahalanobis was thinking of giving the professor positions to S. N. Roy and R. C. Bose and the associate professor position to C. R. Rao; but S. N. Roy and R. C. Bose left around that time, and so Rao was made the professor. Now I want to read some of his personal comments about Prasanta Chandra Mahalanobis at various points of time; that is quite a learning experience.
His first impression of Professor Mahalanobis, when he joined as an apprentice, goes like this: "I met different scholars and tried to interact with them. I could not meet the Professor, as Professor Mahalanobis was referred to with awe and reverence by everyone. I had seen him coming in and going out of his office at all times of the day. He had a tall and commanding figure. Everybody was alert and ready to be called in for a discussion when the Professor was in." And this is the most interesting part: the scholars maintained a diary of their minute-to-minute activities, and one of the most frequent entries was "discussion with the Professor". Nobody in the Institute would tell you what the discussions were about.

The second thing he says about the Professor concerns his first job. He had passed his MA in statistics with a first class, first rank, and this is what he writes: "I was among the first five to receive the MA degree in statistics from any Indian university. The Professor offered jobs to all of us in the ISI as technical apprentices on a salary of rupees 75 a month. With first-class master's degrees in mathematics from Andhra University and in statistics from Calcutta University, I was expecting a higher salary. However, I accepted the job without, quote-unquote, 'asking for more', and joined the ISI in December 1943. To my surprise, I found that my salary was increased to 150 within a month. Perhaps the Professor meant it as a gift for accepting the earlier offer without asking for more." He then goes on to say that this is a very good strategy with the Professor: you do not ask for more, and then you get more. Around the same time there was also an offer of a part-time lectureship of rupees 100, so altogether he was making something like 250 rupees a month, and that made him decide to stay in Calcutta rather than go back to Andhra. So for 250 rupees the Indian Statistical Institute got C. R. Rao for good.

So this is the history. After coming back to India he went on working on Mahalanobis distance; it remained one of his great passions to work on its various aspects. And then there is something many of us do not know; we were talking about this paper at breakfast. This is the paper of C. R. Rao and Varadarajan, which appeared in Sankhyā. Many people do not know it, and as a result many people keep reinventing the results in that paper in various ways. So the last thing I want to mention is that Rao-Varadarajan paper. They were dealing with these anthropometric measurements where, as I said, the number of variables was very large, and at times they ran into the difficulty that the Σ matrix, whose inverse you must compute, was not invertible, because you are computing it from the data. They were at a loss at that time. Today we know a lot of regularisation techniques and many other ways of dealing with lack of invertibility of the dispersion matrix, but those were not known to C. R. Rao and his co-workers then.
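A quick illustration of that difficulty (my example, not from the talk, assuming NumPy): with n observations the sample covariance matrix has rank at most n - 1, so with more variables than observations it can never be inverted.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal((10, 100))   # n = 10 individuals, p = 100 measurements
    s = np.cov(x, rowvar=False)          # 100 x 100 sample dispersion matrix
    print(np.linalg.matrix_rank(s))      # prints 9 = n - 1, far below p = 100
    # s is singular: Sigma^{-1}, and hence the plug-in D^2, does not exist here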
But they were concentrating on theoretical problems, and with Varadarajan, Rao considered the problem of discrimination of Gaussian measures in infinite-dimensional spaces. One way to think about very high-dimensional data is as objects embedded in an infinite-dimensional space: you pretend the data really come from an infinite-dimensional space and that you observe something like a finite-dimensional projection of it. When you look at the ECG curve of an individual, you take measurements at definite time points; if you were able to measure continuously over time, it would be an infinite-dimensional object, but you actually measure at only finitely many time points, so what you have is really finite-dimensional, though very high-dimensional. So Varadarajan and C. R. Rao worked on this problem (this is the slide I showed a few minutes ago), and they made two very interesting discoveries.

First, two Gaussian probability measures with positive definite covariances on a finite-dimensional space are always mutually absolutely continuous, meaning that each distribution is absolutely continuous with respect to the other; so if you want, you can always form the likelihood ratio, and that gives you the Bayes risk-optimal classifier. Unfortunately, or fortunately, this is no longer true in infinite-dimensional spaces: there you can have two Gaussian probability measures with positive definite covariance operators which are mutually orthogonal. It is a beautiful result, and they also gave the exact condition for mutual absolute continuity: two Gaussian probability measures with a common positive definite covariance operator are mutually absolutely continuous if and only if the difference between their means lies in the range space of the square root of the covariance operator.

Remember what the Mahalanobis distance is, and now think of all these things as infinite-dimensional objects, so that Σ is a covariance operator. In an infinite-dimensional space, a positive definite covariance operator need not be invertible, because it need not be onto. That is the basic difference between infinite-dimensional and finite-dimensional vector spaces: one-to-one does not imply onto. You have learned in linear algebra that it does in finite dimensions; in infinite-dimensional spaces it is not true. So if the operator is not onto and I hand you μ1 − μ2, it may not lie in the appropriate range space, and then I cannot compute the Mahalanobis distance. And when you cannot compute the Mahalanobis distance, the Rao-Varadarajan paper tells you that the two probability measures are orthogonal: they are so well separated that you do not have to worry about classifying them; they sit on two disjoint sets. It is a beautiful result, and it was done in 1963.

This is the first and most fundamental work on classification in infinite-dimensional spaces; it is the foundational result for classification of infinite-dimensional, or functional, data. For those of us who have spent an enormous amount of time in our research careers, along with our students, developing classification methods for high-dimensional or infinite-dimensional data, this is the foundation. In infinite dimensions things are very different: either there is mutual absolute continuity, and you have the same Bayes classifier, Fisher's linear discriminant function; or there is not, and then the problem is trivial. Is this true for non-Gaussian distributions? Is it true only in Hilbert spaces, or also in more complicated places like Banach spaces? People who have worked on probability measures in Banach spaces have given more interesting and more technically involved characterisations of these things.
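Stated compactly (my paraphrase of the dichotomy just described): for Gaussian measures P_i = N(\mu_i, \Sigma), i = 1, 2, with a common positive definite covariance operator \Sigma on a Hilbert space,

    P_1 \sim P_2 \ \text{(mutually absolutely continuous)} \iff \mu_1 - \mu_2 \in \operatorname{range}(\Sigma^{1/2}),

and otherwise P_1 \perp P_2 (mutually singular), in which case perfect discrimination is possible. In finite dimensions \Sigma^{1/2} is always onto, so the singular case never arises there.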
So I think I will leave it here. If you are interested in doing research in this area, the Rao-Varadarajan paper is a must-read: you should read it before you take up research on statistical inference in infinite-dimensional spaces. Thank you. [Applause]
Info
Channel: Indian Academy of Sciences
Views: 2,423
Rating: 4.91 out of 5
Id: 7DeUcLr_HFI
Length: 47min 13sec (2833 seconds)
Published: Thu Dec 26 2019