Multivariate models (QRM Chapter 6)

So now we come to multivariate models. This is, for me, a real core aspect of the book and one of the reasons we wanted to write it: we found that this field was really lacking a summary at the level of quantitative risk management. Many of the early developments, especially on statistical estimation — I still remember a lot of work on the generalized inverse Gaussian — Alex and Rüdiger worked very hard on that, but by now it has become standard knowledge.

The basics of multivariate modelling at the level of this course you have all seen, so I go quickly; there is nothing deep in the first three or four slides. We write down a random vector; time is not there for the moment. It will be a while before we meet time again — we meet it when Rüdiger starts with credit risk, and with multivariate time series. So for the moment, sorry, no time.

First of all, the joint distribution function is F(x) = P(X_1 <= x_1, ..., X_d <= x_d). From the joint you get the marginals: marginalization. From a joint model you get the marginal distribution of any sub-block by looking at the relevant components. If F is absolutely continuous, in particular if it has a density, you have a multivariate density; many models — think of the multivariate Gaussian — are written at the level of the density, and from the density you get the marginal densities. But from the marginal densities you do not get back the joint model. You understand that from the example in any introductory course: for discrete bivariate data you look at a table of probabilities; you get the marginals from the joint, but from the marginals you can never fill out the interior of the table. It is trivial, we all know that. Still, I keep saying: the joint model is the holy grail of risk management. If you have it, you can answer many, many questions. From the joint model you get the marginals; you cannot go back — there are infinitely many joint models with the same margins. That is where the copula starts to play a role, later.

Rarely, but I think in credit risk and when we come to copulas, we may work with what is called a joint survival function. Now be careful with the bar: until now, for a scalar X, the bar meant 1 − F. If I use the bar for a joint vector X, then F-bar(x) = P(X_1 > x_1, ..., X_d > x_d), and this is not 1 − F(x) once d is 2 or more. Think of medical statistics: the probability that patient 1 survives beyond x_1, and so on, up to patient d surviving beyond x_d. Or think of default probabilities: the probability that the times to default exceed x_1 up to x_d, for each of the d components. This function is very important in risk management, and we will occasionally use it.

Independence — we come back to it much more later. Two vectors are independent if the joint distribution of the two is the product of the marginals; you know the product rule.
With densities the product rule holds as well: the joint density factors into the marginal densities. These can be blocks, if you like: k-dimensional and (d − k)-dimensional. At the level of distributions, at the level of events, at the level of densities — whatever you have at your disposal — it is a product rule. For the components of my vector themselves, we will typically just say independent; some people stress mutually independent, as opposed to pairwise independent. When we say independent, it really means global, mutual independence: the joint distribution is the product of all the marginals, and again you can write it at the level of densities if you have them. This is all standard.

Again fast, we have used this already: the expectation of a vector is the vector of the expectations; you take expectations componentwise, nothing special. The covariance is the mixed second moment, Cov(X) = E[(X − E[X])(X − E[X])']. Now I start being a bit careful about transposes: our vectors are by convention column vectors, and transposing makes them row vectors, so always make sure the dimensions fit. The ij-th element of the covariance matrix is the covariance between X_i and X_j; if i = j, it is the variance.

Correlations: once you have the covariance matrix, you divide by the standard deviations, and that makes the entries live between −1 and +1 — the famous Cauchy–Schwarz inequality (Schwarz without a 't'). The extremes ±1 are attained exactly when the variables are linearly dependent; so even at ±1 you only have linear dependence. That is why it is called linear correlation. This will come back later — there is a surprise waiting there — keep it in mind. Then there are various properties we have already used, standard probability calculus with means, variances and covariances, so I will not lose time on them.

Now I am getting already a bit more careful. Of course it is true — and here comes one of the big sins out there — of course it is true that if the random variables are independent, then the covariance is zero: under independence the expectation of the product is the product of the expectations, immediate from the definition, so the covariance is zero. And the converse is wrong. I learned that in my first class on probability and statistics; we all know it. And if you are not sure about it, you had better keep your eyes open for the next couple of days, especially for copulas. Unfortunately, people often try to convince me, explicitly or implicitly, that the converse is perhaps 'almost true': often I see a table — well, the covariance is small, we diversify, it is close to independent. It may be; it may also not be. And believe me, in crises it typically is not. So be careful with this one; this is one of the main reasons for introducing copulas, to understand what happens here.
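Since this converse causes so much grief, here is a two-line numerical illustration (my own, not from the slides): a variable and its square are uncorrelated, yet one completely determines the other.

```r
## Zero correlation without independence (a standard counterexample):
## X2 = Z^2 is a deterministic function of X1 = Z, hence maximally
## dependent, yet the linear correlation is essentially zero.
set.seed(271)
Z  <- rnorm(1e5)
X1 <- Z
X2 <- Z^2
cor(X1, X2)  # ~ 0, since E[Z^3] = 0 kills the covariance
## ... but clearly not independent: knowing X1 determines X2 exactly.
```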
There is a bit of linear algebra, and we will not do too much of it, but if you want to dig a bit deeper you have to know that expectation is also linear at the level of matrices and vectors: E[AX + b] = A E[X] + b, where the matrix A always has to be such that AX makes sense — A is k × d and X is d × 1, so this makes sense. Standard calculus.

This next one is perhaps more important; it is one of the formulas you should hang above your bed. It is easy, but it is useful: for a linear combination — matrix times vector plus a vector, with the dimensions such that it makes sense, so the result is k-dimensional — Cov(AX + b) = A Cov(X) A'. You can check it componentwise; it is immediate, nothing special, but this is one you should know by heart: A Σ A', where I call Σ — standard notation — the covariance matrix of my vector X, assumed to exist.

Take k = 1, so A is a single row a'; make sure the dimensions are correct — 1 × d times d × d times d × 1 — so it is a number. Then Cov(a'X) = a' Σ a, the same formula; careful, the transpose is now on the right-hand side. And this is a variance, hence nonnegative. A little thing you all know, but it already shows you that covariance matrices are by definition positive semi-definite, because they satisfy exactly this property. If a' Σ a > 0 for all a ≠ 0, Σ is called positive definite. Numerically these distinctions are important, but we will most of the time work with positive definite covariance matrices, which simply means I can invert the covariance matrix; otherwise you have to be a bit careful with things like collinearity, and I will not go into that.

The Cholesky decomposition is a very powerful numerical result; you may have seen it in your school days in the context of Gaussian elimination, solving linear systems. It is a decomposition that is somehow like taking a square root: given a positive semi-definite matrix Σ — think of a covariance matrix, and make it positive definite if you like, I don't care — you can always write Σ = A A', where A is lower triangular (zeros above the diagonal) and its transpose A', vice versa, is upper triangular. And with a triangular matrix in a linear system, even a school kid can solve the equations, by substitution. So Cholesky is important and we will use it: this is the Cholesky decomposition, and A is called the Cholesky factor.
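As a minimal sketch of the 'formula above your bed' and of the Cholesky factorization (the numbers are my own; note that R's `chol()` returns the upper triangular factor, so the lower Cholesky factor A is its transpose):

```r
## Numerical check of Cov(AX + b) = A Sigma A' and of Sigma = A A'.
set.seed(271)
Sigma <- matrix(c(4, 2,
                  2, 3), nrow = 2, byrow = TRUE)  # positive definite
A <- t(chol(Sigma))  # chol() gives U with t(U) %*% U = Sigma; A = t(U)
A %*% t(A)           # reproduces Sigma: the 'square root' property

## Empirical check: X has Cov(X) = I, so Cov(AX + b) should be ~ Sigma
n <- 1e5
X <- matrix(rnorm(2 * n), nrow = 2)  # 2 x n matrix of iid N(0,1) entries
b <- c(1, -1)
Y <- A %*% X + b                     # affine transform, column by column
cov(t(Y))                            # ~ Sigma = A Cov(X) A'
```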
One more thing, if you really want to do the mathematics — I will not do it, and it is not needed for this course — but if you want to understand the main results and the proofs in the book, you should know about characteristic functions, that is, Fourier transforms. The distribution of a random variable or a random vector is uniquely determined by its Fourier transform, phi_X(t) = E[exp(i t'X)], now in d dimensions, so t'X = t_1 X_1 + t_2 X_2 + ... + t_d X_d; here is the famous i with i² = −1. This always exists, because the modulus of the integrand is bounded by 1 — that is the nice thing about the Fourier transform. As a tool, Fourier analysis is important; you will see it appear two or three times, but it will not be essential, so you may close your eyes when you see Fourier. I, at least, would open my eyes widely when I see Fourier — but never mind.

Proposition 6.1: a symmetric matrix Σ is a covariance matrix if and only if it is positive semi-definite; the direction to the right is exactly the statement we just made. So covariance matrices and positive semi-definite matrices in linear algebra are the same objects, and that is why we can use so much linear algebra. This is unfortunate for some of you, perhaps, but if you really want to understand multivariate models you must brush up your linear algebra; there is nothing you can avoid there. You may even go to some of the less fashionable but important theory, like Perron–Frobenius theory; it is terribly important if you go to factor analysis or high-dimensional data representation, and even geometry plays a role.

By the way, just an aside, and I can tell it here because one of the founders of the Google approach was, I think, from EPFL next door, from Lausanne. The Google search machine: websites and links between websites — so a huge matrix, say normalized by the numbers of links between site i and site j, and most entries are zero. I am old enough to remember the days of AltaVista and all the other search machines, and then Google came on the map; now everybody Googles — even if I want the telephone number of my colleague next door I look at Google, which is ridiculous, I should have a telephone book on my desk. It uses Perron–Frobenius theory, which is a theory of positive matrices; it uses linear algebra. Extremely powerful. (Googling 'Google' is a bit of self-reference, which, just like dynamical systems, can become chaotic.) But I just want to pause and say: linear algebra you cannot avoid, geometry you cannot avoid, in a multivariate world. We will see a little bit of that, especially when Marius takes over later.

Now, you can estimate the means and the covariance; here are the estimators: you take the sample mean, the sample covariance, and the sample correlation — you can write them down. A little side comment, I think we mention it in the book: this is not as trivial as it may sound. These estimators are not always the best ones; there are things like shrinkage estimation, which I will not go into, but there is a little proviso when using these. For us, when we estimate the vector of means and the covariance matrix, we use the sample estimators. We can make the covariance estimator unbiased — the famous n − 1, one degree of freedom you subtract, if you like — and for small n that is important. This is all known, nothing special.
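In R these sample estimators are one-liners (a sketch on placeholder data; note that `cov()` already uses the unbiased n − 1 version):

```r
## Sample estimators for a generic n x d data matrix X.
set.seed(271)
X <- matrix(rnorm(1000 * 3), ncol = 3)  # placeholder data, n = 1000, d = 3
mu.hat    <- colMeans(X)  # sample mean vector
Sigma.hat <- cov(X)       # sample covariance, divides by n - 1 (unbiased)
P.hat     <- cor(X)       # sample correlation: diag = 1, entries in [-1, 1]
```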
Now I slow down: here comes perhaps one of the most important slides of the course — I am very biased, of course. This is the multivariate Gaussian, and really one should give a whole day of the course on the multivariate Gaussian; in the old slides we had more slides on this. Here is one definition. If I gave you five minutes — it is rhetorical, we will not do it — to write down your preferred definition of the multivariate Gaussian and collected all your papers, it would be nice to see what definitions you wrote down. I can think of four or five; this is one of them. It is a constructive one, a simulation one.

You see, I start with a Z-vector: Z_1, ..., Z_d, iid standard Gaussians. Marius told us yesterday that with current computing power we can simulate from them, by inversion. So the building block to start with is d independent standard Gaussians; we all know those. Where do I get the covariance? Cholesky is around the corner. I give you a matrix A — you can take it k × d, a bit more general, but don't worry, you may think of d × d — any matrix A, and I set X = μ + AZ. Essentially you can calculate the mean: this is just a vector, the mean is linear, so E[X] = μ. And the covariance — remember, above your bed there is that formula — shifting does not change the covariance, so Cov(X) = A Cov(Z) A' = A A', which I define to be Σ. You see the little defining dots: there was no Σ to start with; I define Σ := A A', and that is always a positive semi-definite matrix. A A' is like taking a square; going back from Σ to the A is like taking a square root — if I give you a Σ and you want to find the A: Cholesky.

And now you see, I have done nothing yet. I started with iid Gaussians, pre-multiplied by a matrix, added a mean, and I get an interesting random vector; I have no reason yet to call it multivariate Gaussian. I have only shown you that the mean is μ, so I can call this vector the mean, and that the covariance is A A', which I just denoted Σ — nothing more, just notation. I can do a bit more, but I will nearly skip it: for this vector — an easy calculation if you know a little about characteristic functions, not much — I can write down the characteristic function, and it already starts to look like what we know, like e^{−t²/2} in one dimension; it is there, forget about it. Exercise (the transformation theorem in analysis): show that under this construction the random vector has a density, and here it is — exactly the density definition many of you would have written on your papers. So that is a second definition: the Gaussian density. This we call the multivariate d-dimensional Gaussian distribution with mean (location) vector μ and variance-covariance matrix Σ, where Σ = A A' in this construction.

If you start from this definition, it is trivial to simulate from: Frank gives me his Σ, I do the Cholesky decomposition and find A with A A' = Σ, Marius gives me d iid standard Gaussians by inversion, I pre-multiply by A, add the mean, and I am done. Whereas if I were to ask you how to simulate from your definition, the density — well, you would want to talk to Marius, perhaps. You see the difference between a constructive definition and an analytic one. This will come back later, in a very nice little story related to a big investment bank in New York.
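The constructive definition is literally the simulation recipe; a minimal sketch (function name and example inputs are mine):

```r
## Simulating N_d(mu, Sigma) via the constructive definition X = mu + A Z.
rmvnorm.chol <- function(n, mu, Sigma) {
  d <- length(mu)
  A <- t(chol(Sigma))                  # Cholesky factor: A %*% t(A) = Sigma
  Z <- matrix(rnorm(n * d), nrow = d)  # d x n iid N(0,1) "building blocks"
  t(mu + A %*% Z)                      # X = mu + A Z, returned as n x d
}
set.seed(271)
X <- rmvnorm.chol(1e4, mu = c(0, 0), Sigma = matrix(c(1, 0.7, 0.7, 1), 2))
colMeans(X)  # ~ mu
cov(X)       # ~ Sigma
```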
Now that I know the density I can plot it; Marius will take the R code, and you have these plots of the distribution in two dimensions. How about the geometry — now we have gone back to Euclid. How does this density look? It is not complicated: the dependence on x enters only through the quadratic form (x − μ)'Σ^{-1}(x − μ) in the exponent. Let us do it in two dimensions. I draw the density — some surface integrating to 1 — and I cut it: I look at the level sets of the density, cutting the landscape horizontally and looking from above. How do the level curves look? You set the density equal to a constant, and since everything else is constant, that means setting (x − μ)'Σ^{-1}(x − μ) equal to a constant — that is what a level set is. Now go back to your school days — in Moscow definitely pre-university; in Europe, I am afraid, perhaps less so: that is an ellipsoid. Take μ = 0 for instance: if Σ is the identity these are circles, balls; otherwise ellipses. So this we will call an elliptical distribution: the density, seen from above — like a Mexican hat with pieces cut off — has ellipsoidal level sets. Interesting.

But now you have to think a bit further, to one of the fundamental theorems, and fundamental mistakes, of finance: when are the components of a multivariate Gaussian independent? Remember, a vector with a density has independent components if and only if the density factorizes into the marginal densities. When can you factorize here? There is only one way: when Σ is diagonal. As soon as you start mixing off-diagonal terms, you start multiplying x_1 by x_7, say, and there is no way to factorize. So just by looking at this density you get one of the fundamental theorems of statistics: for the multivariate Gaussian, the components are (mutually) independent if and only if Σ is diagonal, if and only if all correlations are zero. Now you see you can go back: if I have a bivariate Gaussian with zero correlation, then the components are independent — just by calculating that one number, the correlation, I can establish independence.

My students — I exaggerate a little — if they make a mistake in quoting this result, they really have a problem passing my examination. When can you go back from zero correlation to independence? Do not try to tell me 'oh yes, in the Gaussian world' — that is not sufficient in my examination. That person should say: in the multivariate Gaussian world. Which is very different from 'X_1 up to X_d are each Gaussian'; copulas will tell you how different. So: in the multivariate Gaussian world, independence and zero correlation are equivalent. Multivariate. I have explained this ten times, and I do it because so often people forget it the moment they step outside the multivariate Gaussian world. Being given a multivariate Gaussian model — the one you saw, the one on your paper — versus the components each being Gaussian: very different worlds. A big, big area of confusion in finance, unfortunately still today.
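To see how different those worlds are, here is the classic sign-flip counterexample (my illustration, in the spirit of the lecture): both margins are exactly standard normal and the correlation is zero, yet the components are dependent — because the pair is not jointly Gaussian.

```r
## Marginally Gaussian + zero correlation, but NOT independent:
set.seed(271)
Z <- rnorm(1e5)
S <- sample(c(-1, 1), 1e5, replace = TRUE)  # random sign, independent of Z
X1 <- Z
X2 <- S * Z          # by symmetry, X2 is again exactly N(0,1)
cor(X1, X2)          # ~ 0, since E[S] = 0
## But (X1, X2) is not bivariate normal, and X1, X2 are dependent:
## |X1| = |X2| always. Note also X1 + X2 is 0 or 2*Z -- far from normal.
```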
It is not hard, by the way: how do you prove this? The transformation theorem — if you know your multivariate analysis, you can prove it; it is not difficult. So here is the Mexican hat from above; you see the correlation gives you the slant along the diagonal. In these examples there is also a t — I come back to the t later, I have not discussed it yet. And how do you sample? Marius already did it. If I give you only the density and ask you to sample from it, that is a big numerical problem; if I sample from my constructive definition, which is equivalent, it is immediate — Cholesky, as I gave you.

Now there are many nice results; I will just mention them — the proofs are fairly straightforward, transformation theorem, characteristic functions, etcetera. First of all, linear combinations of a multivariate Gaussian are Gaussian: if the vector is multivariate Gaussian, then every linear combination is one-dimensional Gaussian. I once had a very heated discussion — and this time I will not mention the name of the person, a very famous finance professor from the US — and I said: if X_1 is normal and X_2 is normal, I have no idea about X_1 + X_2. He said: no, the sum is normal, it is the basis of finance. I said: no, it is wrong. You can see it immediately: if X_1 = Z is standard normal and X_2 = −Z, the sum is identically 0. If (X_1, X_2) is bivariate normal, then the sum is normal — that is this theorem, very important. So, special case: if I have a multivariate Gaussian, then every linear combination, every portfolio built on it, is again Gaussian, and of course the parameters transform the way means and covariances transform.

A wonderful result, which unfortunately fell off the slides: the converse is true. If you give me a vector for which every linear combination b'X is one-dimensional Gaussian, for every vector b, then the joint distribution is multivariate Gaussian. That lives under the name Cramér–Wold device. I call this — these names are a bit of a joke, but they have a pedagogical purpose — surely the fundamental theorem of portfolio analysis, because it tells you: if you want your portfolios to be Gaussian for all possible combinations, long and short, then you must start from a multivariate Gaussian model, and vice versa. So, in a bit of a joking way: in the multivariate Gaussian world, all portfolios are Gaussian. A very, very important result; we will be a bit more general later.

The marginals of a multivariate Gaussian, even at the vector level, for sub-blocks, are all Gaussian; linear combinations, marginals, conditional distributions: Gaussian. Quadratic forms: if you take iid standard Gaussians — a multivariate Gaussian with identity covariance — the sum of the squares is chi-squared; and similarly, for X ~ N_d(μ, Σ), the quadratic form (X − μ)'Σ^{-1}(X − μ) is chi-squared with d degrees of freedom. This you can use for QQ plots and testing normality.

So, quickly on testing: how would you test for multivariate normality? That is not so easy, and Alex already spoke about it. Think about the data first — it is not written on the slide, it is presumed:
if your data are by nature highly skewed, perhaps you should transform them first; if there is a discrete component, well, surely you cannot have multivariate Gaussianity. Once everything is roughly right — you have looked at the data, they are reasonably symmetric, one-dimensionally they look okay — then you can look at the multivariate case. There are various things you can do: QQ plots for certain linear combinations; tests like Shapiro–Wilk; Mardia's test — Mardia, who wrote a famous book on directional data, so there you are really in the world of multivariate statistics; the Jarque–Bera test, etcetera, etcetera — many tests, which Alex already mentioned, all related, let us say, to the chi-squared distribution. So there are various routes: you can test the marginals, test some linear combinations, test bivariate projections, test the sum of squares. But none of these tests, even all put together, is equivalent to multivariate normality — I think I am correct there, Alex; it is not so easy to get an equivalence. So if I can test several dimensions and several combinations and nothing fails, I am pretty happy, provided I also understand the data.

Now comes the important example: remember the BMW–Siemens data. These are the daily log-returns of BMW and Siemens over the period Alex discussed. Here are your data; now let us start thinking about the geometry. Do not look at the left picture — which is like saying do not think of a pink elephant; of course everybody is now looking at the left picture. Look at the right picture, the data. What geometric features do you see in this plot? There are no wrong answers. Yes — an ellipse. Where do you see the ellipse? In the middle: in the center there is this elliptical blob that looks pretty elliptical. What else do you see? Well, you have very mathematical eyes: Frank says we see this increasing trend — correct. What else? Ah, again mathematical eyes: tail dependence, in the bottom-left corner — there you see a lot of action together, big losses together. What else, a secondary thing? Heavy tails, yes. And what do you see in the other corners — northwest, southeast? Nothing much; there is not much happening there.

So now you all go home, and I say: aha, I want a model that gives me a central elliptical blob, a bit of correlation built in, strong tail dependence in one corner, much less in the other, and no action in the remaining corners — and you come back tomorrow with a nice parsimonious model exhibiting exactly that. Parsimonious means with few parameters. We are not there yet — I would not do it yet; forget about copulas for the moment. But that is the job.

Can the Gaussian model do it? No. Now you can open your left eye and look at the left-hand picture: this is the best-fitting bivariate Gaussian model, the maximum likelihood fit — you take your data and fit the best μ and Σ. We do get the elliptical part; we do not get the extremes.
Remember the extremes of around −15%: if you are a risk manager and you use your Gaussian model and you want to stress test, for you these crises would essentially never happen — and we get again the story of the 'once in the lifetime of the universe' event. It does not capture it. So the Gaussian is out, no way: this is the best Gaussian fit, you cannot improve on it, and from a risk manager's point of view it does not capture the essential.

Now comes the exercise — and I will give you the solution, so you can still go out tonight and enjoy yourselves. I will make a bridge from this model to that model — not via Clayton copulas — in a parsimonious way. I want to give you some very basic, intuitive model building to go from left to right. Parsimony: there is the famous statement by George Box about building models — all models are wrong (and somewhere, I should look at the quote more carefully, he says something about parsimony): we should build models with as few parameters as possible that still describe the main aspects. And there is the famous elephant statement: give me a few parameters and I can fit an elephant; give me one more and I can make it wiggle its trunk.

Well, now I can do the testing. The left plot: Gaussian. The right plot: clearly non-Gaussian — the real data; indeed the second margin is highly non-Gaussian, we know that. The simulated data are of course very Gaussian, because they were simulated from a Gaussian model; we did a good simulation. Marius was saying earlier, 'let us see whether the theorems are correct' — I nearly jumped up and said, well, we proved they are correct! But of course he really meant: it is always good to let the simulation check that the mathematician did a good job. So we look at the sum of squares — the quadratic form, which should be chi-squared — and do a chi-squared QQ plot: the simulated Gaussian data are fine; the real data are way off Gaussianity. No way. And that is enough; I do not need any further test of multivariate Gaussianity.

So I am going to build a bridge from the Gaussian world, my best Gaussian fit, to reality, and I want to keep as much as possible of my old model, because people think in Gaussian terms. I want to keep the advantages of the Gaussian. The driving parameters should still be the mean and the covariance matrix Σ — it is like keeping the volatility, if I can (in extreme value theory I could not). I want linear combinations, portfolios, to have easy distributions — the Gaussian has that, the famous theorem on linear combinations. I want marginal distributions I know — normal here; conditional distributions I know — normal here; quadratic forms known; convolutions normal; sampling straightforward by construction; and independence and uncorrelatedness equivalent. Wouldn't it be nice if you could give me a model for the real data that has as much as possible of these advantages? That would be cool. That is model building, rather than just starting to tinker blindly: think about what you want to keep in your model, and then see how you can achieve it. Let us do it.
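Going back a step, the chi-squared QQ check used above to reject Gaussianity, as a sketch in R (placeholder data; `mahalanobis()` computes the squared distances with estimated mean and covariance):

```r
## If X ~ N_d(mu, Sigma), then (X - mu)' Sigma^{-1} (X - mu) ~ chi^2_d,
## so the ordered squared Mahalanobis distances should line up against
## chi-squared quantiles.
set.seed(271)
X <- matrix(rnorm(1000 * 2), ncol = 2)  # placeholder "data", n x d
D2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
qqplot(qchisq(ppoints(length(D2)), df = ncol(X)), D2,
       xlab = "chi-squared quantiles", ylab = "ordered Mahalanobis D^2")
abline(0, 1)  # points near this line: no evidence against normality
```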
The drawbacks of the Gaussian are mainly: the whole dependence is described by the covariance matrix — which is perhaps okay in this case — and, definitely, the tail heaviness: it is just not heavy enough in the tails; and it is very symmetric, while we already saw asymmetry in the data. The tails are the most important issue.

Now I am nearly at my little story of the big investment bank in New York. Here is my definition. Remember: without the W here, this was my Gaussian model — that is how I defined the multivariate Gaussian; not your definition, but mine, or his. Now I make one slight change: I pre-multiply the construction by √W, where W is a random variable — scalar, one-dimensional, positive, and independent of the other randomness, of the Z. That is all I do: X = μ + √W · AZ. The square root is there because later we take squares; W itself is just a positive random variable. Later we will use the so-called inverse gamma, an inverse chi-squared, a standard choice, but for now it is just some one-dimensional positive random variable: you could take your model, you can take yours, you can take yours — you just change the W. The main point is that I am not walking away from the normal model; remember the bridge from the normal model to the data reality: I act only on the normal model, and this very slight change opens up a fantastic universe.

What is the intuition behind this? It is like adding a random shock to your covariances. You can still interpret Σ = A A'; the W is a random number, and it goes with its square into your covariance matrix — you shock all the elements of the covariance matrix by the same number. So you build a model in which you randomly shock your whole covariance matrix, and clearly you get dependence, because suddenly all the components feel this common shock. I hope you see this; if not, just do the calculus. The special case where W degenerates to a constant gives back the old model. And conditionally it is just what I said: given W = w, the w enters, with the square, into the covariance matrix — every time you sample your ω you have a new shock in your system, and of course the shock is governed by the distribution of W.

(For those who know stopping times: there is no stopping time in this lecture, so I do not quite know when I have to stop — we have until 12, haven't we? Just cry out.)

Now you can do many calculations. If you define Y to be the original variable, the multivariate Gaussian you start with, you can calculate the mean: E[X] = μ, where you need the mean of √W to be finite. And here comes one little detail: the covariance of your new model X is not the Σ of your old Gaussian Y — remember, A A' was the covariance of that component. The W enters, and you get Cov(X) = E[W] Σ. We took a square root, so be a bit careful.
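Here is the common-shock construction X = μ + √W · AZ as code (a sketch; the two-point W below — a 'calm' and a 'stressed' regime — is just one simple choice of mixing distribution, of the kind mentioned a bit later):

```r
## Normal variance mixture: one shock W per sample hits ALL components.
rnvmix2pt <- function(n, mu, Sigma, w = c(1, 9), p = c(0.9, 0.1)) {
  d <- length(mu)
  A <- t(chol(Sigma))                          # Cholesky factor of Sigma
  Z <- matrix(rnorm(n * d), nrow = d)          # d x n iid N(0,1)
  W <- sample(w, n, replace = TRUE, prob = p)  # two-point common shock
  t(mu + A %*% Z * rep(sqrt(W), each = d))     # X = mu + sqrt(W) A Z
}
set.seed(271)
X <- rnvmix2pt(1e4, mu = c(0, 0), Sigma = matrix(c(1, 0.7, 0.7, 1), 2))
cov(X) / (0.9 * 1 + 0.1 * 9)  # ~ Sigma, since Cov(X) = E[W] * Sigma
```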
We had this yesterday, when Alex had the ν/(ν − 2): careful in these models, when you go from one to the other there is a correction factor. But in the correlation this factor of course drops out: the correlations of the two models — the multivariate Gaussian you start with and the new one, which I have not yet named — are the same. That is not bad, because some people like to think in terms of correlations.

The characteristic function you can work out immediately. Here I am a bit hesitant to go too deeply, because it matters later: if you calculate the characteristic function, a structure comes out in which the W enters through what people call the Laplace transform — engineers know the Laplace transform of positive signals very well — the Laplace transform of the distribution of W, evaluated at a particular point. That helps us give a notation for this class, and I do it here: this new vector now gets a name from me — a (multivariate) normal variance mixture, because I am mixing Gaussians; the W construction is called mixing. Input parameter μ, input matrix Σ — and note: Σ is not the covariance matrix of X; Σ is the covariance matrix of the input multivariate Gaussian, that is the difference — and the W plays its role. I could have written the distribution of W in the notation, but it is mathematically slightly neater to think in terms of the Laplace transform. Forget about it for the moment; it becomes a slight issue later, when we go to elliptical distributions and the notation gets a bit more involved.

Now comes a little exercise — it is very easy, it is immediate. Geometry again: let us look at the density. What is the density of this vector? By definition it is the conditional density given W = w, integrated against the distribution of W: f_X(x) = ∫ f_{X|W=w}(x) dH(w), with H the distribution function of W. That is just the standard way of working with conditional densities — W may itself have a density, or, to be pedantic, the integral is with respect to dH, but never mind. And by construction, if I condition W to be some value w — condition it to be 15, say, and look at it — then X is multivariate Gaussian, and I can write that density down. W is a positive random variable — I take a square root, so it had better be — and I integrate over w.

So what is the geometry of this distribution? I look again at the Mexican hat from above and cut it. The density is more complicated now, but there is only one place where the dependence on x enters: through (x − μ)'Σ^{-1}(x − μ) inside the conditional Gaussian density. If I set this integral equal to a constant, that quadratic form must be constant: an ellipsoid. It is an elliptical distribution. Oh, that is great — you wanted those elliptical blobs, and I have got them. 'Wait,' says Frank, 'you do not know yet that you get the extremes.'
Well, let us check. What freedom do I have to work with? The W — for any W I have all of this. First, a lot of properties; linear combinations and marginals — let me just leave it at the important one: all linear combinations again belong to the same class, only the parameters change. This is good, because this is what you need for portfolios; so linear combinations stay within the class — very good. Simulating: why should I bother telling you, you already know how to do it, because this is an explicit construction. I first simulate d independent standard Gaussians; for the given Σ I do the Cholesky decomposition to find A with A A' = Σ, and I multiply my d iid Gaussians by A. Now you come to me with your W. Say you take W chi-squared with d degrees of freedom — oh, this is easy: I simulate a fresh set of d iid standard Gaussians, square them and add them up; that is chi-squared with d degrees of freedom. So life can be easy or difficult depending on the simulation of W, but W is a one-dimensional random variable: you give me your favourite model and I simulate it for you. It is trivial — and I am not a computer buff; even I could program that, I think, but do not test me. Why is it straightforward? Because I did a very careful, parsimonious construction.

Now I am almost at my story, but first, examples — I can take many. If W ≡ 1, it is the multivariate Gaussian, nothing new. You can look at small mixture models: take W with a two-point distribution — remember, the W stresses the covariance: high covariance, high volatility if you like, and low — or a five-point distribution; you can mix very different aspects of Gaussian worlds together. You can do that. I will not discuss the generalized hyperbolic distributions, which are used quite a lot, especially for high-frequency data. I will take a very special example instead, and this is an exercise you can do — often, doing exercises helps you slow down and think about what you are calculating; finding the result is always nice, but the fifteen, twenty minutes, half an hour, depending on your calculus, of thinking is the point.

Take W = ν/V, where V is chi-squared with ν degrees of freedom — equivalently, W has an inverse gamma distribution, an absolutely standard distribution; ν, the degrees of freedom, need not be an integer. The mean of W exists if ν > 2 — you calculate the factor ν/(ν − 2) — and then the covariance follows; you can calculate it. This is trivial to simulate from, because it is trivial to simulate from a gamma, and if ν is an (even) integer it is just a chi-squared type of simulation, back of an envelope. Now do the calculation — you really have to do it once; it is nice. Or, if you have children at school, ask your son or daughter to do it; they just have to know how to integrate: plug in the density of the gamma for the W and compute. If your son or daughter or nephew or niece has done it correctly, they will come up with this density here, gamma functions appearing: this is called the multivariate t. We already saw it yesterday.
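A sketch of exactly this recipe (function name and inputs are mine). The point of the story that follows is that these few lines are how you simulate a multivariate t — whereas sampling directly from its density is genuinely painful:

```r
## Multivariate t via its mixture representation, W = nu / chi^2_nu
## (an inverse gamma mixing variable).
rmvt.mix <- function(n, mu, Sigma, nu) {
  d <- length(mu)
  A <- t(chol(Sigma))
  Z <- matrix(rnorm(n * d), nrow = d)
  W <- nu / rchisq(n, df = nu)             # inverse-gamma mixing variable
  t(mu + A %*% Z * rep(sqrt(W), each = d)) # X = mu + sqrt(W) A Z
}
set.seed(271)
X <- rmvt.mix(1e4, mu = c(0, 0), Sigma = matrix(c(1, 0.7, 0.7, 1), 2), nu = 3)
cov(X)  # ~ (nu / (nu - 2)) * Sigma = 3 * Sigma here (noisy for small nu)
```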
That is the multivariate t. Now comes the story. This must have been the mid-90s: a famous investment bank in New York called me and said, well, we are starting to use non-Gaussian models; for our data modelling we use a multivariate t — and, true to form, they had written down the density. So: how would you simulate from this? I should have said, 'let me think about it, it is an interesting question,' but I was rather naive and immediately said, 'well, it is trivial, this is the way you simulate it,' and told them — which of course you should never do: if a big bank phones you, say 'let me think about it, I will come back to you in a week's time.' I am just joking. But it shows you: simulating directly from that density, even with good numerics, is non-trivial. This distribution is one of the workhorses of finance nowadays — some people say the whole world is about the t; there are even books modelling the world of finance with the t — so you must be able to simulate from it. And if you know that this random vector has this mixture representation, it is trivial: you just simulate it. That is what I call parsimonious thinking; that is what I hope you learn from this particular example.

Now — and I pressed Marius to do this; careful, perhaps a couple of hours of work one evening — I want to go back to my data, the BMW–Siemens finance data. Remember: elliptical blob, extremes, a bit asymmetric — sorry, asymmetric. The best normal fit: nothing out there in the extremes. The right-hand side is the best-fitting multivariate t with three degrees of freedom — we gave it three; I do not think we estimated it, but you can estimate ν, and it will be very close to three. Let us say it is the best-fitting t model. This is not bad. What is good about it? — I could ask you, but you will see it. Of course I am not predicting these events, but I get the right range: look, −10, −15 — my model captures 15% drops in the share prices of these two German stocks. It also gives me — and Frank is already jumping up and down — rather too much in the right tail: it is clearly symmetric; it is a symmetric, elliptical model with an ellipsoidal center. But it satisfies all the nice properties I wanted. Now you, and Frank, can say: this is not good enough, I want to get rid of these points here, I want skewness, extreme events in this tail but not in that one. Fine — but at least, and if not I have failed in my duty, I really hope you understand how to think about model building beyond the well-known standard models: you start with your standard model, you think about possible changes, you make changes you understand economically — the mixture model stress-tests your covariance entries with an independent random shock — and you keep the beautiful properties. I am not fully there yet, but believe me.

And I almost forgot: there is a paper by Philippe Jorion on, I think, the LTCM crisis — LTCM, the hedge fund, in 1998 — where it was said that what happened to LTCM, the drawdown, was a once-in-so-many-years experience under the Gaussian; if you just use a multivariate t model with reasonable parameters, like this one, I think it came out at around one in 70 years. So you already get estimates qualitatively in the right ballpark. And if you now say: but
this asymmetry I am seeing in the data I really have to capture — then you go to the next step: add an extra component and randomize the mean as well. This is very different from copula modelling — copulas come later, and copulas are fine, I will tell you how fine they are — but here I am not thinking copulas; I am thinking in economic terms: a random mean, a random shock to the covariance, mechanisms I understand economically. (There is a copula behind this, a t copula; that joke comes later in the course.) With a good choice of the randomized mean — and again, simulation stays easy — I get skewed models, as much skewness as you like. So that is the way we do it; we can do the calculations, but we will not. I think I am supposed to stop here; somebody is taking over now.

[Marius takes over.] Okay, so we have learned about the multivariate normal model and extensions such as the multivariate t, with the factor √W — with a stochastic representation, essentially. This is what Paul stressed: you have to look at stochastic representations; then sampling is easy, and so on. A different way of getting to a model — an even more general class than the normal variance mixtures with the √W — is via elliptical distributions. Why do we want to study this even more general class? Because in Chapter 8, I believe, we will have the result that for all random vectors X following a multivariate elliptical distribution, Value-at-Risk is subadditive. So that builds a link between Chapter 6 and Chapter 2, essentially; that is why we are interested in this even more general class of distributions.

The way this class is constructed is slightly different from going from the multivariate normal to the multivariate t. It is easier, let us say — and it is essentially the classical way to introduce elliptical distributions — to first study a subclass: the spherical distributions. The names will become clear soon. So what is a spherical distribution? Again we look at random vectors here, not at densities: a spherically distributed random vector Y is a random vector that is equal in distribution to any rotated or reflected version of itself — it is distributionally invariant under rotations and reflections. Now, you could go home and say you did not learn anything: how do you think of such a random vector Y, how would you simulate it? That is not given by this definition, unfortunately. But you can already think about the geometry: if I sample points, the cloud must look invariant under rotations and reflections — so maybe something like a circle would do, right? That is where the name comes from: the sphere, in higher dimensions; in two dimensions a circle is fine.

Now there is a very important theorem — proven in the book, of course — characterizing spherically distributed random vectors. It says the following are equivalent. First: Y is spherical if and only if there is a function ψ, called the characteristic generator, with which I can write the characteristic function as phi_Y(t) = ψ(t't).
Essentially — a bit of the same considerations as before with the density — the characteristic function is constant on spheres, on circles. Now, the characteristic function has not played too big a role so far, and it will not: we rather look at stochastic representations, at the random vectors. But if you want to prove the properties of elliptical distributions that we will quickly discuss, there is no way around characteristic functions; they typically also give you the shortest proofs. So that is something we will not encounter too much anymore.

But the third part of the characterization is essential: Y is spherical if and only if, for every vector a, the linear combination a'Y is equal in distribution to ‖a‖ · Y_1 — some constant, the length of the vector, times one of the components of Y. The thing to take away here, at the moment, is the mathematical, probabilistic beauty of this result: I can rewrite something that involves a multivariate random vector, something high-dimensional, in distribution, by something univariate, a single random variable. That is extremely helpful for many purposes, especially in Chapter 8, where we learn that for all elliptical distributions Value-at-Risk is subadditive — the proof is actually not too complicated, and it is based exactly on this result. So: an extremely beautiful probabilistic result, playing the multivariate case back to something univariate in distribution. Very, very helpful.

Still, 'all linear combinations are in distribution equal to one of the components' is a kind of recursive definition, if you like: I still do not know how to sample from it, how to think about spherical distributions. I would like to have something like what Paul presented: a stochastic representation — a representation in distribution of my more complicated, high-dimensional random vector in terms of simpler components, components I can simulate: independent normals, transformations with the Cholesky factor, and so on. And here is the result: a random vector Y is spherically distributed if and only if Y is equal in distribution to R · S, where R is a positive random variable — similar to the √W before — and S is a random vector — similar to your Z before — but now distributed uniformly on the unit sphere. There are ways to generate points uniformly on the unit sphere; it is not very complicated — you can use your Z vector from before, you will see that in a minute. This is a very nice way of thinking about spherical distributions: I take points uniformly on the sphere, and for each of them I take the ray from the origin through that point and scale the point radially by a realization of R. In the bivariate case the sphere is of course the circle — and how could you immediately generate points uniformly on the circle? Just take a random angle, uniform between 0 and 2π, and take the corresponding point. That would be an easy way to generate points uniformly on the unit circle in the bivariate case. Then take, say, an exponential distribution for the radial part, simulate as many values as you have points on the circle, and scale each point out accordingly: you will have a spherically distributed sample.
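The bivariate recipe just described, as a sketch (the exponential radial part is the illustrative choice mentioned above):

```r
## Sampling a spherical distribution Y = R * S in the bivariate case.
set.seed(271)
n <- 2000
theta <- runif(n, 0, 2 * pi)
S <- cbind(cos(theta), sin(theta))  # points uniform on the unit circle
R <- rexp(n)                        # radial part, independent of S
Y <- R * S                          # scales each point out along its ray
plot(Y, asp = 1)                    # level sets of the density: circles
```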
[Question from the audience.] Yes, of course — there is an easier way to sample from the sphere in higher dimensions, and people do that; you will see it in a minute. And yes, that is a very good question: to show rigorously that the method works, I would take characteristic functions, which we will not do here — but you will see that it works.

I will show you these plots in R. Here you see — I can only plot the bivariate case — I started exactly with points uniformly on the unit circle, and then as the radial part I took F-distributed random variables, multiplied by d, square root: R = √(d·F). So I scale each point further out or further in, depending on my realization of R. Why did I take such a complicated distribution function for the radial part? Because that is exactly the distribution of the radial part that gives me a sample from a special case of the multivariate t. So spherical distributions can reproduce special cases of the multivariate t you have seen before — but the sphericals alone are not yet the generalization we want; we will look at the generalized version of the spherical distributions, the ellipticals, and those will be it.

[Question.] The radial values are drawn independently of each other, yes — if you took the very same R for all points, all you would get is simply one bigger circle or one smaller circle. I will do these exercises with you in R soon; the notation on the slide is a slight abuse — the sample of radial parts is multiplied into the sample of sphere points. And we will come back to these pictures: in R I actually take a slightly different direction, so you see one way of constructing a sample from the multivariate t on the slides and another one in the code — I can first multiply by the radial part and then apply the matrix transformation, or the other way around; there are two versions.

Good. So we go from the circle to these spherical shapes. But of course, the point is: when do you ever see such data? Paul showed you the BMW–Siemens data before — you never see circles as level curves in your data. So for us this is just a building block for the next step, a more flexible model: the multivariate elliptical distributions. If you know the spherical distributions, and if you have studied characteristic functions to derive their many nice properties — things like the covariance matrix of the uniform distribution on the unit sphere and so on — then you can use all of that as a building block to understand elliptical distributions and their properties. Because how do I go from the spherical to the elliptical, from the circle to the ellipse? I apply exactly an affine linear transformation — a very simple step. All I do is take an A which is the Cholesky factor of some covariance matrix — and I want to stress: *some* covariance matrix, not necessarily the one of X, because the radial part plays a role as well — I multiply it with my spherical random vector, and then of course there can be a μ, a location parameter, by which I shift the whole random vector somewhere else.
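The step from spherical to elliptical, reproducing a multivariate t sample via the F-distributed radial part just mentioned (a sketch; μ, Σ and ν are my example inputs — for a spherical t with ν degrees of freedom in dimension d, R²/d is F(d, ν)-distributed):

```r
## Elliptical sampling: X = mu + R * A S, here tuned to give a t sample.
set.seed(271)
n <- 2000; d <- 2; nu <- 3
theta <- runif(n, 0, 2 * pi)
S  <- rbind(cos(theta), sin(theta))          # d x n, uniform on unit circle
R  <- sqrt(d * rf(n, df1 = d, df2 = nu))     # radial part of a t_nu
A  <- t(chol(matrix(c(1, 0.7, 0.7, 1), 2)))  # Cholesky (scale) factor
mu <- c(3, 3)                                # location shift
X  <- t(mu + A %*% (S * rep(R, each = d)))   # n x d sample
plot(X, asp = 1)                             # elliptical cloud around mu
```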
Good. So we go from the circle to these spherical shapes, but of course the point is: when do you ever see such data? Paul showed you the BMW–Siemens data before — you never see circles as level curves in your data. So this is just a building block for the next step, a more flexible model: the multivariate elliptical distributions. If you know the spherical distributions, and if you've studied characteristic functions to derive the nice properties of sphericals — things like the covariance of the uniform distribution on the unit sphere — you can use all of that as a building block to understand elliptical distributions and their properties. Because to go from the spherical to the elliptical — how do I go from the circle to the ellipse? — I apply exactly that affine linear transformation. It's a very simple step: I take an A which is the Cholesky factor of some covariance matrix — and I want to stress some covariance matrix, not necessarily the covariance matrix of X, because the radial part plays a role as well — I multiply it with my spherical random vector, and then of course there can be a μ, a location parameter, by which I shift the whole random vector somewhere else.

The language is very important here; when I teach this I always stress the difference between a mean vector and covariance matrix versus a location vector and a scale matrix. We speak here of a location μ, and Σ is a scale matrix — those are not necessarily the mean and the covariance of X. If the mean of X exists, then it is indeed equal to μ, but it doesn't have to exist; that depends on the radial part. And if the correlation matrix of X exists, it equals the correlation matrix corresponding to that scale matrix, but the covariance matrix of X itself — again, if it exists — is the Σ multiplied by a factor, and that factor essentially contains the information of the radial part. So the radial part has to come in at some point. You saw the same thing before when Paul talked about the square root of W: in the covariance matrix of the normal variance mixtures you have the square root of W as well — you saw it for the multivariate t, where you have the ν/(ν − 2), and that's essentially this factor. So be very careful with mean and covariance.

Good. Now I should prove to you that what we do is indeed a generalization of what Paul did earlier with the normal variance mixtures. First of all, I can plug my stochastic representation of Y, being R times S, in here and get a nice stochastic representation of an elliptical random vector — this is extremely nice for us: X is just μ plus R times A times S. To show that the normal variance mixtures are a special case of the ellipticals, all I have to do is take the stochastic representation Paul gave and rewrite it in this form; then I'm done — then I know that class of distributions is a subclass of the ellipticals. And the way to do it is this: you take your normal variance mixture stochastic representation and you pull out the norm of Z; then the square root of W times the norm of Z is your positive random variable — that's your radial part — and Z divided by the norm of Z can be shown to be uniformly distributed on the unit sphere. (Yes, that answers your question, Frank: this is exactly how you would generate from the uniform distribution on the unit sphere.) And, perhaps surprisingly — I don't want to go into detail here — one can also show that R and S are independent. So this is exactly the stochastic representation of an elliptical, and all normal variance mixtures are indeed special cases of elliptical distributions — a very beautiful result. So let me go back for a second to that slide: we did the first step — we started with S, went to the spherical distributions by multiplying with the radial part, chosen such that in the end we hopefully get a t (this is already a special case of the t) — and then I do the location–scale transform. My scale is now the Cholesky factor, so you see how the sample cloud turns from a spherical shape into an ellipsoid, and then I move it to the upper-right corner with the location μ. And this now looks much closer to — well, this actually is a multivariate t sample now, just constructed through the idea of sphericals.
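A sketch of that construction X = μ + R·A·S in R (illustrative parameter values only; the byrow = TRUE is exactly the recycling point that comes up again later in the R session):

## From spherical to elliptical: location-scale transform of R * S
n   <- 1000; nu <- 4
mu  <- c(1, 3)                                # location vector
P   <- matrix(c(1, 0.7, 0.7, 1), 2, 2)        # scale matrix Sigma (here a correlation matrix)
A   <- t(chol(P))                             # Cholesky factor, Sigma = A %*% t(A)
phi <- runif(n, 0, 2 * pi)
S   <- cbind(cos(phi), sin(phi))              # uniform on the unit circle
R   <- sqrt(2 * rf(n, df1 = 2, df2 = nu))     # t radial part (d = 2)
X   <- matrix(mu, n, 2, byrow = TRUE) + R * t(A %*% t(S))  # elliptical (t) sample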
Good — then there are a lot of properties you can derive for elliptical distributions; the bottom line is that most, but not all, properties of the multivariate normal do carry over to ellipticals. For example — you've seen special cases of this, since normal variance mixtures are special cases — the density can be written in this form, and again you see that the level sets of the density are ellipses, hence the name elliptical; and this is what we see in data. Linear combinations can be shown to be again elliptical. Here is the famous result, now in its most general form, for knowing the distributions of linear combinations of random variables: if I have a random vector which is multivariate elliptical, then all linear combinations are univariate elliptical. This is the same as for normal variance mixtures, and in particular the same as for the multivariate normal. If you go back to Chapter 2 — Rüdiger very quickly looked at it yesterday — this underlies the variance–covariance method for approximating, for computing, your loss distribution. There we had two assumptions: X was multivariate normal, and we approximated the loss operator by the linearized loss, so I have a linear combination of the X's in there, and the hope is that I then know the loss distribution and from there can calculate Value-at-Risk and expected shortfall explicitly. And you can do that even for this more general class of elliptical distributions — so the variance–covariance method still gives you explicit risk-measure estimates for elliptical distributions beyond the normal, and this is about the largest class for which you can easily get that.

Then, the marginal distributions are again elliptical. And quadratic forms are equal in distribution to the radial part squared, which can be used, for example, for goodness-of-fit testing: I can recover this quantity from data and test it against the distribution implied by my theoretically assumed model — with a QQ plot, for instance, something univariate again — which can be very helpful for checking whether I actually have data from a multivariate t. If you give me data from a multivariate t, this is one way to run a very simple univariate QQ-plot goodness-of-fit test; of course I could also test it formally with many of the tests we had in Chapter 3. [Question.] Yes, this is standard — because your Σ is diagonal there, the quadratic form is just a weighted sum of squares.
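A sketch of that QQ-plot idea, for an (n × d) data matrix X hypothesized to come from a multivariate t with ν degrees of freedom; the plug-in scale matrix uses the ν/(ν − 2) relation between covariance and scale mentioned above, so this assumes ν > 2 (names and choices are illustrative, not the Chapter 3 tests):

## Goodness of fit via the quadratic form: (X - mu)' Sigma^{-1} (X - mu) =_d R^2,
## and for a multivariate t_nu the radial part satisfies R^2/d ~ F(d, nu)
nu    <- 4                                   # hypothesized degrees of freedom
d     <- ncol(X); n <- nrow(X)
Sigma <- cov(X) * (nu - 2) / nu              # plug-in scale (cov(X) = nu/(nu-2) * Sigma)
D2    <- mahalanobis(X, center = colMeans(X), cov = Sigma)  # squared distances
qqplot(d * qf(ppoints(n), df1 = d, df2 = nu), D2,
       xlab = "Quantiles of d * F(d, nu)", ylab = "Ordered squared distances")
abline(0, 1)                                 # points near the line support the t model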
Good — so that was quite a bit of theory. Now I would like to challenge you a little, and also challenge a little what Paul said earlier concerning sampling. You saw on the slides one random vector, but in practice you want n realizations, and this is actually not so easy — even for the multivariate t — partly because of numerical issues, partly because of the way R works on matrices. I wrote a paper about this; there's a paper in The R Journal on sampling from the multivariate t distribution, using of course the stochastic representation and the same notation, so it reads nicely. But let me go quickly through the R script with you, building these multivariate distributions. The script is called "Constructing and sampling multivariate distributions". There's a little playground concerning building correlation matrices — you go from the Cholesky factor to the correlation matrix and back, so you can execute that; there's also a function for getting the correlation matrix from a covariance matrix, which I do both manually and with a function in R — it's not so important for us. I look at other decompositions as well.

But here the magic starts: we begin with the multivariate normal. I want a thousand samples in the bivariate case, so that I can easily plot them, but the code is written so that you can easily use a different dimension as well. I sample the Z, and here you already see the difference to what's on the slides: it's not just d Z's, it's an n-by-d matrix, essentially. Of course, if you sample iid it doesn't matter how that matrix is filled — but it is very important to realize that in R, matrices are filled column-wise, and it can actually happen that, instead of sampling a normal variance mixture, you accidentally sample a mean–variance mixture, which also mixes the mean and is not even elliptical anymore. So be extremely careful here; test the samples you get. So this is essentially a thousand independent normals, and then I put some covariance matrix on top, so I get these shapes — I go through fairly quickly and just highlight the most important aspects. Then of course I can move the cloud somewhere else; that's just my μ. I can also make the correlation smaller and overlay that sample: you see, the smaller the correlation — or rather, the larger in absolute value, that's the right way of saying it — the more my samples lie on a line. This is what Paul mentioned: for correlations of plus or minus one they lie on a line almost surely. I can switch the sign of the correlation, and I simply get the slope in the other direction.

Now I'd like to start from this very same sample and go to the multivariate t by bringing in that square-root-of-W factor. So I sample one over a gamma — by the way, this is how those samples look on a log scale — and then I multiply the square root of W into A times Z. But you see that in the computer I need to transpose: my Z is an n-by-d matrix, so careful — it's A matrix-multiplied with the transpose of Z, that transposed again, multiplied by the square root of W — and then I add the location vector μ correspondingly. μ is a vector and this is a matrix, and R recycles the values; I just have to make sure it does so in the right way. I have to repeat μ n times row-wise, otherwise my location — say (1, 3) — will be recycled column-wise as 1, 3, 1, 3, …, and suddenly every other sample comes from a different location, and I will not get an elliptical distribution. Be extremely careful here.
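A condensed sketch of those steps (mirroring, not reproducing, the script; note the inverse-gamma W and the byrow = TRUE guarding against the recycling trap just described):

## Multivariate t_nu via the normal variance mixture mu + sqrt(W) * A %*% Z
n     <- 1000; nu <- 3.5
mu    <- c(1, 3)
Sigma <- matrix(c(4, 2, 2, 3), 2, 2)
A     <- t(chol(Sigma))                          # Cholesky factor of Sigma
Z     <- matrix(rnorm(n * 2), ncol = 2)          # n iid N(0, I_2) rows
W     <- 1 / rgamma(n, shape = nu/2, rate = nu/2) # W ~ inverse gamma => t_nu
X     <- matrix(mu, n, 2, byrow = TRUE) +        # repeat mu row-wise, NOT column-wise
         sqrt(W) * t(A %*% t(Z))                 # sqrt(W[i]) scales row i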
Good, so I can do that, and then you see the normal — the black points — with the t overlaid. The beauty is that my W is not bounded, so the t reaches out much further than any normal distribution. And of course I can play now. I can say: OK, I want a multivariate normal with independent components — the black dots, where you see almost a cross; you see almost no points in the corners, there's no dependence. But then I want an uncorrelated t. Remember: for the normal, independence is equivalent to uncorrelatedness; for the t it is not, because you have the square root of W in there — the only normal variance mixtures for which independence is equivalent to uncorrelatedness are the ones where the square root of W is constant, and those are only the normals. So you see more points in the corners here, and you see the spherical shape — that one is actually spherical, if you like. And then the full t, the red points, which introduces dependence in two ways: first through the square root of W and then through the correlation matrix. So that's how these examples differ.

But now, quickly, in one minute, let the fun begin. I would like to look at a square root of W where W takes on only two values, so I can see a little better what's going on. What you see then is this: the black dots are my multivariate t, and here you see samples from a normal with W equal to one, and here from a normal with W equal to two, each with probability one-half — so half of the samples are in the blue cloud and half in the red cloud. And you really see how the t is constructed: you overlay normals with different realizations of your W, that is, normals with different covariances. If you only use two values — or any other bounded distribution for your W — then that bound essentially caps how far out you reach; the beauty of the t is that the distribution of W is unbounded, so every once in a while you do reach out much further. That's the beauty behind it. Then I can also look at a mean–variance mixture — this is what I mentioned before: I also mix the mean, so half of the sample points have a different location, and overall the sample is not elliptical anymore. Play around with it, adjust the probabilities — these are all tools to play with.

Then I can look at the ellipticals. I start from the spherical, as you've seen on the slides, but now I first apply the Cholesky factor — before, I first applied the radial part — to see what it does to my circle: it tilts it, so you already see the elliptical shape. And then my radial part scales each point out by the corresponding realization, and you see the full elliptical shape. So if I now "add" R — and add really means multiply — each point gets scaled out, or possibly in, in a different way. Then of course I can also add a location and move the whole sample cloud, and I can look at a two-point distribution for the radial part: you had the circle before, and if the radial part has only two different realizations, you see that I get only these two ellipses — just as before with the nested circles. So again, an elliptical distribution overlays these ellipses for different radial parts, and if the radial part is unbounded you get further and further out. That's essentially it — enjoy your lunch!

OK, so I think we will slowly make a start. In the first session after lunch you're going to get 45 minutes of me talking about dimension reduction, and then we're going to change the subject and start to talk about copulas and dependence. This morning, in the multivariate modeling chapter, we talked about models for random vectors of dimension d: normal mixture models — variance mixtures and mean–variance mixtures — and also a little bit about spherical and elliptical models.
One comment I will make is that these mixture distributions are remarkably easy to estimate. In the material we have in QRM Tutorial for Chapter 6 — you'll see there are about seven or eight scripts for that chapter — one of them is about fitting multivariate mixture models, whether Student t distributions or generalized hyperbolic distributions. It's easy because of their structure: they are scale and location mixtures of multivariate normals, so in a sense they make use of the convenient estimation of the multivariate normal and are not too much more difficult to estimate themselves, using algorithms belonging to the set of methods known as EM algorithms.

OK, I will talk about factor models. The dimension d of our random vector may be extremely large: if we're thinking about all the risk factors that affect a portfolio, d could run into the thousands or tens of thousands. Even if we restrict to something like portfolios of bonds, we may have a high-dimensional situation, because there are so many different times to maturity. The reason we're doing this section is that we will use it in two applications on the remaining three days: tomorrow morning we will use dimension reduction on bond portfolios, and on Friday morning we will use dimension reduction in portfolio credit risk models. So the factor models are presented today as a tool, and then they will be used in two applications.

The idea is this. Our random vectors have been called X — vectors of risk factors. The p-factor model is an attempt to explain the randomness in a d-dimensional vector in terms of the randomness in a lower-dimensional vector of factors; F stands for factors. We may want to go from tens of thousands down to dozens — possibly even down to a single factor in certain applications. The factor model for a random vector looks like this; it could be called the linear factor model. It has various elements. B is called the loading matrix or the weight matrix; it has d rows and p columns — as many rows as there are risk factors and as many columns as there are common factors. (At this point it becomes confusing to talk about risk factors and factors, so we call X the variables and F the factors.) F has dimension p, and when we need its covariance matrix we will call it Ω. In general the factors might not be independent or orthogonal, but they can usually be orthogonalized — I'll come to that. The ε are the errors in the factor model; they're sometimes called the specific terms or the idiosyncratic terms, and they're supposed to be uncorrelated with each other: this vector of idiosyncratic error terms is assumed to have mean 0, and whatever covariance matrix it has should be diagonal. When we encounter it, it may be called Υ — Upsilon, one of the lesser-used Greek letters. The factors and the errors should also be uncorrelated. The part BF is sometimes called the systematic part. So we want to use this kind of model to reduce dimension, and I'm going to mention three strategies for doing it, and implement at least one — and if I talk fast enough, possibly two — strategies for doing the factor modeling.
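As a toy illustration of this structure (all names and numbers here are made up, not taken from the QRM Tutorial scripts), one can simulate a small p = 2 factor model in R and verify the covariance structure it implies — which is exactly the point made next:

## Simulate X = B F + eps and check Cov(X) = B Omega B' + Upsilon
set.seed(271)
d <- 5; p <- 2; n <- 1e5
B       <- matrix(runif(d * p), d, p)                   # loading matrix (d x p)
Omega   <- matrix(c(1, 0.3, 0.3, 1), p, p)              # covariance matrix of the factors
Upsilon <- diag(runif(d, 0.1, 0.5))                     # diagonal error covariance
F.      <- matrix(rnorm(n * p), n, p) %*% chol(Omega)   # factors with covariance Omega
eps     <- matrix(rnorm(n * d), n, d) %*% sqrt(Upsilon) # uncorrelated errors
X       <- F. %*% t(B) + eps
max(abs(cov(X) - (B %*% Omega %*% t(B) + Upsilon)))     # small (sampling error only)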
If we assume this structure for the random vector, that's actually the same as assuming a structure for the covariance matrix of X, as you will see. So perhaps go to the second point first: the factor model implies that the covariance matrix of X has this special structure — the loading (weight) matrix B, the covariance matrix Ω of the factors, B transposed again, plus the diagonal covariance matrix of the specific errors. And as I've said, you can orthogonalize the factors: you can define an F* in this way and a matrix B* in this way, and rewrite the model in terms of B* and F* — so I've gone from B and F to B* and F*. The point is that the covariance matrix of F* is now the identity, so the covariance matrix of X is given by B* B*' + Υ. You can always orthogonalize the factors once you have them. Now the question is: where are we going to get factors? Our goal will be to identify factors F_t, and then, instead of modeling all of our individual prices and rates — all of our individual X information — we will concentrate on modeling F. That's going to be the point.

OK, so the various strategies. To estimate a factor model we need data; we will assume that at various time points we have X data, so we have a data set of n d-dimensional vectors X_t, and we assume that at every time point the factor model holds. It's customary to divide the strategies into three groups; you see different names for these and I'll mention some of them. The second and third should have different names — I think this is one of the first typos we've actually come across: the third "fundamental" should actually read "statistical", so if you take a pen, cross out the third "fundamental" and write "statistical". Beginning at the top, econometricians refer to macroeconomic factor models, also known as time-series factor models. This is the situation where you have some observable candidate for the factor: in modeling stock returns, for example, you might take a stock index as the factor, or sector indices as the factors. You probably know these models from econometrics or economics — the CAPM is a model of this kind, a macroeconomic factor model, where typically you take the factors as observed index return series or sector index return series. If you observe F_t, then the problem of calibrating this model — or should I say estimating; it will be estimation — is to determine a and B, and this is a regression problem, a time-series regression problem: given X_t and F_t, determine a and B. I won't do more on this; in QRM Tutorial there's an example of each of these kinds of factor models, and the one I wasn't going to do was this one. I was going to do fundamental factor models. This is a slightly different situation: here what we assume is that we observe B — we don't observe F_t, we observe B. We assume the matrix of factor loadings is known, and we attempt to estimate the factors, which are unobserved. Again it's a regression problem, but instead of being a longitudinal, time-series regression problem, it becomes a cross-sectional regression problem. These models are quite popular in fund management, where you look at the fundamentals of stocks, such as the country, the industry sector, and the size —
large-cap, mid-cap, and small-cap. People working in fund management sometimes call these fundamental factor models because they're based on the fundamentals of the stock: industry, country, size. So basically we can classify each of the individual X's into groups — groups associated with things like industry or country — but perhaps we don't have a natural observable candidate for the factor of each group, so we will actually estimate the factor for that group; we will construct it. At every time point we run a cross-sectional regression, and I'm going to do this so that you see how it works. In the statistical factor models, nothing is observed: we have neither the factors nor the factor loadings, so we have to get both somehow — we have to discover the factors within the X. For me it's quite a nice division: here you observe F but not B, here you observe B but not F, and here you observe neither.

OK, so I will skip the first kind, the macroeconomic factor models, and go immediately to the fundamental factor models, and I will try to carry that out. At every time point we have a model of this kind — OK, now we're live again, I think. B is known; it's some classifying matrix for every one of the X variables. Let's take the example of stocks, so we'll call the X's stock returns, and we know the classifying matrix B. We have to estimate F_t; we can get rid of a — going back to the more general model, we can absorb a into F_t — so the problem is estimating F_t, and it's a regression problem. You probably recognize something that looks like a least-squares estimate here. The first thing we could do is, at every time point t, form the ordinary least-squares estimate — this is the matrix form of the ordinary least-squares estimate. If the components of these error vectors all had the same variance — that's homoscedasticity — this would be the best unbiased estimator. But in general we can't expect that: these are different stocks, and at time t some of them may be more variable than others. So to get slightly better estimates you use weighted, or generalized, least squares — but that's a detail, really. The important thing to take away is that you use standard regression: you start with ordinary least squares and then try to improve it with generalized least squares. And I'm going to do an example of that, or attempt to.
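In formulas-as-code, the two estimators just discussed look like this at a single time point (a sketch with hypothetical names: B the known d × p classifying matrix, x_t the d-vector of returns at time t, D a diagonal estimate of the error covariance):

## Cross-sectional least squares at time t
F_ols <- solve(t(B) %*% B, t(B) %*% x_t)                      # ordinary least squares
F_gls <- solve(t(B) %*% solve(D, B), t(B) %*% solve(D, x_t))  # generalized least squares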
So: fitting a fundamental factor model. (This has been resized since yesterday — is the font OK at the back, or should I make it one bigger? One bigger — that's maybe better.) For this I will need the time-series library; I'll use qrmdata again — the data that Marius and I have put together — and the qrmtools library. I'm going to take every single stock in the S&P 500 — how many stocks are there in the S&P 500? It appears there are actually 505 when we have a look at them — and I'll take the data from 1995 up to the end of 2015. So I have about 5,200 days and roughly 500 stocks, and I will do a monthly model: I'll go from daily to monthly and fit, basically, a cross-sectional regression every month.

There are a lot of missing values, so unfortunately I can't take all 505; some of the stocks have an awful lot of missing values. Let's see if we get a picture here — yes: this is actually just the first hundred of the stocks. This axis is time, and if you see a black bar, those are missing values; for example, for this one we only have data from here on, for these ones we have complete data. I'm going to get rid of every stock with more than 10% missing values — columns with more than 10% missing — and then fill in the remaining missing values using a function in the xts package called na.fill, which interpolates. By the time that finishes running I've got 374 — unfortunately a lot of these S&P stocks are relatively recent; they don't go back all the way. I won't draw all the plots; if you want to see 374 time series you can cycle through this block and see them twenty at a time. And I will throw out a few other things. I remove certain stocks: the ones which essentially collapsed but appear to still be in the data set, like AIG — I throw them out because they essentially go to zero. I remove the telecoms, because this is a very thin sector: the kind of fundamental factor model I will fit is based on industry sector, and telecoms is the thinnest sector — there are only four — so I've thrown them out, though it's not absolutely necessary. And there are a couple of names which basically move together — CMCSA and CMCSK essentially move together — so I throw those out too. Just a few removals; I don't think that matters for understanding what's going on. By the time I've done that I'm down to 365, so d = 365: the dimension of my vectors is 365.

I take log-returns as usual — we'll just build the log-returns and plot them shortly — then I turn the daily log-returns into monthly data and plot the first twenty. So these are monthly log-returns for the first twenty of the 365 stocks; there may be names you recognize. Now I need sector information: I need to be able to allocate every one of these to a sector, and this is the breakdown by sector. I have major sectors and sub-sectors; the major sectors are things like consumer discretionary, financials, industrials, health care, as you can see, and I won't worry about the sub-sectors — I use the primary sector as my classifying piece of information. Let's go through a few unimportant commands and do the tabulation: of those 365 stocks, 57 are consumer discretionary, 31 are consumer staples, and so on; 25 are utilities, 63 are financials. I'll shorten the names — some of them are quite long — and then, finally, in step two, I fit the model. I want to fit a cross-sectional regression model every month; there's no intercept, and all I'm estimating is the factor values for these nine sectors, every month. And I can do it in R — regression is so easy in R: you use a command called lm, you don't need a loop, you can accomplish everything by putting matrices in each of the positions of the call.
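A sketch of that loop-free call (object names hypothetical: X the n × d matrix of monthly returns, B the d × 9 sector indicator matrix); lm() accepts a matrix response and fits one regression per column:

## One cross-sectional regression per month, no intercept, in a single call
fit   <- lm(t(X) ~ B - 1)   # response: d x n matrix, one column per month
F.hat <- t(coef(fit))       # n x 9 matrix of estimated sector factor series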
So when I execute this command, I've completely calibrated the factor model, and we can have a look at the information I used. B is the loading matrix: there are 365 rows, one for every firm, and nine columns, one for every sector. When you look at it, it's basically a matrix of zeros and ones. Let's take the first title, MMM: the primary sector of 3M is industrials — it's an industrial. The primary sector of ABT is health care. So every one of them is uniquely attributed to a sector — a classic fundamental factor model. These models are very widely used in the asset management industry; the BARRA factor models, for example, are of this fundamental type. What do these factors look like? Every month I've estimated a factor value, and I can now plot the time series of factors: this is my constructed factor for consumer discretionary, this is my constructed factor for consumer staples, IT, utilities, and so on. So I've built these factors, and what I can say now is: I've gone from dimension 365 down to dimension 9. I will concentrate on modeling these nine factors in future, and everything else — all the idiosyncratic errors — are hopefully second-order effects that diversify away and are less important from a risk point of view. That's the idea.

The residuals: going back to the factor model, having estimated F_t, I can then estimate ε_t — those are the residuals. As I've said, the variances are unlikely to be equal, so it's not going to be a situation of homoscedasticity; the weighted least-squares estimator is given here. You basically have to estimate the covariance matrix of the errors ε — and actually there should be a hat here: it should be ε-hat rather than ε in what you actually compute, but that's a small detail. So these error variances are just my estimate of the (diagonal) covariance matrix of the errors, and you can have a look: I sort them from smallest to largest over the 365 names, starting with the smallest error variance and going up to the largest. In a sense, the first titles are the ones best explained by their industry factors, and the latter titles are the ones least well explained by them — I think we can interpret it that way. For the generalized least-squares model, you just add weights, and I can re-estimate my factors; they won't look a lot different. If you watch the pictures on the right-hand side as I change them — they hardly changed; maybe you saw a little bit of movement — those are now the weighted least-squares estimates. The factors are correlated: here's the correlation matrix of the factors, and some are more heavily correlated than others, but you can see the matrix of correlations of these constructed factors, and that's probably near enough. The errors should have a near-diagonal correlation matrix, so — eps stands for epsilon here, I'm trying to reconstruct the ε errors — I extract the residuals and call them epsilon. You'll often see t()'s, and you have to keep your wits about you in taking transposes and not taking transposes, but when you do it you work it out — so don't worry about these t()'s.
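A sketch of that residual extraction and of the check of the diagonal-error assumption (continuing the hypothetical names from the snippet above):

## Residuals eps_t = x_t - B F_t for every month, and their correlations
eps.hat <- t(X) - B %*% t(F.hat)   # d x n matrix of residuals
cor.eps <- cor(t(eps.hat))         # d x d; should be near-diagonal under the model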
Certain things are transposed, other things aren't, but all the linear algebra works out. The residuals should have a correlation matrix which is near-diagonal, because that's the assumption of the factor model: the factor model says these errors should be uncorrelated, so their correlation matrix should be diagonal — and in practice, if I've fitted a great factor model, my estimated correlation matrix should be near-diagonal. So I think that's the last thing I'll do in this example: I estimate the correlation matrix. It's 365 by 365, and rather than attempt to show you a 365-by-365 correlation matrix, I'll have a look at the upper corner — basically the first 75 stocks, the upper 75-by-75 corner. This is a picture of a correlation matrix, a level plot, and you see the scale here: if it's white the number is near zero; if it's red the number is going up towards one-half; if it's blue it's going down towards minus one — though this already tells you the range of these numbers: evidently there isn't an estimated correlation much bigger than 0.5 or much smaller than −0.3. You see what I did there: I put NAs on the diagonal, because if I didn't, you'd get a really harsh purple line on the diagonal, and we don't care about the diagonal — yes, don't worry, it really is all ones — so I put some NAs in there. What you look for are the darker squares. If it's darker pink — take that one — that pair is badly modeled by the factor model, in the sense that the error correlation is large and positive; if it's darker blue — take that one there — it's large and negative. But I like to think that this is a fairly pale picture, in other words, not a bad factor model.

[Question.] Yes — every firm in the S&P is officially categorized into one main sector, and within that main sector into a sub-sector, so there is no data on multiple sector membership here. Clearly, I suppose, some companies are easier to attribute than others — not being an asset manager, I'm not really an expert on this — but every company is uniquely assigned to one main sector, and you can certainly imagine that some companies are poorly explained by their sector factor while others are very typical of it; this picture gives you some of that information. We could do the full 365-by-365, but it just starts to look like some kind of abstract art. I won't continue with this script, but there are a few more things investigated there: the errors should also be uncorrelated with the factors, and there are other things you can look at. I just wanted to give you the key idea of a fundamental factor model: you attribute the stocks, the individual risks, to certain classifying variables — country and sector, say. I would have liked to have done country here, but they're all American, effectively; we could have thrown in some worldwide stocks and done country–industry factor models. And they're all in the S&P 500, so they're all large in size — we can't do large-cap/mid-cap/small-cap. It would be nice to have more than one kind of factor going on here, but I've done basically just industry as a factor.
So hopefully you see how that works. We use that factor-modeling philosophy tomorrow with interest rates, and we will also use another factor-modeling philosophy which is probably more familiar to people, and that is the PCA philosophy. So, really quickly, I will show the PCA philosophy. This is a statistical factor model: we know neither what the factors are nor what the weights on the factors are. [Question.] Yes — so the next part of the story is that the factors now become of primary interest, and the whole task becomes to forecast the factors, because most of the risk is in the factors. The stocks are well explained by the factors; they have their own idiosyncratic risks, but if I am invested across the market, those diversify away to some extent — what I can't avoid is the risk that's in these industry factors, or more generally in the broader market. So I've reduced dimension from d to p, from 365 to 9, and we could then fit nine-dimensional distributions, or indeed nine-dimensional time series models — we could do a nine-dimensional GARCH, and that's my ambition this afternoon, depending on how things go.

Now, the theory of PCA: if you're seeing it for the first time you need a bit of time to digest it; if you've seen it before, you probably don't need to look at it — and I want to wrap this up in ten minutes, so really I want to do PCA without telling you too much about how it works. We have to both find the factors and find the loadings, and what we do is apply a decomposition to the sample covariance matrix: a spectral decomposition. It works for any symmetric matrix — any symmetric matrix admits a decomposition of this kind, in terms of an eigenvalue matrix and an eigenvector matrix — and we apply it to the sample covariance matrix, basically: we decompose it into eigenvalues and eigenvectors, and we make a transformation of the original data. The transformation looks like this: we take the matrix of eigenvectors and transform the X data, and that gives us Y. That doesn't reduce the dimension — I just go from d-dimensional vectors to new d-dimensional vectors — but what I do get are components which are then uncorrelated: this is the principal components transform. Usually I form this matrix in such a way that the eigenvalues are ranked from largest to smallest, with the eigenvectors arranged accordingly, and I take the first few principal components — one, two, or three, the first few components of the Y vector — as my factors. So, skipping ahead: how is it a factor model? I make the principal components transform, I find the eigenvectors of the covariance matrix, I take the first few — because I've arranged them according to eigenvalues from largest to smallest — and the idea is that the first few linear combinations are the ones that explain most of the variance of X. So I basically use the first few components of Y as my factors: Y1 contains the first k components — those will be my factors — and Y2 contains the remaining components — those, essentially, will be the errors.
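A sketch of that transform "by hand", via the spectral decomposition of the sample covariance matrix (X an n × d data matrix; illustrative, not the QRM Tutorial script):

## Principal components transform Y = Gamma' (X - mu)
S     <- cov(X)                                  # sample covariance matrix
ev    <- eigen(S)                                # eigenvalues (decreasing) + eigenvectors
Gamma <- ev$vectors                              # columns are the eigenvectors
Xc    <- scale(X, center = TRUE, scale = FALSE)  # center the data
Y     <- Xc %*% Gamma                            # d uncorrelated components per row
round(cov(Y) - diag(ev$values), 10)              # ~ 0: cov(Y) = diag(eigenvalues)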
Now, this will never give a perfect factor model, because there's nothing here which says that these error vectors are uncorrelated, but it's a very commonly used strategy, and I'm just going to do it, because I think you'll learn something by seeing it carried out. I won't do it with exactly the same data — so, PCA factor models — just for variety I'll do it with the Dow Jones data; that's a smaller-dimensional world, a 30-dimensional world. I need the same libraries, and this time I take the Dow Jones index, so I go to dimension 30 effectively, although I will throw out a couple. That's the Dow Jones Industrial Average and the 30 stocks in this average; certain of them are missing — this is the picture of missing values — so I'll throw out these two (I can't remember what the names are), which leaves 28, and we can plot what we've got. So this is now my data — only about 30 series this time, so it's a little bit easier to see what's happening — and this time the first one is Apple, up here; remember that Apple is the first one. I need the log-returns: that plot is the returns on the index itself, and these here are the returns for the stocks, and again I'll just use monthly data.

OK, so that's going to be the data I apply a factor model to, but this time the factor model will be done by principal components, and this is the command that does it. In fact there are two functions that do the same thing; I'll use the first one, princomp. So I do a principal component analysis and then look at what it gives us. OK, I've carried it out. The numeric summary is not all that interesting, but I'll show you two things now which are probably useful. Being a 28-dimensional vector — there are 28 stocks — there are 28 principal components that can be looked at, and these are the variances of the principal components, ranked from largest to smallest: this bar represents the variance of the first principal component, this the variance of the second, this the third. And I have to decide where I'm going to stop: I'm going to reduce dimension from 28 to something a lot smaller — maybe one, maybe two, maybe three — and I'm going to say that those principal components capture the most important features of the data. But what are the principal components? This is what I'll show you: have a look at this. Component one is a linear combination, and these are the elements of the linear combination: it's −0.28 Apple and −0.25 American Express — it's just a vector. Component two is +0.66 Apple and −0.16 American Express (these are weights, not percentages). If you don't see anything printed, it means the loading is small — smaller than about 0.1 in absolute value, I think — not that it's exactly zero; the way the loadings are printed in the PCA output, your eye is immediately drawn to the larger loadings, whether negative or positive.
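The corresponding calls (a sketch; the data object name is hypothetical, and princomp() wraps exactly the eigendecomposition shown above):

pca <- princomp(X.monthly)   # PCA of the 28 monthly return series
summary(pca)                 # standard deviations / proportions of variance
plot(pca)                    # barplot of component variances, largest first
pca$loadings[, 1:2]          # loadings (eigenvectors) of the first two components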
Now, in the first component all of the signs are the same, because it functions like an index — it is simply like an index, although the weights are not all equal; you can see they differ, but this linear combination represents a kind of market index. [Question.] Yes, with stock data it will look like that — it's CAPM-like: they're all to some extent following the broader market. It's difficult to have a stock that doesn't — I mean, you have ones which are counter-cyclical, which buck the trend — but to some extent they all follow the broader market. And of course this isn't the complete market — these are all, you know, big titles — and it's not price-weighted. If you just look down, every sign is negative, and they're quite close — certainly not all equal, but quite close. This is the linear combination that actually maximizes the variance among all possible standardized linear combinations. I skipped the interpretations: the first principal component is the standardized linear combination of X which has maximal variance among all such combinations, and from then on, the j-th is the standardized combination with maximal variance among all linear combinations orthogonal to the ones before. That's what they are — and these are simply the eigenvectors of the covariance matrix.

So let's use the first three to define factors. I'll take three — I don't know why three; I could take two, but in this script I take three. The complete set of estimated principal components is called the PCA scores, and I take the first three and plot them: that's the first three principal components — component one, component two, component three — and that's how they behave. Stochastic volatility, I'd say, in some of them — again, it's based on monthly data. These are orthogonal by design, so if I look at the covariance matrix of these three, the off-diagonals are effectively zero. What are you saying, Frank? 2009 is here — yes. Well, let me do something: let's take that first principal component and interpret it. I'll show you the first principal component next to the returns on the Dow Jones Industrial Average. What do you see? It's a little bit as if this were a mirror image — a little bit flipped over. So let's change the sign and plot one against the other: this is the Dow Jones index returns plotted against the negative of the first principal component, and you can see they are highly correlated, but different. So if you like, the first principal component is an alternative index — returns on an alternative index, weighted according to the mathematics of principal components. So that's very brief, but it's just another way of getting factors out of the data.
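A sketch of the scores-as-factors step and the sign-flip comparison (the index-return object name is hypothetical):

F3 <- pca$scores[, 1:3]       # first three principal components as factors
round(cor(F3), 2)             # essentially the identity: orthogonal by design
plot(-pca$scores[, 1], as.numeric(X.index),  # flip the sign of PC1
     xlab = "-1 * first principal component", ylab = "index return")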
What is better, fundamental factor models or principal components? They both work quite well, it appears, and both have their advocates; companies have built their businesses around fundamental factor models in particular, and principal components always works remarkably well in extracting linear combinations — linear weighted portfolios, if you like — which carry a lot of information. If the first component acts like an index, what is the second component? Just have a look at the loadings. The second one: Apple 0.663; then negative, negative, negative; Cisco 0.215; small negatives; IBM 0.128; Intel 0.357; Johnson & Johnson negative; Coca-Cola negative; Microsoft positive. It's like a contrast between anything to do with technology and everything else — so the first component comes out like an index, and the second is like a contrast between technology and everything else. Having said that, if I use this as a factor model — and I'll just cut to the end now and show you another of these pictures — I take these three principal components as factors and then look at how good the factor model is: I look at the errors and make the same kind of picture of the correlations of the errors. Again there's a lot of white in this picture, but bits of dark red and bits of dark blue. The first row and first column is Apple, and Apple's is where the colours are darkest, I think — even in this model Apple has dark blue and dark red squares, so it's a kind of law unto itself as far as its returns are concerned. But OK — the important thing is not analyzing stock returns, which as I've said I don't do professionally, but the different techniques for getting factors; that's what to take away. Any questions?

[Question.] Yes — so, the principal components: it's a very good point. People usually say these are factors you discover by mathematics or statistics, and a priori they may not make any sense. When you do it with stock returns they do make a kind of sense: as I've said, the first one is like a market portfolio, and the second one is always some kind of contrast. But the third, the fourth, the fifth — what are they? Some sort of higher-order contrasts among the stocks; you get some sort of interpretation. But if I took all of my financial risk factors — if I threw in my exchange rates and my volatilities and everything — and just did a PCA, something would come out, but it might not make any kind of sense; whereas for the other two kinds of factor modeling, some thought goes into what the factors should be: either you identify particular time series to act as factors, or you classify your data into groups and manufacture the factors. [Question.] I doubt they'd look completely different — I've done this thing over and over again with the stocks, and the first two always come out much the same way; probably we should do it on the broader market as well and see what comes out. But yes, it's based on the covariance matrix, and if you badly estimate the covariance matrix, obviously you're going to badly estimate the principal components — and with heavy-tailed data the problem of estimating a covariance matrix is quite a serious one, so there is a robustness issue in what comes out. Could do, yes — I haven't analyzed this to destruction, but inasmuch as it's all based on the covariance matrix, and standard covariance matrix estimators with heavy-tailed data are not very robust, that
must have some repercussions in what comes out. We wanted to mix it up a bit this afternoon. We have a big topic coming up on Thursday morning, which is copulas and dependence, and we thought that if we hit you with a ton of copulas and dependence on Thursday morning it might be overload. So while you're fresh we wanted to ease you into it, which is why we thought we'd do a little bit of factor models and now a little bit of copulas and dependence, which Rüdiger will do — a bit before the coffee break and a bit after. And for those who got completely lost with PCA and factor models, you can reboot: you're now going to hear a little bit about copulas and dependence.
Info
Channel: QRM Tutorial
Views: 1,898
Rating: 5 out of 5
Id: s0ja-VB5-Bs
Length: 137min 5sec (8225 seconds)
Published: Sun Jan 21 2018