CT6 Introduction to generalised linear models (GLMs)

Captions
In this unit we'll begin our study of generalized linear models, or GLMs for short. We'll start by recapping linear models from CT3. We'll then see how we can generalize these, and the three components that define a generalized linear model. Finally, we'll look at how we can estimate the parameters in the model using maximum likelihood estimation.

Before we can talk about generalized linear models, we first need to recap linear models from CT3. Suppose we have data on the claim amounts received on a motor insurance policy, and we also have the age of the policyholders. If we plotted a scatter graph of these results, it might look something like this. So we may be able to model this by using a linear relationship with equation alpha plus beta x_i, where x_i is the age.

When we were defining a linear model in CT3, we made an assumption about the distribution of the y's, in this case the claim amounts. We assumed that they were normally distributed with mean mu_i and common variance sigma squared. The function of the explanatory variable, the x_i's, is alpha plus beta x_i, and we can see from the data points that only one of the y's is actually on the line alpha plus beta x_i. So what is the connection between the y_i's, the claim amounts, and the explanatory variables, the x_i's? Well, we assume that the mean claim amount lies on this line, and so individual results will be distributed around that with variance sigma squared.

What we're now going to do is generalize this model. We'll do that by generalizing the distribution that the y's, the claims in this case, can take. We'll also generalize the function of the explanatory variables, and finally we'll look at how we link the explanatory variables back to the mean of the distribution.

Firstly, we require the distribution of the y's. In linear models we saw that this was normally distributed. However, in a generalized linear model we will allow y_i to be any member of the exponential family.

Secondly, we need what's called the linear predictor. This is the function of the covariates. In the linear model this was the x_i, and our function was alpha plus beta x_i. However, in generalized linear models we no longer have one explanatory variable x_i. We have a whole variety of them which vary together, and hence they're called covariates. So we could have, for example, alpha plus beta x_i, where x_i is the age of the individual, plus gamma z_i, where z_i might be the number of years that they've held a driving license. Or we could have a linear predictor where we had the gender of the policyholder. Clearly we can't put male or female in there, but we can assign a different number to males and females. Or perhaps we can have a quadratic function of the age of the individual. Now, in case you're wondering why it's called a linear predictor when we've got a quadratic function here, it's linear in the parameters alpha, beta and gamma that we wish to estimate. That is, we're not trying to estimate something like beta squared.

Finally, we need what's called the link function. This is how we link our linear predictor, our covariates, back to the mean of our distribution, say our claim amounts. For linear models we saw that the mean was simply equal to our linear predictor, i.e. it was equal to alpha plus beta x_i. For generalized linear models we can have a variety of link functions. For example, we might have log of the mean, in which case, using the linear predictor above, our mean would equal the exponential of alpha plus beta x_i plus gamma x_i squared. Why might we have different link functions? Well, in the normal distribution the mean can take any value it likes, whereas in, for example, the Poisson distribution the mean can only be positive. So by having a link function which is log of mu, so that mu is the exponential of our linear predictor, it doesn't matter what values of alpha, beta and gamma we get: our mu will still be positive.

So in summary, we are modeling the y's. In general insurance that may be the claim amounts or the claim numbers; in life assurance it could be the lifetime of the individual. We will then have some covariates. These are all the variables that we think will affect our claim amounts, claim numbers or lifetimes. And then our link function links these two things together. If we can estimate the unknown parameters, for example the alphas, betas and gammas, we can then use them to calculate the mean of the distribution and thus predict mean claim amounts or claim numbers.

To estimate the parameters we'll use maximum likelihood estimation, so let's quickly recap MLEs. First of all, we obtain the likelihood. If we were estimating the mean mu and we have claims data y_1 to y_n, then our likelihood is defined to be the product of the PDFs. We then log the likelihood so that it is easier to differentiate. This gives us, unsurprisingly, the log-likelihood. Then we differentiate this with respect to mu and set the derivative equal to zero. We then rearrange this and obtain our estimate for mu. We then check that we have obtained a maximum by differentiating a second time, and if the second derivative is negative we have a maximum. Now, technically we're finding the mu which maximizes the log-likelihood, but since the log function is monotonically increasing, if we maximize the log-likelihood we will maximize the original likelihood.

Well, how do we adapt this for generalized linear models? Step one: we still obtain the likelihood. However, each individual is unlikely to have the same mean claim amount, for example, and so we'll have mu_1 to mu_n instead of a single mu. Once again it's the product of the PDFs, for example the product of the PDFs of the claim amounts. Next we log it, and this gives us our log-likelihood. However, remember that the mu_i's depend on the alphas, betas and gammas, and so we actually need to estimate those first in order to obtain our mean claim amount. So what we do is introduce an extra step: we use the link function to replace the mu_i's, and so we now have a log-likelihood in terms of our unknown parameters, our alphas, betas and so forth. Now we can differentiate this and set the derivatives equal to zero. We differentiate with respect to each of the parameters, say d by d alpha and d by d beta, and then we solve these simultaneously to obtain our estimates for alpha, beta, gamma and so forth. Once we have those estimates, we can then obtain our mean claim amount mu.

Now, because we're estimating more than one parameter, checking that our estimates actually give a maximum is rather a messy affair, and so we don't carry this out in the exam.

Calculating maximum likelihood estimates of our parameters is a key question in the CT6 exam, and so the next four units develop this key skill. We would recommend that you study them closely and not simply move on to the next teaching unit.

Here's a summary of what we've covered in this unit.
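As a rough illustration of the estimation steps described in the captions, the sketch below fits a GLM for claim numbers under some stated assumptions: the y_i are Poisson, the linear predictor is alpha plus beta x_i, and the link is log, so that mu_i is the exponential of the linear predictor. The data are made up for illustration and do not come from the video. The function names (fitted_means, log_likelihood, score, fit_poisson_glm) are hypothetical. The video only says the derivative equations are solved simultaneously; Newton-Raphson is used here as one possible way of doing that numerically.

```python
import math

# Hypothetical data (not from the video): policyholder age in decades and
# the number of claims made by each policyholder.
ages = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
claims = [5, 4, 4, 3, 2, 2, 1, 1]


def fitted_means(alpha, beta, x):
    """Invert the log link: mu_i = exp(alpha + beta * x_i), always positive."""
    return [math.exp(alpha + beta * xi) for xi in x]


def log_likelihood(alpha, beta, x, y):
    """Poisson log-likelihood with each mu_i replaced via the link function."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = alpha + beta * xi  # linear predictor, equal to log(mu_i)
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total


def score(alpha, beta, x, y):
    """Partial derivatives of the log-likelihood w.r.t. alpha and beta."""
    mu = fitted_means(alpha, beta, x)
    d_alpha = sum(yi - mi for yi, mi in zip(y, mu))
    d_beta = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    return d_alpha, d_beta


def fit_poisson_glm(x, y, tol=1e-10, max_iter=50):
    """Solve the two derivative equations simultaneously by Newton-Raphson."""
    # Start from a constant mean equal to the sample mean of y.
    alpha, beta = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = fitted_means(alpha, beta, x)
        g_a, g_b = score(alpha, beta, x, y)
        # Negative second derivatives of the log-likelihood (2x2 matrix).
        i_aa = sum(mu)
        i_ab = sum(xi * mi for xi, mi in zip(x, mu))
        i_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2x2 linear system by explicit inversion.
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        alpha, beta = alpha + step_a, beta + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return alpha, beta


alpha_hat, beta_hat = fit_poisson_glm(ages, claims)
mu_hat = fitted_means(alpha_hat, beta_hat, ages)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
print("fitted mean claim numbers:", [round(m, 2) for m in mu_hat])
```

At the fitted values both derivatives returned by score are close to zero, which is the "set the derivatives equal to zero" condition, and every fitted mean is positive whatever the signs of the estimates, which is the reason given above for choosing a log link.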
Info
Channel: Actuarial Education
Views: 92,004
Rating: 4.9171743 out of 5
Keywords: CT6, statistics, statistical models, GLM, GLMs, generalised linear models, linear models, generalized linear models, regression, math, stats, actuaries, actuarial, insurance, GLiMs
Id: vpKpFMUMaVw
Length: 7min 36sec (456 seconds)
Published: Mon Aug 20 2012