Lecture 23: Beta distribution | Statistics 110

Captions

Okay, so let's get started. Happy Halloween. Halloween is one of my favorite holidays. It's symbolic of 110 in various ways. In particular, I think we learn a lot of tricks in this class. You get a lot of treats too. So it's the perfect Stat 110 holiday. Speaking of tricks, a student from last year emailed me and told me that there's an open request at the Bureau of Study Counsel for someone to, quote, teach me all of Blitzstein's tricks. So that's probably gonna be a difficult one to fulfill, so that's an open request with the BSC, but anyway, a lot of tricks and a lot of treats. So today's gonna be a very, very scary lecture, as befits the holiday. In particular, in the first half, we're gonna do the beta distribution. And to give you some indication of how scary, how terrifying the beta distribution is, I'll tell you a true story, which is that the first time I taught Stat 110, students were just clearly terrified of the beta distribution. I didn't really know why. Actually it should not be scary, okay? So I'll try to show you how it's not that scary, but for some reason, they were extremely scared of the beta distribution. Then on the final, I decided to be nice. I didn't put anything involving the beta distribution, cuz they were so scared of it. But then that really backfired on me, because I think the students were seeing the beta distribution like a ghost. And they were just seeing betas lurking everywhere. And when I was grading the exams with the TAs, there were just beta distributions popping up everywhere on the exam. Even on the questions that had nothing to do with the beta, they were so scared. Is there a beta lurking here behind the corner? And it was just beta everywhere. So I think it would have been nicer if I'd just put one easy beta question at the beginning, then you kind of get that fear out of your system. So anyway, let me tell you what the beta distribution is. It's not scary. For some reason, it has been in the past, but it shouldn't be, okay? 
Well, what is it? It's a generalization of the uniform distribution. So far, the only distribution we know by name that is both continuous and bounded is the uniform, right? The uniform, let's say uniform from zero to one. Well, by definition it goes from zero to one. And the PDF is just completely flat, okay? What if you want to generalize that? Other continuous distributions, like the normal, go from minus infinity to infinity. The exponential goes from zero to infinity. What if we want something that's still bounded between zero and one, but is not just flat? Well, by far, the most widely used distribution that meets those criteria is called the beta distribution. And it's not one distribution, because it has parameters. So, it's a whole family of distributions, which we call Beta(a, b). So it has two parameters, a and b, where a and b are any positive real numbers, okay? And it takes values between 0 and 1, as I said, and let's write down the PDF. The PDF is f(x) = c x^(a-1) (1-x)^(b-1) for x between 0 and 1, and 0 otherwise. So it's proportional to x to the a-1 times (1-x) to the b-1, and c is a constant; this is the normalizing constant, okay? So one question we'll have to deal with is, can we actually integrate this thing, dx from 0 to 1, so we can figure out what c has to be to make it integrate to 1? And that's a famous integral in math. If you integrate this, that's called the beta function. And it has a long history in math aside from its application in probability and statistics. So we will talk about that integral at some point, but there's actually a lot that we can get from studying this without even knowing what the constant is yet. So, now here are just some facts and properties. We've been emphasizing stories, right? Why write this down unless we have a story for it? The beta distribution actually has many stories to it, and we're not gonna try to do all of them today, but we'll see more and more stories for the beta. 
But the most basic reason why betas are interesting is just that it's a flexible family of continuous distributions on (0, 1). And by flexible, I mean that as you vary a and b, you can make this take different shapes, right? So just for a couple quick examples, if we let a = b = 1, then this whole thing goes away. We just have a constant, so that's just the uniform distribution. So it is a generalization of the uniform. So the PDF would just look like that. But we could let a = 2 and b = 1, for example. They don't have to be integers either; they can be any positive real numbers, okay? With a = 2, b = 1, this term is gone. Then we just have x, so we will just have something that increases linearly up to 1. This is 1 here, right? It would just increase linearly. And the area of this triangle has to be 1, so you can figure out the slope. More interesting ones: if a = one-half and b = one-half, you get something that looks like a U shape, kind of like this. Here's 1. And notice that if a is one-half, that's x to the negative one-half. So as x approaches 0, it's gonna blow up. So this is going up to infinity over here. And as x goes to 1, this term is gonna blow up. So it's asymptotic like that, so you can get something like that. And if you let a = b = 2, then you'll get something that is just a nice looking upside down U shape. These are just a few examples. You can plug in other values, but that's why I say that it's flexible. There isn't one shape that it takes, and that makes it a useful modeling tool. So the main use of it is that it is often used as a probability distribution for probabilities. That is, it's often used as a prior for a parameter that takes values between 0 and 1. So this is going back to some of your homework. And we talked a little bit in class about the Bayesian approach, where we treat the parameters as random variables. 
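To make the shapes mentioned above concrete, here is a minimal Python sketch (not from the lecture). The function name `beta_kernel` and the grid of points are illustrative choices. It evaluates only the unnormalized density x^(a-1) (1-x)^(b-1), leaving out the constant c, since the lecture has not derived c at this point.

```python
def beta_kernel(x, a, b):
    """Unnormalized Beta(a, b) density: x^(a-1) * (1-x)^(b-1) on (0, 1), else 0."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1)


# A few points in (0, 1), including two close to the endpoints.
grid = [0.01, 0.25, 0.5, 0.75, 0.99]

# The four parameter choices discussed in the lecture.
examples = {
    "a=1,   b=1   (flat)": (1, 1),
    "a=2,   b=1   (linear)": (2, 1),
    "a=0.5, b=0.5 (U shape)": (0.5, 0.5),
    "a=2,   b=2   (upside down U)": (2, 2),
}

for label, (a, b) in examples.items():
    values = [round(beta_kernel(x, a, b), 3) for x in grid]
    print(label, values)
```

Reading each printed row from left to right gives a rough picture of the corresponding curve; the values are heights up to the unknown constant c, so only the shape is meaningful.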
So if we have a parameter that's a probability, we know it's between 0 and 1, and we wanna give it a prior distribution. The beta distribution is, by far, the most widely used choice in practice, because it has a lot of nice properties. In particular, it's what's called the conjugate prior to the binomial. If you take Stat 111, you'll be seeing a lot of conjugate priors. Obviously, I'm not assuming you know the term conjugate prior yet. I'll explain what this means. So this is foreshadowing 111 in a sense, but it's also just a 110 calculation that you should be able to do once you know what conjugate prior means, which I'll explain in a bit. Okay, and then it also has various connections with other distributions that we'll see later. So it's a very well connected distribution. It has lots of nice properties and relationships with other distributions, and that makes it a useful tool. Okay, so, let me explain what this conjugate prior thing means, cuz I just wrote that. But obviously you don't know what that is, unless you happen to have seen it before. What we're gonna do is generalize Laplace's rule of succession. So remember that was the problem with, what's the probability that the sun rises, kind of thing, where we assume complete ignorance about the probability of the sun rising. Well, that's what Laplace did. Obviously it doesn't have to be about the sun rising, and we're actually not that ignorant. You have some problem where we're completely ignorant, and we modeled that probability using a uniform prior, okay. So right now we want to generalize that and say, what happens if instead of a uniform prior, we have a Beta(a, b) prior? And we don't even know what the normalizing constant is yet. But we don't actually need that for this calculation, as you'll see. Okay, so here's the conjugate prior thing, for the binomial. And I'll explain the word conjugate as this develops. So here's the basic problem, which will look familiar, but we're generalizing, okay. 
So the problem is, we observe a binomial given p. X given p is Binomial(n, p). Okay, that's our old friend the binomial. This notation means that if we pretend that p is known, it's just our friend the binomial. But p itself is unknown, and we're giving it a distribution to reflect the uncertainty about p. So in practice, the most widely used choice is some beta distribution. So we're gonna let p be Beta(a, b). Okay, so that's called the prior. That is our reflection of our uncertainty about p. That is, we don't know the true probability, so we're just treating it as a random variable. X is the data; X is what we get to observe. Before we observe X, we have a prior on p; that's our prior uncertainty. So this is not based on the data, okay. Then after we observe X, we wanna update our probabilities, right, using Bayes' rule. So that's why this is called Bayesian statistics, cuz we're updating our beliefs based on getting the evidence. So we want to find the posterior distribution. That is the distribution of p given X. Now we get to know X, and what's our update? Originally we have a Beta(a, b), and then what happens after we're allowed to use X? So, how do we do this? Well, Bayes' rule, so this is one of these hybrid forms of Bayes' rule, right. So just to make the notation clearer, I'm going to write this as f(p | X = k). You can also write this with just the capital X, under the interpretation that it's just shorthand for something like this. That is, we get to treat X as known. Okay, so this is a posterior PDF. Right, p is continuous, so it has a PDF. So this is the PDF of p given that we now know that X is in fact equal to k. All right, so that is what that means. So by Bayes' rule, that's P(X = k | p) times f(p), which is the prior, divided by P(X = k). There's a key distinction between the numerator and the denominator. Obviously you can't cancel these out: this one is a function of p. 
This one is treating p as known. This one is saying, integrate over all p. So this one does not depend on p, because this is just going to be a constant with respect to p. It will depend on k, but it won't depend on p, because we've integrated out the p. So this one does not depend on p. All right, now let's just see what we have here. The probability that X = k given p: that's easy, that's just n choose k, p to the k, (1-p) to the n-k. Now f(p), that's just the beta density. So that's a constant, whatever that constant c is over there. We don't know what c is, but it won't matter. Times p to the a-1, (1-p) to the b-1, divided by the probability that X = k. Now, seemingly, to find the probability that X = k we would have to use the law of total probability. That is, we condition on p and integrate. And we can do that, but there's a much faster way to do this, which is just to look at the structure of what we have, up to proportionality. Okay, so I'm just gonna write the proportionality symbol; that is, I'm gonna ignore constants. And by constant I mean something that does not depend on p, okay. So let's ignore constants. So actually we can ignore the n choose k, cuz there's no p there; we can ignore c, cuz there's no p there; and we can ignore this thing, because there's no p there. So, all that's left is p to a power, 1 minus p to a power. So, we have p to the a+k-1, (1-p) to the b+n-k-1, just adding the exponents, and that immediately tells us the answer. So we didn't have to do any complicated calculations at all: just group the p's together, the (1-p)'s together, and ignore the constants. And then that immediately tells us that p given X is a beta distribution, right. It looks like a beta. There's some constant in front, but the constant is just whatever constant is needed to make this integrate to one, right. So it's still of the form of a beta distribution. So this is going to be a beta with parameters. 
k got added, but if I write it this way, I'm just going to write Beta(a + X, b + n - X). This is what the notion of conjugate prior means. By the definition of conjugate, it's not just one prior, it's a family of priors, and that family of priors is called conjugate. So we're looking at not just one beta distribution; it's the family of all beta distributions, where you can change the parameters around. If we start with a member of the family, a beta distribution for the prior, and compute the posterior, it's still beta, where these are the updated parameters. That's called the conjugate property, so this is the conjugate prior for the binomial. And this has a pretty intuitive interpretation if we think of X as the number of successes and n - X as the number of failures. The way you can remember this result and understand it intuitively is to pretend that we had some prior data, that we might have done an earlier experiment where we had a successes and b failures, and then we're adding on: now we have X new successes and n - X new failures, okay? So it stays in the family. That's computationally very, very nice. But that doesn't mean it's true in any sense, and there are philosophical debates about whether you should use conjugate priors and things like that. Is that the correct way to reflect your uncertainty, things like that. But there's no denying that it's convenient, cuz you start with a beta and you still have a beta. You don't need to work out a whole new theory of some new distribution. So it's very convenient in this sense, and very, very flexible. So this is a generalization of Laplace's rule of succession. That was the case where we let this be Beta(1, 1), which would be uniform, okay, but the same thing works. So next time we'll derive the mean and the variance of the beta, but you can already see that that's going to be kind of nice too. Because, just stare at this density. I won't write it out now; we'll do this next time. 
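The update rule above is easy to check numerically. The following is a minimal Python sketch under the lecture's setup (a Beta(a, b) prior and X given p Binomial(n, p), with X = k observed). The function names are hypothetical, and the check works only up to proportionality, as in the lecture: it confirms that prior times likelihood divided by the Beta(a + k, b + n - k) kernel does not depend on p.

```python
from math import comb


def posterior_params(a, b, n, k):
    """Beta(a, b) prior plus k successes in n trials gives Beta(a + k, b + n - k)."""
    return a + k, b + n - k


def beta_kernel(p, a, b):
    """Unnormalized Beta(a, b) density at p, for 0 < p < 1."""
    return p ** (a - 1) * (1 - p) ** (b - 1)


def prior_times_likelihood(p, a, b, n, k):
    """Numerator of Bayes' rule without the constant c: kernel times binomial PMF."""
    likelihood = comb(n, k) * p ** k * (1 - p) ** (n - k)
    return beta_kernel(p, a, b) * likelihood


def proportionality_ratios(a, b, n, k, points=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Ratio of the Bayes numerator to the claimed posterior kernel at several p."""
    a_post, b_post = posterior_params(a, b, n, k)
    return [
        prior_times_likelihood(p, a, b, n, k) / beta_kernel(p, a_post, b_post)
        for p in points
    ]


# Example: a non-integer prior, 3 successes in 10 trials.
print(posterior_params(2.5, 1.5, 10, 3))
print(proportionality_ratios(2.5, 1.5, 10, 3))
```

If the posterior really is Beta(a + k, b + n - k), every ratio printed should be the same number, up to floating-point rounding; the only factor left over is n choose k, which does not involve p.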
But just to have some intuition for what's going to happen, look at this density. Suppose we wanted to compute the expected value of X. Then we're just going to integrate this thing with an x in front, right? Just the definition of expected value. So we could combine this x with this x to the a-1, and we'd have x to the a. It looks like a beta again. So once we know the normalizing constant, we can immediately write down the answer. What if you wanted the second moment, which you would use to get the variance? Then by LOTUS you would be putting an x squared in front to get the second moment. x squared times x to the a-1 is x to the a+1. It still looks like a beta. So once we know the normalizing constant, we can immediately write down the answer. So it has very, very nice properties in that sense. All right, so we're not gonna do the full normalizing constant today. But I wanted to do one important special case, which is how you get the normalizing constant in the integer case, okay? So a and b don't have to be integers, but let's assume that we have integers up there. So another way to phrase that question is, here's what we have to do: we have to integrate, from 0 to 1, and I'll write it in a way that looks a little more reminiscent of a binomial, but it's the same basic thing, x to the k times (1-x) to the n-k, dx. I wanna do that integral. Now we have nonnegative integers up here, and k is an integer between 0 and n. We wanna do this. Well, okay, so if I first gave you this integral you might say, well, use the binomial theorem, expand this out. Multiply this whole thing out, integrate every term, and then try to simplify, and it will be very tedious, okay? So let me phrase this question a different way. Find the integral of x to the k, (1-x) to the n-k, without using calculus. So my favorite calculus problems are the ones where you don't have to use calculus. And so, that sounds impossible. That's actually one of Bayes' most brilliant ideas. 
Bayes' rule itself, as we saw, you just write down the definition of conditional probability and in like one line you've got Bayes' rule. But Bayes also had an argument, around 1760-something, where he did this integral without using calculus. So I wanted to show you his argument, cuz it's very beautiful, but it also gives us the normalizing constant for these integer cases, okay? So what does it mean to do it without using calculus? Well, we're going to use a story. And that story is called Bayes' billiards. Okay, so this becomes a little bit nicer if I put an n choose k here. I'm just multiplying by a constant, so we could always adjust for that later, but this looks a little bit nicer. All right, so here is what Bayes did. He said, consider that we have n + 1 billiard balls, like if you're playing pool. This is exactly Bayes' argument from the 1760s. He said, take n + 1 billiard balls, and then you can do one of two things. Originally they're all white and they all look the same, and then we paint one of them pink. So now at this point we have n white balls and one pink ball, okay? That's something we could do, and then throw down the balls, throw them on (0, 1) independently. That is, we're just positioning them; we have the number line from 0 to 1. Here's 0, here's 1. And just throwing all the balls down, you get some arrangement here, some white balls here. And the pink ball has to land somewhere; maybe it's there, this is just an example. So we have i.i.d. uniforms. Okay, so that's the first story. Here's an alternative. So here we painted the ball first. We painted one ball pink, then threw them all down. But another thing we could do is throw all the balls down on the number line, and then paint a random one pink. So first throw them, then paint one pink. Okay, now if I just showed you this picture and didn't tell you where that picture came from, you would have no idea whether it came from this story or this story, right? 
They're completely equivalent procedures. Why should it matter whether you painted it at the beginning or the end? It doesn't matter. The final configuration is gonna have the same distribution, all right. So the key to the argument is that these two ways of generating a picture that looks like this are completely equivalent. That tells us the value of that integral, because now let's just let X equal the number of balls to the left of the pink one, which I denote like that, okay, that's X. So in this picture, X equals three, for example; just count how many are to the left. That's going to be some integer between zero and n, all right? Now, according to this first story, it doesn't matter what order we throw the balls in. So we may as well throw the pink one first, then condition on where that is. And so we want an expression for the probability that X = k. That is, there are exactly k white balls to the left of the pink ball. So by the law of total probability, we should just integrate the probability that X equals k given p, times f(p), dp, where p is for pink and p is for probability: where is the pink ball? And so that's something we can write down easily, because f(p) is just 1, because we're assuming it's uniform between 0 and 1. The probability that X equals k given p: that's just a binomial, because we define success for the white balls. Success is defined as being to the left of the pink ball; failure is being to the right. So we just want to count the number of successes. That's simply a binomial: n choose k, p to the k, (1-p) to the n-k, dp. Notice that's the exact same integral we had. We changed the dummy variable from x to p. We're running out of space here. But, on the other hand, we can just write down the answer from the second story. This is story one, this is story two. In story two I first throw the balls down, then paint a random one pink, all equally likely. 
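For comparison with the story-based argument, here is a short Python sketch of the tedious route mentioned earlier: expand (1-p)^(n-k) with the binomial theorem and integrate term by term. The function name `billiards_integral` is made up for this illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from math import comb


def billiards_integral(n, k):
    """Exact integral over [0, 1] of C(n, k) * p^k * (1 - p)^(n - k) dp.

    Uses (1 - p)^(n - k) = sum over j of C(n - k, j) * (-1)^j * p^j,
    and the fact that p^(k + j) integrates to 1 / (k + j + 1) on [0, 1].
    """
    total = Fraction(0)
    for j in range(n - k + 1):
        total += Fraction(comb(n - k, j) * (-1) ** j, k + j + 1)
    return comb(n, k) * total


# Tabulate the integral for every k when there are n = 5 white balls.
n = 5
for k in range(n + 1):
    print(k, billiards_integral(n, k))
```

The printed values, one per possible count k of white balls to the left of the pink ball, can be compared directly with what the second story says about the distribution of X.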
So it's equally likely that there's any valid number of white balls to the left of the pink one. So all integers from zero to n are equally likely, so it must be 1 over n + 1. So that integral is just 1 over n + 1. And you can write that down just by looking at the picture, without actually doing integration by parts, or expanding things out term by term, all that stuff. So that argument gives us the normalizing constant for some cases of the beta. Next time we'll do the general normalizing constant and see some of the connections to other distributions. We have one other very scary thing for today, which is finance. We have a special guest today, Steven [INAUDIBLE], who teaches STAT 123, so let me briefly introduce him. 123 has gotten better reviews from people who've talked to me about it than almost any other course I can think of. A lot of students have told me that when they went and did finance internships, they felt way better prepared than their fellow interns because of having had 110 and 123. Just to tell you very briefly a little bit about Steven's background: he got his PhD from the stat department here, so that's a proud claim to fame for our department. He then taught at Imperial College, London. He has many years of experience working on Wall Street. And now he's Managing Director at the Harvard Management Company, which is responsible for managing a large portion of the Harvard endowment. I think it's definitely true that at Harvard, you will not find anyone else with the same combination of both practical experience and academic training, having actually been on Wall Street for years as well as having a PhD in statistics. You won't find that in any other Harvard course. I don't actually think you'll find that in any other course in the entire country. So we're lucky to have that course, STAT 123, which is not offered this spring, but it will be offered next year. 
And so, Steven didn't wanna miss out on your group, so, take it away. Let's all welcome him, and let me give you the microphone. >> [APPLAUSE] >> All right, well, thank you to Joe. And thank you all for giving me 20, 25 minutes of your time. It's always a pleasure to talk to Stat 110. I'm here for a couple of reasons, really. Firstly, STAT 123, which Joe mentioned, is a reasonably new course here at Harvard. I'll be giving it for the third time in 2012-13. So I wanted to give you a little bit of a flavor of what it's about. But secondly, STAT 123 has one prerequisite, which is STAT 110. So you all here are my eligible receivers, so to speak, or eligible to take the course, and I wanted to give you a little bit of its flavor. I apologize to the seniors in the class, if there are any, who won't be able to take the course, but I hope the next 20 minutes might be intellectually appealing even for them. There are gonna be a couple of brain teasers that might make you think a little bit about probability and its relationship with the financial world. Probability is, to borrow a phrase, the soul of finance. It's the theory that underpins most of quantitative finance, and hopefully I will show a little bit about that today. So for those of you who enjoy this course and want to see what I think is an intellectually appealing use of probability, which is also very real-world oriented, that is what STAT 123 is about. Just one further word on prerequisites. STAT 110 is the only prerequisite. You don't need to know anything about finance to take STAT 123. The whole of [INAUDIBLE] finance was essentially built by people who knew nothing about finance, and still don't really. It was built by probabilists, mathematicians, statisticians and physicists. A lot of physicists moved to finance when the original supercollider got cancelled in 1993, and they all went to Wall Street and built quantitative finance. So you don't really need to know anything about finance. 
Obviously, if you're interested in a financial career it may be appealing, but many students flourish in STAT 123 without any interest in a financial career. So I've had people who wanted to go to Juilliard or play professional soccer, but are just interested in what is in some ways a new way of thinking, a new way of reasoning about uncertainty, which uses probability to a great degree. If you have a visceral dislike of finance, that's totally fine. But it's probably not worth taking STAT 123 if that's the case. I do get a couple of comments about STAT 123 saying finance is all about making money, and I don't want that to be the course. Well, that's what finance is about [LAUGH], it's about making money in the financial world. So if that provokes a visceral dislike, which I totally empathize with, the course is perhaps not one to take. So let me tell you a little bit about the course. Let's write this down: Applied Quantitative Finance on Wall Street. I'm thinking about dropping Wall Street for 2012-2013. So there are many pejorative words in this title, but a natural subtitle for this course is Applied Financial Derivatives. This course is very much about derivatives. So two natural questions might be, what is a financial derivative, and why is this a probability course? Why is this a stat course? So those are two natural questions. Hopefully you can answer them together. So a financial derivative has nothing to do with calculus. Or at least its definition has nothing to do with calculus. There's some calculus involved in thinking about and valuing derivatives. In financial derivative, the term derivative means derives from. So a financial derivative is a contract, so it's a bet, an agreement, between two people, whose payout at some maturity date is a function of, or derives from, the value of some other random variable. So for instance, suppose I enter into a contract with Professor Blitzstein where he pays me $1 if the total snowfall in Boston is above 80 inches this winter, okay? 
That is a weather derivative, right? The $1 is a function of some other random variable, in this case the number of inches of snow in Boston. That's simply what a derivative is. Financial usually means that the underlying random variable that the payoff is a function of is a financial asset. It's the price of a financial asset. And it's typically written S. So in finance, random variables are S rather than X. Okay, cuz S is for stock. So if you think of S sub T as just the price of, say, Google stock in one year, that is a random variable, okay? We don't know what it is; it'll take potential values in a year's time, okay? And a financial derivative is simply something that pays off a function of that. Okay? For instance, g(S) might be the indicator function of S being above 500. So if Google is above $500 in a year's time, you pay $1, or you get $1 from your contract, okay? You might ask, well, who cares about contracts like this? And we'll see particular forms of g a little later. Well, g(S), right? A function of a random variable is another random variable, okay? So we're gonna use random variables, and it turns out, and this is one of the key results in finance (it's called the fundamental theorem of finance, because it's so fundamental), that this tells you the price that you would pay for a derivative contract. So even just think about this weather contract. How much would you pay for that? Well, it'd be related to the probability of getting more than 80 inches of snow, right? It'd be related to the distribution of that random variable. And it turns out, what you'd actually pay for the contract is very, very closely related to the expected value of g(S), okay? All right, in fact, there's a very powerful result that says if I choose the random variable correctly and choose the distribution correctly, it is exactly an expected value. 
So expectation is gonna run throughout Stat 123, but you know how to do this if you know the PDF of S. This is just what I like to call POTUS, I mean LOTUS, right? I always get those confused, but it's just the integral of g(s) f(s) ds, okay? All right, so LOTUS is gonna play a major part in quantitative finance, okay? Okay, that's not all there is to finance, but that's pretty much it. I actually TF'd STAT 110 here many years ago. I loved it. There were about 25 people in the class then, so it was easier to TF. It was a great class. Then I arrived on Wall Street. And the first day I was on Wall Street, I was asked to calculate the expected value of a function. It was a function of a bivariate normal, and they thought it was an incredibly hard problem. It's not trivial, but I thought, I can do that, because I knew how to calculate expected values. I did that my first morning. And they thought, wow, this guy's really good, he knows all about finance. But all I knew about was bivariate LOTUS. And that stood me in good stead for the rest of my career. So, all right, here are a couple of, I won't call these brain teasers, cuz these examples are also interwoven with Stat 123. So I hope if you take it, this lecture will echo back and you'll think, yeah, I remember that teaser. If you have any questions about the course, before I get on to these, please feel free to e-mail me [INAUDIBLE]. And two of your TFs in this class, Bo and Jessie, have either taken the class or TF'd it or both. So feel free to ask them about it. Obviously, I've chosen them cuz they enjoyed it a lot. Okay, so here are a couple of brain teasers. The first one comes from foreign exchange. So foreign exchange is a fairly straightforward concept. So how much is a euro worth in dollars, right? So one euro is worth about $1.40 now. 
That moves up and down; it's sort of a stochastic process. And one pound, which is close to my heart, that's worth about $1.60. These are things that people care about, obviously, if you're traveling or if you're doing trade. And one thing you might care about is where these things are gonna be in a year's time, all right? How can I predict what the value of, say, a euro will be in a year's time? Okay, it's obviously of interest in the economy. You might think that if that prediction was actually possible, then the person who could do it would be phenomenally rich, would be infinitely wealthy, and would have retired, right? So you might think it is probably impossible to predict. But anyway, people try to, right? And you can do it in lots of different ways. You can build time series models, you can think about economies. But one thing you can do is just build a little probabilistic model, and that's what we're gonna do here. A very straightforward one, which takes two states of the world. It's called a binomial model, with two states of the world. And so let's start off with, so this is foreign exchange, known as FX in the business. We're gonna take a very simple world. We're gonna think about the value of 1 euro, okay? Okay, let's assume that right now it's worth $1, so the price of 1 euro is $1, just to make the math easy. So currently, that's $1 per euro. Let's write that down here. Okay, the actual number is about 1.40, for those here. And I'm gonna ask, where do we expect this to be in a year's time? Okay, and we're gonna build a very simple probabilistic model. We're gonna say there are two possible outcomes. The sample space in a year's time has two states. One where the euro becomes worth more. Okay, so let's just say it's gone up, it's now worth $1.25, so the euro's worth more. And one where it's worth less. Let's say $0.80. 
Okay, that's a random variable. It's got two states of the world. I'm actually gonna assign probabilities to it. Let's make it easy, just say half and half. All right, okay, that's a pretty straightforward model, don't you think? It might go up, it might go down. Okay, all right, now we've gotta ask the question, where do we expect it to be in a year's time? Well, we can do expected value, expectation, very easily. So the expected value of a euro in a year's time, which I'll just write as expected value of euro, right, is a half times $1.25 plus a half times $0.80, okay? So that's easy. That's $1.025. So this is where I expect the euro to be in a year's time. It will have appreciated on average to $1.025, all right? Okay, so that's a pretty straightforward expectation. Okay, so that's what we expect for the euro, okay? So, well, we know what $1 must be worth, right? So $1 must be worth 1 over that, right? Which, if you do that, is 0.9756. Right? Okay, so that's straightforward also, right? If a pound is worth $2, $1 is worth 50p, pence, okay, that's true. Right, so if a euro is worth $1.025, $1 is worth about 97.5 euro cents. They're called euro cents also. All right, so that's a foreign exchange prediction, that seems pretty straightforward, so we're done. But before we move on, let me just restate this model, all right? Well, pink chalk would be good at this stage, I think. I haven't had pink chalk before; let's see if we can use this pink chalk. Let's just restate the model. Well, $1 per euro, that's the same as 1 euro per dollar. So let's just restate this model in terms of the value of a dollar in each state. So this is 1 euro per dollar, okay? Up here, well, it's $1.25 per euro, so that's 80 euro cents per dollar. Okay, and then this one is 1 euro and 25 euro cents per dollar. Okay, same model, right? I haven't done anything tricky. All right, now what do we expect? Let's just calculate the expected value of a dollar under this model. 
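A minimal Python sketch of both calculations, using the two-state numbers from the lecture. The variable and function names are illustrative and not from the course; exact fractions avoid any rounding in the comparison.

```python
from fractions import Fraction

# Two equally likely states in one year, quoted as dollars per euro.
dollars_per_euro = {"euro up": Fraction(5, 4), "euro down": Fraction(4, 5)}
probability = {"euro up": Fraction(1, 2), "euro down": Fraction(1, 2)}

# The same model restated as euros per dollar: invert the price in each state.
euros_per_dollar = {state: 1 / price for state, price in dollars_per_euro.items()}


def expectation(values, probabilities):
    """Expected value of a discrete random variable given state -> value maps."""
    return sum(values[state] * probabilities[state] for state in values)


expected_euro_in_dollars = expectation(dollars_per_euro, probability)
expected_dollar_in_euros = expectation(euros_per_dollar, probability)

print("E[value of 1 euro, in dollars]:", float(expected_euro_in_dollars))
print("1 / E[value of 1 euro]:        ", float(1 / expected_euro_in_dollars))
print("E[value of 1 dollar, in euros]:", float(expected_dollar_in_euros))
```

The second and third printed numbers are two different answers to the question of what a dollar is expected to be worth in euros: the reciprocal of the expected euro price, and the expectation taken directly in the restated model.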
Well, the probabilities are still a half each, so just calculate that, that's easy, and. Hang on a sec, I've predicted the dollar to be worth more now. $1 is now expected to be worth 1 euro and 2.5 euro cents. And so, you walk into your boss's office and say, I've got this good model that predicts that the euro is gonna appreciate to $1.025. But, hang on a sec, the model also says the dollar will appreciate to 1 euro and 2.5 euro cents. So if you said that, that the dollar will appreciate and the euro will appreciate, you'd guess the response is: you're fired. >> [LAUGH] >> Now, how do we solve this? All right, that's the first one. Simplest probabilistic model you could ever have, right, a binomial model. And we've already come across our first brain teaser. And what's actually more intellectually troubling, or what I found intellectually compelling, is that although it's very simple, solving this, actually understanding what's going on here, is quite non-trivial. And we attack it two or three ways in Stat 123. So, that's the first one.

Okay, so the second one is TARP. So this is, well, I said this course is applicable. Hopefully, this just gives some flavor of how this course is applicable to what really goes on in the financial world, and whether you're interested in it or not, it certainly is an important part of the US and the world. So TARP, back in 2008, for the seniors of you who arrived here in September 2008, October 2008, you know that was not a happy time for the financial system or banks, or the Harvard endowment for that matter. So obviously, a lot of things were going on in 2008. And there's a great Econ course called The Financial Crisis, which I recommend to those of you interested in that. So what I'm gonna do is I'm gonna focus in on one particular part of the financial crisis that was going on in October 2008. Now this was called TARP. TARP initially stood for Troubled Asset Relief Program.
In the midst of all this financial crisis, there was a transaction where the US government, i.e., taxpayers, received warrants from these private institutions, okay? Okay, now warrants have nothing to do with being arrested or the legal system. A warrant is a type of financial derivative, okay? These are equivalent to what's called a call option. Okay, and what a call option was, or is, is the following. So let me just write down what the government actually had. So the US government paid, I've rounded the numbers, $450 million to Goldman Sachs, GS. And in return, okay, the US government had the right, the right is not the obligation, it's the choice, it's the option, to buy Goldman Sachs shares at $125 in 10 years' time. Okay, simple as that. Okay, they paid and they had the right to buy, it was 10 million shares. So they had the right to buy them for $125 each, okay? At the time, so this was in October 2008, Goldman Sachs' shares were trading at $95. Okay, all right, so you might ask, well, what was the government doing paying all that money to buy Goldman Sachs shares for $125 when they could have just gone out and bought them for $95, right? Okay, well, let's just think what the government has in 10 years' time. Okay, if Goldman Sachs' shares, in ten years' time, are at $150, what's the value of one of these warrants, one of these options, to the US? $25, right? Okay, cuz I can buy something that's worth $150 by paying $125 for it. Okay, if Goldman Sachs is trading at $100, what's the option worth? 0, right? Okay, so we can begin to see this function, this g function, taking shape for this warrant, which is the same as a call option, just terminology. All right, this g function is the max of the share price at maturity T, here T is ten years, minus what we'll call K, K is, in this case, $125, and K is called the strike price, okay? And 0, all right? That is, g(S_T) = max(S_T - K, 0). So here is a particular form of g, okay? The payout looks like this. So this is g(s).
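The payoff function g described here is short enough to write out directly. A minimal sketch in Python, using the lecture's strike of $125; the function name `call_payoff` is just an illustrative choice.

```python
def call_payoff(share_price, strike=125.0):
    """Payoff g(s) = max(s - K, 0) of one warrant (call option) at maturity."""
    return max(share_price - strike, 0.0)


# The two cases worked in the lecture, plus the kink at the strike itself.
for s in (100.0, 125.0, 150.0):
    print(f"share price {s:6.2f} -> payoff {call_payoff(s):5.2f}")
```

Below the strike the holder simply does not exercise, so the payoff is flat at zero; above the strike it rises one dollar for every dollar of share price.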
Okay, so if I know the distribution, the PDF of S in 10 years' time, I know the price of this warrant, all right? So the warrant, the price, my board management's not very good, is the expected value of g(S), the integral of g(s) times the PDF of S. Just LOTUS, okay? So two things to point out. One is the government decided that the cost of one of these options, if you work it out, is $45 per option. Where did that come from? Is the US government in the business of calculating probability distributions, probability density functions? Right, so that's one question which we'll talk about in 123. The other thing just to note is the solution to this. It's just LOTUS, right? If you think about it, it's actually not that tricky, cuz you can just change the limits of integration and get rid of your max, right? Okay, so it's a fairly straightforward integral. The solution of that integral under a particular form for the probability density function is a very famous formula, the Black-Scholes formula, okay? So I like to think that you can get a Nobel Prize in economics just for knowing LOTUS. Because that's what the Black-Scholes formula is, okay? So thank you for your time, appreciate you coming here. [APPLAUSE] And I hope to see you, I hope to see some of you next year. Thank you.
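As a rough illustration of the LOTUS point, the sketch below computes E[max(S_T - K, 0)] two ways: by numerically integrating the payoff against a density, and by the closed form that the integral reduces to. It assumes S_T is lognormal, which is the particular form behind the Black-Scholes formula. The drift `mu` and volatility `sigma` are hypothetical values chosen only for illustration; the lecture does not give them. The share price of $95, strike of $125, and 10-year maturity are the lecture's numbers.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expected_payoff_lotus(s0, strike, mu, sigma, t, n=100_000, z_max=10.0):
    """E[max(S_T - K, 0)] by LOTUS, with S_T written as a function of Z ~ N(0, 1).

    Assumes S_T = s0 * exp((mu - sigma^2 / 2) * t + sigma * sqrt(t) * Z) and
    integrates payoff(S_T(z)) * pdf(z) over [-z_max, z_max] by the midpoint rule.
    """
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * h
        s_t = s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        total += max(s_t - strike, 0.0) * normal_pdf(z) * h
    return total


def expected_payoff_closed_form(s0, strike, mu, sigma, t):
    """The same expectation after removing the max by changing the limits."""
    d1 = (math.log(s0 / strike) + (mu + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * math.exp(mu * t) * normal_cdf(d1) - strike * normal_cdf(d2)


# s0, strike and t are from the lecture; mu and sigma are hypothetical.
s0, strike, t = 95.0, 125.0, 10.0
mu, sigma = 0.03, 0.30

print("numerical LOTUS integral:", expected_payoff_lotus(s0, strike, mu, sigma, t))
print("closed form:             ", expected_payoff_closed_form(s0, strike, mu, sigma, t))
```

The two printed values should agree to several decimal places. Setting `mu` equal to a risk-free interest rate r and multiplying the closed form by exp(-r * t) gives the usual Black-Scholes call price; why that is the right choice of drift is a pricing question that goes beyond the LOTUS calculation itself.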

Info

Channel: Harvard University

Views: 72,435

Rating: 4.9133334 out of 5

Keywords: harvard, statistics, stat, math, probability, beta distribution, bayes billiards, finance

Id: UZjlBQbV1KU

Length: 49min 48sec (2988 seconds)

Published: Mon Apr 29 2013