Introduction to Pairs Trading

Captions

Hey everybody, welcome back. We're going to be doing another lecture today, and we're going to be talking about pairs trading. Pairs trading is one of the most popular lectures. It's actually the first lecture, because we developed it initially for a talk at Harvard with their applied CS department, and since then it's gotten a lot of traction. We use it in a lot of our workshops, which you can find by going to quantopian.com/events, and it's an interesting topic.

So we're going to go, as usual, to Lectures under Learn and Support. Obviously this navigation may change at some point, but this is how you currently get to it; the easiest way in general is to go to quantopian.com/lectures. We're going to go down to pairs trading, Introduction to Pairs Trading, right here.

Now, the important thing is that pairs trading relies on a mathematical concept known as cointegration. You can see down at lecture 26 we have a full lecture on not only cointegration but the concepts that build up to cointegration, probably most crucially stationarity. I recommend that you check out that lecture as well, if you're so inclined, once you've really gotten pairs trading. For today we're not going to worry too much about the mathematical specifics of cointegration; we're just going to make sure that you get the intuition. That's the point of the lecture today.

So we're going to look at the notebook, and as always it's in the Jupyter environment. The first lecture, Introduction to Research, is a good one to go to if you don't understand how to work in a Jupyter environment; it will teach you all the tips and tricks. I recommend going to lecture 1 in the lecture series if you don't quite feel comfortable yet operating in this kind of environment. But let's go over pairs trading. So what is a pairs trading strategy? Well, the first thing we're going to do
is start by generating two fake securities. You can see here we're simulating returns, sampling them from a normal distribution. Now, in other lectures I've talked about the fact that normal distributions really do not represent well what happens in finance, so this is potentially a way that you could have some bias in your analysis. You could be sampling your data from a normal distribution that's not going to reflect the real world at all; you're going to have a lot more risk and fat tails in real life than you would from a normal distribution. So in general be careful. This is just something we do for a learning example. We're not trying to say, hey, this is exactly how the world works; we're trying to demonstrate a concept here. Just keep that in mind.

So we're sampling returns from a normal distribution, mean of zero, standard deviation of one, sampling 100 points, and these are just going to be treated as additive returns. All we're trying to do is get to something that looks like this, effectively a random walk, and that is what we get by cumulatively summing our returns here. We plot that out. The first time it looks the same because we have this random seed set, but if we plot it again we get a different one, and a different one. You could easily control this: you could sample many more points to get something that's much more dense, and you could change the mean so that it was going up on average. We're just going to look at our default example here, mean of zero, standard deviation of one. I'm doing this because what I want you to do as you're going along is really try to mess around with the text and the cells in these notebooks and get a sense for how this stuff works in a very hands-on way. I think the best way to learn this stuff is to really get your hands dirty, and so that's why I
want to kind of prompt you, if you're following along in your own notebook, to try to make some of these changes and get more comfortable.

The next thing we're going to do is generate some noise, again sampled from a normal distribution, and then we're going to add to X: 5 plus the noise. Mathematically, the way you can think about this is that our new series Y is X shifted up by five, but with some random fluctuations thrown in. So if you take Y minus X, that should be 5 plus or minus some noise, and that's a key concept we'll get to in a bit. You can see here what this looks like. We could also resample this, and you can see it's not going to change a whole ton, because what's really changing is the random noise between the two. They're mostly going to follow each other pretty closely because of how we set up this experiment.

And again, you don't have to use normal noise; we could use something that looks more like exponential noise. I think this is the way to do it. My bad, apparently it's not, so we'll use the question mark notation to see what we want here. So it's called exponential, and it takes the first parameter as the scale of the distribution and the next parameter as the size, the number of samples we want to draw. So this is what it would look like if we were adding exponential noise, which is a bit more fat-tailed, and you can see that if we increase the scale, the spikes between the two get bigger. It's just interesting to consider the different ways you can model this. Maybe some of them are going to be more reflective of real life and some of them aren't. In general, be suspicious of anything that's normally distributed, because real-world data in finance is rarely normally distributed. So we have two series now.
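
For anyone following along without the notebook, here is a minimal sketch of the setup just described: a random walk X built from cumulatively summed normal returns, and a second series Y equal to X plus 5 plus noise. The variable names, the seed, and the use of NumPy's `default_rng` generator are assumptions made for illustration; the notebook's actual cells may look different.

```python
import numpy as np

# Seed chosen arbitrarily so the sketch is reproducible (an assumption,
# not the notebook's seed).
rng = np.random.default_rng(0)

# Additive returns drawn from a normal distribution with mean 0 and
# standard deviation 1; cumulatively summing them gives a random walk.
x_returns = rng.normal(loc=0.0, scale=1.0, size=100)
X = np.cumsum(x_returns)

# Y follows X, shifted up by 5, with independent normal noise on top.
normal_noise = rng.normal(loc=0.0, scale=1.0, size=100)
Y = X + 5 + normal_noise

# Variant with exponential noise: scale first, then the number of samples.
# Exponential noise is never negative, so this Y never dips below X + 5.
exp_noise = rng.exponential(scale=1.0, size=100)
Y_exp = X + 5 + exp_noise

# The difference Y - X is just the constant 5 plus the noise.
spread = Y - X
print("mean of Y - X:", spread.mean())
```

With normal noise the printed mean should land close to 5. With the exponential variant the difference would average about 5 plus the scale, since an exponential distribution's mean equals its scale.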
What are we modeling here? Well, we're modeling two series that follow each other; they have some deep economic link. That's the idea. We're modeling two assets, and these assets have a deep economic link. Remember that these could be any two assets. They don't have to be stocks, because an asset is anything that has a value at every point in time and can be bought and sold. Your house is an asset, your car is an asset. These assets are more or less liquid, it's harder to buy and sell a house than it is to buy and sell a stock, but it's still an asset. So the interesting thing about pairs trading is that what we're going to be talking about now is a mathematical concept that's agnostic to the type of stuff that you're trying to trade. It's really just a basic concept that you could apply to two different stocks, or to a stock and another type of asset like a futures contract, or whatever you're looking at.

So the idea is cointegration. Cointegration is when the difference between two series is mean reverting. Again, this is loose; I would recommend checking out the other lecture if you want a more precise definition, but that's the general idea. So let's plot out the difference between our two series. We have to regenerate this plot because we regenerated our series before. You can see here that's what the difference between the two is, and you can see that little spike corresponds to here. What do we see? We see a mean of around five, which makes sense: our series is X plus 5 plus the noise equals Y, so we should expect to see a mean difference of around five, and this is the noise around five. This is the noise that we generated, so if we plotted out the noise it would be precisely this, because that's all it is: the noise between the two plus that constant bias of five. So because this
is just noise, it's going to be mean reverting, because it's sampled from a normal distribution that has a mean. If you're higher, it's more likely that you're going to be lower at some point in the future. That's the definition of mean reverting: the fact that you're high now doesn't mean you're going to keep going higher; instead you're going to revert back to some mean. So any series that hovers around a mean and doesn't really diverge is known as mean reverting, because it's always going to revert back to that mean, and that's exactly what we see here. Again, cointegration technically is when some linear combination of two things is stationary, or some linear combination of N things is stationary, but we're not going to worry about that precise mathematical definition. This one works just as well for our purposes right now. As long as you get the intuition that the difference between the two always hovers around some mean, that's okay; that's cointegration.

You can see we have a cointegration test that we imported earlier here, and it spits out a p-value. Whenever I'm doing this lecture in person at workshops, I always like to ask, how significant is this p-value? And someone nearly always says, very significant. What they're saying is that this tells us these two series are very likely cointegrated. There are a few problems with this. Worst of all, you haven't checked which way the p-value goes. For any statistical test you need to know what the null hypothesis is, and here you don't know if this test is testing the hypothesis that they're cointegrated or the hypothesis that they're not cointegrated. So as an example, X and some noise shouldn't be related, so we can actually test on X and some noise, and there should be no cointegration
between X and some noise, because they're just not related at all. We'll run that, and we can see we get a p-value closer to one. So what does that mean for us? It means that if the two things are cointegrated, the test will give a lower p-value. So now we know which direction the p-value goes in this test, and we can say, okay, it's a low p-value, in fact it's 10 to the negative 16th. So does that mean that this is a very significant result? Well, the answer is no. It's actually really bad, and results in a lot of bias, to think about p-values as more or less significant, because then you implicitly start comparing p-values. What you always end up doing is saying, oh, 10 to the negative 16th, that's significant at a 10 to the negative 15th level, which is super good, that's way out in the tails (they talk about six sigma in physics), a super high confidence level. That's completely wrong, because you didn't choose your significance level in advance. You need to choose your cutoff in advance and see if you get a p-value below it. Here we didn't choose one, so we can obviously just slide the cutoff down as far as we want to incorrectly fit it to whatever p-value we got. That's a really bad practice. What you want to do is pick a cutoff in advance and then do your test. If your p-value is below your cutoff, you say, yes, I reject the null hypothesis (the null hypothesis here is that they are not cointegrated) and I'm good to go. If the p-value is above your threshold, you say, I do not reject the null hypothesis; there's not enough evidence to show that they're cointegrated. It's a binary thing. P-values are binary; you have to treat them as
binary, and you can't treat them as more or less significant. So looking at this number by itself can be a little misleading. What we can actually do is add a little if statement here on the p-value, and I like to do this in my own analysis. We'll choose a cutoff of 0.05, 5%, which means that when there's really no relationship we'll still be wrong about 5% of the time using this cutoff. If you don't understand how this works, I recommend reading some textbooks on statistics. We're probably going to try to introduce more of this into the lecture series early on; if you check out the lecture series, there are a few earlier lectures that deal with confidence levels and p-values and all this kind of stuff, so I recommend checking those out. There's also a really good book called The Cartoon Guide to Statistics that I recommend.

But you can see here: if the p-value is less than 0.05, print "likely cointegrated", else print "likely not cointegrated". And the thing is, we haven't actually run this test on our new X and Y yet; we just ran it on the X and Y that we had previously generated in this notebook, without regenerating it. So we're going to run it on these two, X and Y, and we're not going to look at the p-value; we're just going to look at whether it's below or above the 0.05 threshold. Likely cointegrated. Okay, so this is just confirming that the simulation we did actually satisfies the conditions of cointegration. The idea here is, again, you've got two things, and the difference between them moves up and down, but it's always going to hang around a mean; it's going to revert to that mean. If they get especially far apart they'll close up; if they get especially close they'll open up. That's the important thing. And again, it's coming from this hypothesis that they have an economic link, which causes them to stay in sync: if one
of them goes up the other goes up, if one of them goes down the other goes down. There's not a reason that they're going to diverge. That's the idea; you're looking for assets that have an economic link to each other.

Cointegration is not correlation. This is a really important thing; a lot of people confuse the two. Our X's correlation with Y is near one, which is a very high correlation, and the two properties will often hang out together, because if two things are linked they'll move together and be correlated. But correlation is checking whether movement in one explains movement in the other, and it's not accurate to say that correlation implies cointegration. Here's an example of when it doesn't. You can see two series just diverging: high correlation coefficient, but a p-value above our 0.05 cutoff, so not cointegrated. You can see why they're not cointegrated: they just diverge, and they're not going to come back together. Maybe on a very long time frame these two things could be cointegrated, but you'd need not only evidence that they converge but many, many cycles of converging, because each going apart and reconverging is one sample, and you need many samples to build up statistical confidence in something. So in this case we'd say there's absolutely no evidence that these things are cointegrated.

And here is cointegration without correlation, the other direction. It's a little bit of an artificial example, but if you're sampling one thing less frequently and something else more frequently, you could actually see conditions like this, where there is no evidence that one moves with the other, so a correlation of near zero. But because the difference between the two is mean reverting (if it's low it's going to be higher in the future, if it's high it's going to be lower in the future), we do see that cointegration test p-value of near zero, and
again, just as a reminder, mean reversion is not the precise mathematical definition of cointegration. For that you want to check out the other lecture on cointegration, stationarity, and integration. But for now that's fine for us.

Okay, now we're going to talk about hedged positions. For those of you who don't know, and at this point my hope is that you would know what a short is, a short is when you take out a short position on something, and a short position is when you have sold something without owning it yet. What that really means is that you have a negative amount of something in your portfolio. It's not that you own something and you sold it to get yourself down to zero; you actually sell it without owning it, putting you at a negative amount in your portfolio. Whereas if you were long something, that's a positive amount in your portfolio, now everything's going to be backwards: if that thing gains value you lose money, and if that thing loses value you gain money.

As for the exact mechanics of a short, what really happens is you enter into a contract where you're loaned an amount of money equivalent to what you would have made if you had sold that thing. So if you sell 10 shares of Apple short and Apple's trading at $100 per share, then you're going to be given $1,000 as a loan. That's the same as if you had sold 10 shares of Apple, but then in the future you have to buy back shares of Apple at whatever price they're trading at. That's what the contract says: I agree, at some point in the future, to buy back shares of Apple at whatever price they're trading at. So if Apple is cheaper in the future, then you can spend less to buy it back than that thousand-dollar loan you got (obviously interest rates factor into this), and you'll have made money. So that's the idea of a short, and there's plenty of
information out there explaining how this stuff works; you can look it up online. For now let's just assume we can do it.

One of the important things here is that hedge funds were originally developed as funds that went both long and short in the market. Why did they do this? Well, if you just have long positions in the market and the market goes off a cliff, everything in the market is going off the cliff with it. That is always the case: if you have just long exposure in the market, you are vulnerable to crashes. So the thought was, okay, why don't we also take out some short positions? In fact, what if you took out an equal number of long and short positions in the market at any given time? Then if the market went up, you would lose money on your shorts but make money on your longs, and if the market went down, you would make money on your shorts and lose money on your longs. So you'd be market neutral. The important point here is that a hedge is a way to try to cut risk, and one of the most common ways it's done is by taking out shorts against some exposure. If you have exposure to the market, you can take out a short in the market to try to cut out that risk of the market going off a cliff, while hopefully still taking advantage of whatever insight you've had into the market.

The other important point I want to make here is that in quant finance you usually spend a lot of time and effort identifying some crucial insight into the market, something that other people haven't identified, something clever. What you want to do is make a bet on that precise thing alone. You don't want to make a bet that tries to take advantage of your insight but then also takes on all sorts of other stuff that you don't want, with a lot of other risk
coming along for the ride. So shorting and hedging can be a really good way to make a really targeted portfolio that acts as a bet on just that one piece of behavior that you've tried to isolate in your quantitative analysis.

The way we do that in pairs trading is we go long the spread and short the spread. But what does that mean? Well, think about this example we developed earlier. If these two things are especially far apart, we want to bet on them being closer together. That's the real bet we're making. We're not betting on this one going down, because if we just took out a short position on it, betting on it to go down, what if they come back together but the whole market's up the whole time? Another example might be here, where we're betting on them getting further apart. Say we take out a long position on this one, and sure enough they do get further apart, but they've both gone down. So just taking out a long position on this would not be enough. Instead, what we want to do is construct a portfolio that is a targeted bet, one that makes and loses money when the behavior we bet on changes, and not when the market behavior changes or when the underlying company behavior changes or whatever.

The way we do that is, let's say we want to bet on them being further apart here: we take out a long on the top and a short on the bottom. If in the future they are further apart, then the bottom must have gone down more than the top went down, in which case we've made more money on the short than we've lost on the long; or the top must have gone up more than the bottom went up, in which case we've made more money on the long than we've lost on the short; or of course they could have diverged and gone in opposite directions, in which case we make money on both, which is great. But the general idea is that if they are further apart in the future, you have to have made
money on this long-short position. Of course this is discounting transaction costs and slippage and the interest rate on the short, so we'll worry about those later, but that's the general idea. And to take out a short position on the spread, you just do the exact opposite: you take out a short on the top and a long on the bottom, and if at some point they're closer together, the opposite conditions apply and you must have made money.

So that's the idea. What we're doing here is constructing a new synthetic asset, the spread, which we can buy or sell. The spread is an asset which is a portfolio: let's say 50 percent long the thing on top and 50 percent short the thing on the bottom. There are cases in which it's not precisely 50/50; we'll worry about those a little bit in this lecture, but really not too much. Basically, this is betting that the spread is going to get larger. So you buy the spread when you think the spread is going to increase, and you can short the spread as well: you take on a negative position in the spread, betting it's going to get smaller, by taking on the opposite positions.

So we've constructed a new synthetic asset, and in finance this idea of synthetic assets, or derivative assets that take their value from other assets, is just super central, because again you always want to be able to make really specific bets on just the behavior that you care about and not worry about other stuff. So in this case we've created an asset which we know will gain value if the spread increases, and we've created another one which we know will gain value if the spread decreases. And now we're not going to worry about the actual positions; we're
just going to say we're going to go long the spread or short the spread. We've abstracted that out, and we trust that we can make or lose money according to how the spread behaves.

So let's talk about finding real securities that behave like this, because the concept is cool and very useful, but it's only going to be helpful if you can actually find real things to trade that behave this way. I'm going to show one way you could go about this, but it also helps demonstrate a really important problem in statistics in general, and something that comes up a lot in finance. Let's say you said, okay, I want to find cointegrated assets, so I'm going to loop through a bunch of assets. You can see here what these for-loops do: for each asset, for each other asset, take the pair and do a cointegration test on them. If the cointegration test p-value is below that cutoff of 0.05, spit it out and say, hey, this is something you should be interested in. It's just looking through all pairs and returning any pairs that come up as likely cointegrated.

Now, the problem here is something called multiple comparisons bias. It's a huge problem in statistical analysis in general, and the reason is as follows. Remember that if you're running a p-value test with a cutoff of 0.05, that means you'll be wrong 5% of the time in expectation if your test is properly calibrated. What that means is that if there's no signal in your data whatsoever, the p-values are uniformly distributed, and still five out of every hundred are going to fall below 0.05. So 5% of the time you'll get a "yes, cointegrated" result when the underlying things are not really cointegrated; it was just statistical noise that produced those p-values. So let's say that we have a basket of
pairs, and let's say that we have 20 stocks in our basket, and we're looking at all pairwise relations. This is the number of comparisons we're going to do; it's just a simple counting thing, counting the number of pairs. So we'll say 20 times (20 minus 1) over 2, that's 190. With a p-value cutoff of 0.05, we'll count the number that we should expect to pop up as significant even though there's no relationship in the underlying data: 190 times 0.05 is 9.5, so we should expect about 10 pairs to pop up as significant even when there's no relationship in the underlying data. So what does that mean? Well, if we're testing on 20 assets and we get 13 p-values that pop up, are any of them significant? I don't know. It's tricky, right, because you can't really differentiate between the ones that are truly cointegrated and the ones that just happen to look that way based on statistical noise. Because you did so many tests, you're bound to get a few that pass the test.

So in practice, if you're doing it this way, you generally want to do a second verification step where you hold out some of the data. Once you have your candidates, do some more economic analysis and say, hey, I think that there's a reason these things are cointegrated, and then do a last step where you don't do a ton of tests, maybe just 5 or 6, and see if the relationship maintains itself in that held-out sample. This is a form of out-of-sample testing, which we talked about in the overfitting lecture. Another way is not to loop through a ton of stuff but just to come up with hypotheses: pick two things because you have an economic reason that they would be cointegrated, and test it. But keep in mind that if you do that 10 times a day, or 100 times a day, you're still going to fall into multiple comparisons bias even if you're coming up with economic hypotheses and testing them. If you do enough
tests, you're still going to run into multiple comparisons bias. So oftentimes you still want to do some form of out-of-sample testing, some form of second-stage verification which involves very few tests, just to be more certain. For now we're just going to ignore multiple comparisons bias because it's a learning example.

We're going to look at this set of securities over this time period, 2014, sampling at a daily frequency. You can also get minutely pricing data on Quantopian. This technique could work at any frequency; it's just a mathematical technique. Some things are cointegrated on a daily frequency; some spreads are going to go in and out every day, and some are going to go in and out every year. If the difference between two things goes up and down on a yearly basis, so it takes a year for it to mean revert, then you might need something like 80 years' worth of data to convince yourself those things are cointegrated, which can be really hard to do. So in practice you generally want to find stuff that moves on a faster time frame. The other reason is that if it takes a whole year to revert back, then it takes a whole year for the position you take out to possibly make money. If you take it out when they are close together, it takes a whole year, or half, six months, for them to spread apart again, so you're potentially not going to be making a ton of money from that strategy. That's why oftentimes you want to look for stuff that's cointegrated on a faster time scale. There are some tests you can use to check that; we're not going to worry about those too much today, but you can look them up. Essentially you're looking at the oscillation of the difference. You can see here this is the data we get. We'll go ahead and just run this so we can run it ourselves. That's the data, and now we're
going to use that function that we wrote up here to look at all pairs. You can see here the color of the square corresponds to the p-value. Now, didn't I just say that you're not supposed to treat p-values as anything other than binary? Yeah. So why is this not black and white, with this one pair that came out below 0.05 being one color and the rest being the opposite color?

Well, and this is important in statistics, the whole point is to make it really easy to prove that you're wrong and really hard to prove that you're right. The bad case is when you think you're right when you're not actually right. That's when you lose money; that's when people potentially get very sick, if you're working in medicine. When you think you're wrong but you're actually right, usually that's a safer case, because you don't think you have any information, so you're not going to make a decision or take action. So in order to avoid the case where you think you're right but you're actually wrong, just make it really hard to prove to yourself that you're right.

So what we're doing here is saying, okay, I see that these two things are likely cointegrated according to this test, which is prone to lots of multiple comparisons bias, but what are some of the ways in which these things could not actually be cointegrated? There are lots of ways. One of them is known as confounding variable bias, and you can look this up. Confounding variables are when two things seem to be in a relationship, but really there's a third thing which is in the same relationship with each. So let's say you thought X and Y were cointegrated, but really X was cointegrated with Z and Y was cointegrated with Z, and because they
were both cointegrated with this third factor Z, it caused them to move kind of similarly, but there's no real relationship there and it could break down at any time. That's known as confounding, and you can look for it. This is why we included the market, SPY, in our basket: if you find stuff that seems maybe cointegrated or maybe correlated, but that also has a strong correlation or cointegration with some other stuff, maybe the market or maybe a sector index, that's an indication that maybe there's some confounding going on. Or maybe what you're looking at is not really cointegration so much as just mean reversion. There are lots of different things that could screw up your analysis, and confounding variables are just one of them.

So what we're looking at here is just the relationship we found. If this were all green, or if this were green, that might be a warning sign to go look for a confounding variable relationship. That's how I use this plot; I don't infer much more out of it.

So now we're going to actually get the series and see how they look. Here's the difference between the two. The first thing we notice is that the difference is clearly not normally distributed, because you get this massive jump here, which you just don't see in normal distributions compared to the movement the rest of the time. Otherwise it does look like it mean reverts, so that's a positive sign. Again, we're dealing with multiple comparisons bias, so we can't really trust this, but let's say hypothetically we weren't.

The other thing we're doing here is computing a linear regression fit between the two series. Why are we doing that? Well, we're computing the spread not just as one minus the other, but as one minus some constant coefficient times the other, and the reason we do
this gets to the technical definition of cointegration, which again you can read about. But imagine it intuitively: maybe one thing is at a very different scale than the other, and you have to multiply one up to the scale of the other so that the difference is really mean reverting. Otherwise one's effects will overwhelm the other's. The general idea is that it helps express one thing in terms of the other more equally, to get them into the same space.

We can actually check what our estimated beta is here: it's 1.53. If we look at the mean of S1, it's about 21, and the mean of S2 is about 60, so it makes sense: we're multiplying the smaller one by a number to bring it up toward the other. And again, remember what's happening here: because they're moving together, the linear regression should capture some of that relationship. In fact, if we go down here, there's a last plot which shows how they move together. What the linear regression is doing is just saying that, as they move along, this one is moving about 1.53 times as much as the other, so we normalize that down so they both move the same amount, and then we look at the difference. That's what this is doing.

If you still don't understand it, don't worry about it too much. Go through the lecture I talked about earlier, the integration, cointegration, and stationarity lecture; it'll explain all this stuff in detail. For now, just keep in mind that we actually compute the spread as this kind of modified difference rather than a straight difference.

That's what the spread looks like. It doesn't look bad at all to me. This is the ratio, which can also have some interesting information: just the raw ratio, without that constant coefficient. There's an interesting artifact in there, maybe indicating that we should be a little worried: it seems like the ratio between
the two grows and then shrinks, and there's this weird behavior here. I don't know what's going on there. Maybe it's worth investigating, maybe it's not.

Now, there's a big problem here, and whenever I'm doing this lecture in person I always ask people if they can see it. The point is to really drive home how hard this bias is to find in practice. It's really, really difficult. This bias is look-ahead bias. What I mean is: we're computing the spread here, but the spread is based on our estimate of this beta coefficient, and beta's computation is based on all of the data. So if we were trying to make a decision about this spread in, say, March, we'd actually be using information from the rest of the year, which we wouldn't have had in March. It's an inaccurate representation of the predictions we would have been able to make in March, because it's using information from the future, and it's going to make your strategies look a lot more predictive than they actually are.

The same thing is true if we try to compute a z-score. We're going to compute a z-score because z-scores are really useful for normalizing data. They take data that could be in whatever space, normalize it down, and say how many standard deviations away from the mean you are. That's your z-score. If I'm currently three standard deviations above the mean, my z-score is three; if I'm three standard deviations below the mean, my z-score is negative three. You can see here we take the series minus the mean, over the standard deviation, and that's what we get right here.

Remember, we're plotting the spread that we computed earlier. You'll notice the shape doesn't change at all; the shape is preserved. We're just putting it onto a y-axis which is more comparable and has more meaning by itself, because this spread right here, that
doesn't really mean anything to me, but now we've put it into a space where it's actually meaningful: how far away from the mean are you? But of course this z-score uses a mean and a standard deviation that are computed over all the data, so at any given point in time you're using information from the future. You have that look-ahead bias again.

So suppose you were implementing a simple strategy which said: go long the spread when it's below negative one. We're cruising along; okay, we go long the spread here, and when the spread approaches zero is when we close out. So we wait, we wait, we wait, and then maybe we close out here, and we make money on that difference because the spread is now way higher than it used to be. Then we do the reverse when the spread is above one. So we wait until here, the spread is above one, and we go short the spread. We're down as long as it stays above one, so we'd actually be negative on that position for a while here, but then as it approaches back down to zero we'd liquidate and make money. So this is indicating that we could potentially make money, but it's using information from the future, which is not a good sign.

So we're going to use what are known as moving statistics: moving averages, moving standard deviations. These are also known as rolling statistics or windowed statistics, whatever you want to call them. The general idea is that instead of computing over the entire data set, for each point in time we just look n sample points back, so n days in this case, and compute the statistic over that set of days.

So in this case we compute a rolling beta. The rolling beta is just that same linear regression, but done on a rolling basis over the last 30 days, so now at each point in time the beta is based on the last 30 sample points. We compute the spread that way, and then we compute the moving average of the spread for the last one day, so this is
literally just the current value; it's just a one-day moving average. The point is that you could make it a two-day or a three-day moving average if you wanted to smooth out some of the noise in your data, but then of course it's going to become a little lagged, you're going to be making trades a little late, and you have to deal with that. We're not going to worry about that for now.

Then we're going to look at the 30-day moving average, which is our longer moving average, and the 30-day standard deviation, which we'll compute a little later. Here you can see we just plot out the one-day, so that's just the spread, but the spread as computed with this rolling beta. Not only do we not have data for the first 30 days, because we need 30 days built up to start computing that beta, but you can also see that the shape doesn't change a whole ton, yet it does change, because now we're only using information from the past to compute that beta. And this is the 30-day spread: the average over the last 30 days of the spread computed using this rolling beta. This is what we're comparing against, along with this 30-day standard deviation.

So what we're saying is that our new rolling z-score is our current value, minus how it's been on average over the last 30 days, divided by the standard deviation of the last 30 days. How extreme is our current value, taking into account what's been happening over the last 30 days? You can see that's what this looks like: a substantially different shape from the z-score we computed earlier, and that's because we're only using information from the past at any given point in time. This is now our new trading signal, where it goes up and down. You can see it's different, and the point here is just that this is no longer using information from the future. Oftentimes you'll compute this and you'll say, "Oh, this doesn't look as good." Well, yeah, because the stuff
that looked really good was using information from the future. Of course it looked better. You can adjust these parameters, you can adjust the length of your windows, but in practice you shouldn't adjust them just to try to make your returns better, at least not without knowing what you're doing. We discuss why in the overfitting lecture, where we actually have an example of why you shouldn't do that, so I would check that out.

Lastly, here's all of them plotted, so you can see what's actually going on. Here's the price, and here's our z-score computed from historical information only. Here might be a signal saying you should probably be going short the spread; here's a signal that you should probably be going long the spread. Does that make sense? I don't know. You'd have to check back and see whether this actually makes money.

In practice there are many ways to implement this stuff, and you can do a lot of different things. We have some template algorithms in the lecture series that implement this, which you can use to try to actually trade it. Currently they're attached to the pairs trading lecture, but they may become their own lecture in the future; we haven't decided. There are lots of posts in the forums you can go check out which talk about pairs trading. In fact, there's a really good thread from when we originally posted this notebook, called "How to Build a Pairs Trading Strategy on Quantopian". So if you go to the forums and search for "how to build a pair" (apparently the search isn't great for me today), you can see there are a lot of views, about 31,000 at this point, and lots of comments. There's actually a really interesting discussion that happened: someone who works at a large fund, I'll let
you figure out which one, came in and started talking about their approach to it, how maybe I'm wrong, how other people commenting are wrong, how other people are right, etc., and put a lot of really useful content in this thread. So there's a lot of information there to learn more.

Pairs trading tends to be a lower capacity strategy, so you don't have to have as much money to start. In practice you never want to trade just one pair; you'd want to trade a ton of different pairs. That's because, for each individual pair, what's the real chance that you can predict it's going to stay cointegrated? You can maybe predict it, but let's say there's a 60% chance that it's going to remain cointegrated. Then you probably want to be invested in a lot of different things, so that on average most of your bets go okay and you make money on the whole, even though you're going to lose money as certain things blow up. So in general you want to have lots of different pairs running at any given time, hopefully from lots of different industries, to diversify your exposure.

There are lots of other in-practice considerations, and a lot of people discuss them in the forums. We might at some point have an advanced pairs trading lecture that discusses those. But it's an interesting strategy, and hopefully you got the idea. Again, like I said, I strongly recommend that you go ahead and check out the cointegration lecture, which has more information on the mathematical specifics of cointegration and will help you understand more tests and checks you can do to make sure it's really happening.

Hey everybody, thanks for watching the Quantopian Lecture Series. I just wanted to let you know how you can get more content if you're interested. Here's the Quantopian Lecture Series page. This is available at www.quantopian.com/lectures; it's easiest
just to Google "Quantopian lectures", and if you're already on the Quantopian website you can get to it via Learn and Support. Every lecture has a notebook, and most lectures have videos you can watch to follow along, just like the one you're watching right now. In addition, some lectures also have sample algorithms you can clone and play around with, and maybe even use to start developing some of your own trading strategies.

In addition to the lecture series, we have a GitHub in case you're interested in checking that out: github.com/quantopian/research_public. We also have my Twitter account, at the street quant. Finally, we also have services for schools and academics at quantopian.com/academia. You can see here some of the offers we have for professors. Everything is free, but we offer a little bit more help to educators who want to use the platform.

Finally, you can always email me at delaney@quantopian.com; again, that's D-E-L-A-N-E-Y at quantopian.com. Feel free to shoot me any questions. We really appreciate feedback on anything we're doing here. Thanks very much.
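
The regression-based spread and full-sample z-score described in the lecture can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it uses only the Python standard library, the price series are synthetic, and the names `ols_beta`, `zscore`, `s1`, and `s2` are hypothetical. It also shows where the look-ahead bias comes from: the beta estimated from all the data is not the beta you could have known partway through.

```python
import random
from statistics import mean, pstdev


def ols_beta(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def zscore(series):
    """Full-sample z-score: uses the mean and std of ALL points."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]


# Synthetic prices (an assumption, not the lecture's data): s1 is a random
# walk and s2 tracks 1.5 * s1 plus noise, so the pair is related by construction.
random.seed(0)
s1 = [20.0]
for _ in range(249):
    s1.append(s1[-1] + random.gauss(0, 0.3))
s2 = [1.5 * p + 30.0 + random.gauss(0, 0.5) for p in s1]

beta = ols_beta(s1, s2)  # hedge ratio estimated from all 250 days
spread = [b - beta * a for a, b in zip(s1, s2)]  # "modified difference"
z = zscore(spread)  # same shape as the spread, rescaled

# Look-ahead check: the beta available on day 60 will generally differ
# from the full-sample beta used to build the spread above.
beta_day_60 = ols_beta(s1[:60], s2[:60])
print(f"full-sample beta: {beta:.3f}, beta known on day 60: {beta_day_60:.3f}")
```

Because `zscore` is a linear rescaling, the plotted shape of `z` matches the shape of `spread`, which is the point made in the lecture about the shape being preserved.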

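The rolling version described in the lecture (a 30-day rolling beta, then a spread compared against its own 30-day mean and standard deviation) might look something like the sketch below. It is an illustration under stated assumptions: the data are synthetic, the helper names `rolling_zscore` and `signal` are hypothetical, and the entry threshold of one standard deviation is taken from the simple strategy the lecture talks through. Exits near zero are not modeled.

```python
import random
from statistics import mean, pstdev

WINDOW = 30  # look-back length in days, as in the lecture


def ols_beta(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def rolling_zscore(s1, s2, window=WINDOW):
    """Z-score of the spread using only data up to and including day t.

    Entries are None until enough history has built up.
    """
    n = len(s1)
    spread = [None] * n
    for t in range(window - 1, n):
        lo = t - window + 1
        beta_t = ols_beta(s1[lo:t + 1], s2[lo:t + 1])  # rolling hedge ratio
        spread[t] = s2[t] - beta_t * s1[t]  # one-day "moving average"

    z = [None] * n
    for t in range(2 * window - 2, n):
        recent = spread[t - window + 1:t + 1]  # last `window` spread values
        mu, sigma = mean(recent), pstdev(recent)
        z[t] = (spread[t] - mu) / sigma if sigma > 0 else 0.0
    return z


def signal(z_t, entry=1.0):
    """+1 = long the spread, -1 = short the spread, 0 = no entry signal."""
    if z_t is None:
        return 0
    if z_t < -entry:
        return 1
    if z_t > entry:
        return -1
    return 0


# Synthetic prices (an assumption, not the lecture's data).
random.seed(0)
s1 = [20.0]
for _ in range(249):
    s1.append(s1[-1] + random.gauss(0, 0.3))
s2 = [1.5 * p + 30.0 + random.gauss(0, 0.5) for p in s1]

z_full = rolling_zscore(s1, s2)
z_early = rolling_zscore(s1[:100], s2[:100])
# Values for the first 100 days are unchanged by data that arrives later.
print(z_full[:100] == z_early)
signals = [signal(v) for v in z_full]
```

The comparison between `z_full` and `z_early` is one way to check for look-ahead bias: if adding later data changes an earlier value, that value was using information from the future. The window here includes the current day; whether to lag it by a day depends on when the trade would actually be placed.
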
Info

Channel: Quantopian

Views: 41,333

Keywords: finance, quantitative finance, risk, risk analysis, math, statistics, algorithms, algorithmic trading, pairs trading, trading, pairs

Id: JTucMRYMOyY

Length: 47min 33sec (2853 seconds)

Published: Wed Oct 12 2016