Financial Engineering Playground: Signal Processing, Robust Estimation, Kalman, Optimization

Captions
[Applause] Okay, so our first plenary speaker is Daniel Palomar. I'm very pleased to introduce Daniel. Daniel Palomar is a professor in the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology and a fellow of the Institute for Advanced Study at HKUST. His current research interests include applications of convex optimization theory, game theory, and variational inequality theory to financial systems, big data systems, and communication systems. Dr. Palomar has received numerous awards for his research. He is the recipient of a Fulbright research fellowship, two Young Author Best Paper Awards from the IEEE Signal Processing Society (one of them as a co-author), and an HKUST Excellence Research Award. His dissertation received several awards: one by the Technical University of Catalonia (UPC), one by the Epson Foundation, and one by the Vodafone Foundation and COIT. He's also very active on editorial boards; he has served as guest editor of special issues and associate editor for five different IEEE journals. So it's a great pleasure to welcome Daniel to the SSP workshop. [Applause]

It works? Okay, good morning everybody. Thank you, Peter, and the rest of the organizers for inviting me here to give a plenary talk. I'm really happy, honestly, to be able to give this talk because of the topic: I'm going to talk about financial engineering. I think there is a big misconception in this community about what financial engineering is. Years ago I used to work in another area, communication systems, and now when I tell my colleagues that I'm working in financial engineering they look at me with this face; it's a face that I can recognize, it's not a nice face. They try to hide it, but there is something going on, they don't like it, and I don't know why. I don't know what they think financial engineering is. So that's the issue: I think there is a misconception. My point today is to try to tell you that really signal processing is everywhere within financial engineering. You will see: no matter what you do, some people work on random matrix theory, some people work on particle filtering, Kalman filtering, robust optimization, optimization algorithms, machine learning, deep learning, stochastic optimization and chance constraints; any topic that you want, different people work on different areas, and I'm sure you can find a place here. That's the point of this talk; let's see if I manage.

Basically, there are so many things that I want to tell you, and that's a problem, because I don't have the time to tell you about everything, so I had to choose. But there are many other things, really, many other things that are really beautiful for us, for people in signal processing. I was planning to tell you my personal story first, but I'm going to skip that because there is no time. I wanted to tell you because I switched to this area like ten years ago, and it was not an easy path, so I wanted to tell you stories, but no time, I'm sorry. So today I want to tell you about a few topics. One is robust estimation, which is very important in finance. Then I want to tell you about Kalman. Kalman is amazing; Kalman, yes, Kalman, which was developed, you know, initially to track missiles and all that; everybody uses Kalman in different areas, and it can be used here in an amazing way, it's really beautiful. And then I also want to talk a little bit about portfolio optimization.
We do beamforming, right? We do linear filtering. Guess what: portfolio optimization is really the same. Okay, so let's go for it.

Oh, before I start, I want to tell you a little bit about financial data: what is special about financial data? I mean, we use data in all different areas, right, in communications, in speech, in image processing, we use data. So what's special about this? Let me tell you a little bit; I don't want to spend too much time. There is this thing called stylized facts about financial data. Each market has a different type of data, but overall all the markets share some special properties; let me mention a few of them. But before that, let me tell you from the beginning what we are going to play with. This plot is just the price of a stock. Actually, we don't really deal with prices, because it's difficult to model a price; we model the log price, because the log price can be easily modeled as a random walk. So this is, for example, the log price of the S&P 500 index. The problem is that it's not stationary, so it's not very nice to model; so actually we don't use this, we use the returns. This is the typical expression: this is called the linear return, it's just the variation of the price normalized with respect to the original price, that's it. When you hear "Apple went up 1%," this is what we are talking about, this return. But there is another type of return; we call it the log return. Remember, y_t is the log price, so this is actually the difference of log prices. If you write it down, you realize that this looks similar to the linear return, and actually they can be related through the function log(1 + x). So in principle there are two different types of return, but it so happens that returns are usually very small numbers, so you can do a Taylor approximation of the function log(1 + x), and if you do that you realize that the two returns are almost the same. So we can forget about this distinction; just "returns," to make it easy.
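To make the two return definitions concrete, here is a minimal sketch in Python; the price numbers are made up for illustration, not from the talk:

```python
import numpy as np

# Hypothetical price series (e.g., daily closes of a stock or index).
prices = np.array([100.0, 101.0, 100.5, 102.0, 101.2])

# Linear (simple) returns: R_t = (P_t - P_{t-1}) / P_{t-1}
lin_ret = prices[1:] / prices[:-1] - 1

# Log returns: r_t = log(P_t) - log(P_{t-1}) = log(1 + R_t)
log_ret = np.diff(np.log(prices))

# For small returns, log(1 + x) ~ x, so the two nearly coincide.
print(np.max(np.abs(lin_ret - log_ret)))  # tiny difference
```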
So now I can plot the returns of the S&P 500 index, and they look like this. It looks a bit more stationary; it's not a random walk anymore, but it's still not stationary. You can see that the variance is high at some times, then it goes down. These are the stylized facts about finance. We can clearly see here that it's not stationary; however, if you look at a small window, it can be stationary over that window. But it's tricky, because of this non-stationarity: you can build models with some historical data, and then you don't know if later, in the future, the model is going to be good or not. So it's really tricky. Let me give you at least a few of these stylized facts; there are many, I just want to mention a few here. One of them is the one I just mentioned, the lack of stationarity; this is very important. So watch out for the fund brochures: when you read the brochure of an investment fund and they say the performance was fantastic in the past year, yes, but it doesn't mean it's going to be good later.

Another important topic is heavy tails. When I switched to this new area of financial engineering, this is the first thing that I encountered. I was so used to Gaussian distributions; in communications we use Gaussian and it's fine, but in finance it's not fine, it's horrible. That's the first thing I encountered, and it led me to working on robust estimators; I'll talk about that. Another important thing that I want to mention is volatility clustering. If we go back, you can see the volatility clustering here. Volatility is just the standard deviation, like the envelope of the returns. You can see that when the volatility goes high it stays high for a while, and then it goes down and stays down. This is called volatility clustering; it's very typical, and these things have to be modeled, of course.

Okay, let's take a look; I want to illustrate a few points here, this is very important. What am I plotting here? I'm plotting the histogram of the returns, like the PDF of the returns; I want to see what the shape is, whether it's Gaussian or not. Why do we have so many histograms? Because of the index t that we were using. What is the meaning of t? This is up to you: t could mean day, and then you are talking about daily returns; or t could be week, and you're talking about weekly returns, or monthly returns. You can choose the frequency. I'm showing here the histogram for different frequencies, and it changes a lot, which is very interesting. For daily returns, the blue line is trying to fit a Gaussian, by the way, and you could say, okay, maybe it looks Gaussian, but actually no: the tails are heavy. A Gaussian has very thin tails, and here there's no way we can say this is Gaussian. Now if we look at weekly returns, it's the same, heavy tails; monthly, still heavy tails, but something is happening at these low frequencies: it's becoming asymmetric, it's tilting to one side. That's very interesting. That is called skewness; usually we don't model the skewness, but we need to, especially if you are at these low frequencies of returns. And look at this, at even lower frequencies it goes crazy, it's totally asymmetric. So these are things that we need to take into account; otherwise nothing is going to work.
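A quick numerical check of these stylized facts is to compare skewness and excess kurtosis across frequencies. This sketch uses simulated Student-t returns as a stand-in for the real S&P 500 data on the slides; note that a symmetric simulation cannot reproduce the low-frequency skewness he describes:

```python
import numpy as np

def excess_kurtosis(r):
    z = r - r.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2 - 3.0  # Gaussian gives 0

def skewness(r):
    z = r - r.mean()
    return np.mean(z**3) / np.mean(z**2) ** 1.5       # Gaussian gives 0

rng = np.random.default_rng(0)
# Simulated daily log returns with heavy tails (Student t, 4 degrees of freedom).
daily = 0.01 * rng.standard_t(df=4, size=252 * 10)

# Log returns aggregate by summation: a month is roughly 21 trading days.
monthly = daily[: (daily.size // 21) * 21].reshape(-1, 21).sum(axis=1)

for name, r in [("daily", daily), ("monthly", monthly)]:
    print(name, "skew:", skewness(r), "excess kurtosis:", excess_kurtosis(r))
# Daily returns show strong excess kurtosis; aggregation pushes the numbers
# toward Gaussian. Real low-frequency returns also develop skewness.
```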
Good, so let me skip that. This is the volatility clustering that I was mentioning: you can see in red, like the envelope of the standard deviation, that's the volatility clustering, and we need to model that. Oh, by the way, if I didn't know that these were the returns of a stock, what do you think this looks like? To me, this looks like a speech signal: when you plot a speech signal, it also looks like that. Now, I don't know anything about speech signals, but maybe there are things that people use in speech signal processing that can be used here; I don't know, it could be interesting to explore.

Now, the other issue: the frequency is very important; there are different regimes of frequency. The common frequency is daily returns; that's the one most people use, because you can get the data for free online and then you have enough data, so this is the biggest one. Then you can consider lower frequencies, like weekly or monthly, but it's not so common, because then you have less data that you can use for fitting models. And then there is the interesting one: high frequency. This is very different; you enter a different world, it's like the quantum world. Here you get into intraday data, and there are weird things happening; once you go intraday, it's another world. I don't have experience in that area; I will get into it someday, hopefully. But there are weird things happening, people start playing games, and then you need to have really good computers, because speed is very important at that point. And also the data is very expensive: you need to pay a very expensive subscription to Bloomberg and so on, so it's not the same.

Okay, anyway, let's start with the modeling. We have these returns and we want a model; that's what we do in signal processing: we use some model, we fit the model, and then we use it for predictions or whatever. So let's take a look. Basically, the idea is that we want to use all the information that we have from the past up to time t-1; we call this F_{t-1}, it's all the history. And we want to make a prediction for the next time t, so we want to predict the return at time t. This is our prediction, like the conditional mean, based on some model, whatever, and this is the error in the prediction, w_t, which has some covariance matrix. Anyway, forget the details; the two important things are the mu_t, our prediction, and the Sigma_t, which is the covariance matrix of the prediction error.

So let's take a look at different models. There are many models; there are books of 1,000 pages on different models. Most of the models, by the way, we know them in the signal processing community, so let's take a look. This is the simplest one, the i.i.d. model, which is false, but people use it, you know, to write papers. Markowitz wrote his seminal paper in 1952 and he considered this model; it's okay, it's a good approximation. Basically, you assume that the mean is constant and the covariance matrix is also constant; that's it, that's the assumption. Another model is the factor model, which we also use in signal processing. It's like an i.i.d. model, but you realize the following: the vector of returns can have a very large dimensionality, maybe 500 elements for 500 stocks, but actually all these stocks are driven by a few factors; a few could be just one factor, or two, or three, or five. Once you incorporate that structure, you get low-rank covariance matrices and all these kinds of things that we do in signal processing. This is, by the way, related to PCA; PCA is when you don't know the factors and you want to estimate them. By the way, in finance this model is very important, and in finance they do it a bit differently: they have the factors, they know the factors. How do they get these factors? It's very interesting: there is a whole market for data in the financial world. You have these big companies like Bloomberg and Barra; those companies spend a lot of money creating these factors. They analyze the companies, they analyze the data from the companies, and they generate hundreds of factors for each company, and then they sell these data to investment funds, hedge funds; very expensive, really very expensive. So if you are a small hedge fund, you are not going to have enough money to pay for all these factors, for the good ones; you need to get only the cheap ones, and it makes a difference later. So there is a whole market for these things.
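As a sketch of the factor-model idea: if returns are driven by K factors, the covariance matrix is low-rank-plus-diagonal. Below is a minimal PCA-style version, where the factors are estimated from the data rather than bought from a vendor; all sizes and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 500, 50, 3                       # K = number of factors (assumed known)
B_true = rng.normal(size=(N, K))
factors = rng.normal(size=(T, K))
X = factors @ B_true.T + 0.5 * rng.normal(size=(T, N))  # T x N synthetic returns

# PCA-style factor model: keep the top-K eigenvectors of the sample covariance.
S = np.cov(X, rowvar=False)
eigval, eigvec = np.linalg.eigh(S)
idx = np.argsort(eigval)[::-1][:K]
B = eigvec[:, idx] * np.sqrt(eigval[idx])  # estimated factor loadings

# Low-rank-plus-diagonal structured covariance: Sigma ~ B B^T + D.
D = np.diag(np.diag(S - B @ B.T).clip(min=1e-6))
Sigma_factor = B @ B.T + D
```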
Okay, anyway, let's move on. Forget the i.i.d. model; what about the correlation in time? There is some correlation, not too much; again, this also depends on the frequency that you choose, but there is a little bit of correlation, and you may want to capture it. We know this in signal processing; we've been doing this forever. For example, we have the autoregressive models, the ARMA models: basically, you express the return based on the previous returns multiplied by some coefficients, and this is the moving-average part. Anyway, we know these models, nothing new. Good. And once you have a model, you can make the prediction of the mu, and then you have the covariance matrix of the residual; fine, we know these things.

Let me tell you something that maybe we don't know, because this is very particular to finance. What's the problem with this model? There are many models like this, many, many. The problem is that we are modeling the returns, and if you remember, the log returns are the difference of the log prices. We do that because when you take the difference of log prices you get something stationary, and then it's easy to model; but you may be losing some structure when you take the difference of the log prices. And if you are losing some structure, and somebody else is able to capture that structure, that person can make money and you're going to lose; so you don't want to lose any information. So, already in the '70s, nothing new; these guys, by the way, they all got Nobel Prizes and all that. In the '80s they proposed this: they included this error-correction term, and in fact they call this the vector error correction model, VECM. And you can say, okay, what's the big deal about this? There is a big deal, because this matrix Pi has a very particular structure: it's a low-rank matrix. The rank tells you a lot about how to invest, and the subspace generated by this matrix tells you a lot; we will talk a little bit about that later. It's like a secret parameter: if you can estimate it better than others, that's it, you are going to be able to make good money. We are signal processing people; we know how to estimate things, we can impose low rank, sparsity; we know how to do this. By the way, on Wednesday one of my students is going to present a paper on this topic: how to estimate Pi, including low rank, sparsity, all that stuff. Anyway, I will not say more about this, but the matrix Pi is going to come up later; it's like a secret parameter that we really want to estimate well. Very interesting; I could talk for hours about that, really.
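For reference, the VECM he is describing is usually written like this (my transcription of the standard form with p lags; the talk shows it on a slide):

```latex
\Delta y_t \;=\; \phi_0 \;+\; \Pi\, y_{t-1} \;+\; \sum_{i=1}^{p} \Phi_i\, \Delta y_{t-i} \;+\; w_t,
\qquad \Pi \;=\; \alpha\,\beta^{\mathsf{T}} \ \text{(low rank)},
```

where y_t is the vector of log prices, the differences Delta y_t are the log returns, and the low-rank Pi y_{t-1} term is the error correction acting on the prices themselves.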
Anyway, what else? Any more models? Anything missing? There are hundreds of models, of course, but there is a big thing missing: the elephant in the room. What is the elephant in the room? I'm going back. Well, we have these sophisticated models, whatever you want; however, take a look: all these models have a fixed covariance matrix. The covariance matrix is fixed; we are just modeling the mean, but not the covariance. And we saw this phenomenon called volatility clustering, right? We saw that the variance has to change over time, and here it's not changing; this is horrible. You can be as sophisticated as you want modeling the mu, but you also need to model the variance, or the covariance matrix. So, again, hundreds of models for that, hundreds. For example, you can see that if you don't model the covariance, you get something like this: constant volatility, and this is real data; it doesn't match at all, so we need to model that, we cannot ignore it. How do you do that? Very simple: in the model there was this residual that we called w. That w had a fixed covariance matrix; now we are going to generate this w as the product of two terms: z is like a normalized random term with unit variance, and the sigma is going to provide the envelope. Now you can put models on the sigma; that's it, that's the idea of what all these hundreds of models do: they put some model on the sigma so that it captures this volatility clustering. That's it; we don't need to spend more time on this. And then, again, a Nobel Prize for all these things in the '80s. This is the first model, ARCH, and then GARCH, the generalized ARCH. Basically, again, you can see that there is like an autoregressive component, where the sigmas in the past determine the sigma now, and then there is a moving-average component; many models like it. Anyway, good; it works fine, you can model the clustering, good.
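A minimal GARCH(1,1) simulation sketch of that recursion; the parameter values are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
omega, alpha, beta = 1e-6, 0.09, 0.90      # must satisfy alpha + beta < 1

w = np.zeros(T)        # residuals w_t = sigma_t * z_t
sigma2 = np.zeros(T)   # conditional variance sigma_t^2
sigma2[0] = omega / (1 - alpha - beta)     # start at the unconditional variance

for t in range(1, T):
    # The sigmas (and shocks) in the past determine the sigma now:
    sigma2[t] = omega + alpha * w[t - 1] ** 2 + beta * sigma2[t - 1]
    w[t] = np.sqrt(sigma2[t]) * rng.standard_normal()  # w_t = sigma_t * z_t

# The simulated w exhibits volatility clustering like real returns.
```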
Oh, then it gets crazy. This is fine, but that was for a scalar residual; for the multivariate residual we have a vector now, and this vector has maybe 500 dimensions, so now you actually need to model a matrix that is 500 by 500. And yes, you can create models; forget the details, there are many models, but all these models have matrix coefficients, and each of those matrices is like 500 by 500. What's the problem with that? The problem is that you cannot estimate those parameters reliably, because you don't have enough data: too many parameters, not enough data, so you're going to do overfitting and nothing is going to work; that's the message. So it's very tricky; when you estimate, you need to impose low rank, sparsity, and we know how to do that in signal processing, so we can do a lot of stuff here.

Anyway, let me move on. Now, how do you estimate those parameters? We know that in signal processing: we do maximum likelihood, we do least-squares fitting, and we can impose sparsity; we know these things, we can do lots of things that can be used here. So, for example, this is the VECM where the Pi matrix, the secret matrix, has to be low rank, and we also want to impose sparsity, so you can formulate an optimization problem; my student is going to present on this topic on Wednesday, if you are interested.

Okay, let's move on. Let me talk about robust estimators now. What do I mean by robust estimators? I mean that we are going to consider that a Gaussian distribution is not good; now, heavy tails. Let's see what we can do; in fact, we know all these things from the '70s, so nothing really new. Basically, consider the simple i.i.d. model. If you want to estimate the mean, how do you do it? Sample mean. If you want to estimate the covariance matrix of the residual, what do you do? Sample covariance matrix. That's what I did when I was getting into financial engineering, and they work really badly; they are really horrible. Why? Well, of course, if T goes to infinity, we know they are consistent estimators, blah blah blah; but in finance T doesn't go to infinity. T is the number of samples that you have, and you don't have so many samples: if you use daily data, in one year you have like 252 samples; if the dimension is 500, you don't have enough data. Big problem. Anyway, you can actually derive these sample estimators as the maximum likelihood estimators assuming a Gaussian distribution; we all know that. Basically, you assume the distribution of the returns is Gaussian, you form the likelihood, and you derive the estimators: indeed, the sample estimators. This gives you a clue about why they are so bad: they are so bad because they are assuming a Gaussian distribution, and we know the Gaussian distribution is a joke in finance; it's not Gaussian, it's heavy-tailed. So we need to reformulate everything from the beginning without assuming Gaussian, and we can do that.

So, the two problems that I want to briefly mention: one is the heavy-tail issue, of course, and the other one is the small-sample regime. As I said, if we have 500 stocks and two years of data, that's about 400 observations; we don't even have more observations than the dimension, and as you know, that's a joke, right? Ideally, you need to have more data than the number of parameters you want to estimate. So let's take a look at the small-sample regime. We know this already in signal processing, going back to the '70s and '80s in the beamforming area. This paper, for example, from the early '80s: we used the sample covariance matrix to design the beamformer, and it was performing really badly. People realized that you could do this thing we called, at that time, diagonal loading; it was a heuristic thing, but it worked really well. Later on, more recently, we (I mean the community) were able to derive this diagonal loading in a more formal way, from a robust optimization perspective and all that. Another word for diagonal loading is shrinkage; in fact, in finance people use that a lot, and in communications also. This is related to people working in random matrix theory; many people here work on random matrix theory. Basically, again, you assume some diagonal loading, but you need to estimate the shrinkage factors, and you use random matrix theory to find them; lots of papers on this topic, applied to communications and finance. This is interesting: in 2004 these people became very famous because they proposed a very simple estimator of the shrinkage factors. Ledoit was a student; he graduated and then went to industry, and he became a millionaire using his estimator, and he still publishes papers; it's quite interesting. The state of the art now, as far as I know, is that this linear shrinkage is very primitive; now they are working on nonlinear shrinkage methods based on random matrix theory, and it's very sophisticated. Anyway, many people work on this area here.
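A minimal sketch of linear shrinkage (diagonal loading) toward a scaled identity; in Ledoit-Wolf-style estimators the factor rho is computed from the data, but here it is simply passed in:

```python
import numpy as np

def shrink_covariance(X, rho):
    """Diagonal loading / linear shrinkage of the sample covariance.

    X: T x N data matrix; rho in [0, 1] is the shrinkage factor
    (a simplification: data-driven choices of rho are the whole point
    of Ledoit-Wolf and random-matrix-theory estimators)."""
    S = np.cov(X, rowvar=False)
    N = S.shape[0]
    target = (np.trace(S) / N) * np.eye(N)   # scaled-identity shrinkage target
    return (1 - rho) * S + rho * target
```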
Okay, good; the heavy-tail issue. Again, nothing new here; we can go back to the '70s, everything was done in the '70s. The idea is to assume that the returns don't follow a Gaussian distribution but some heavy-tailed distribution. For convenience, we can use the family of elliptical distributions; they look like this. Look, it's almost like a Gaussian: they have a mean and a covariance, but instead of the exponential they have some arbitrary function g, and g determines the thickness of the tail. So let's assume we know the thickness of the tail; then we can do maximum likelihood: you form the likelihood and you derive the solution, done. Now let's take a look at the solution; it's very interesting. If you look at the solution for the estimation of the mean, it's actually a sample mean, but with weights: every sample is weighted. Interesting. And if you look at the estimation of the covariance matrix, it's the sample covariance matrix, but with weights. This is really beautiful: after all, you still get the sample estimators, but with weights. Now, of course, it gets complicated, because the weights depend on the distance of the sample to the mean; but precisely, you still don't know the mean: you are trying to estimate the mean and the covariance, and the weights depend on the mean and the covariance. So you see, it's a fixed-point equation. This is very common in robust estimation: you have these fixed-point equations, no big deal; you use iterative methods, and then of course you need to analyze whether they are going to converge or not, so it gets complicated. But we know this from the '70s; look, Maronna, from the '70s, an iterative method. The iterative method is what you would expect it to be: you have an initial estimate of mu and Sigma, and that gives you the distances; with the distances you have the weights, and you compute the next estimate of mu and Sigma, and you iterate; that's it. Is it going to converge? Yes, we know from the theory from the '70s that it converges; fine, very nice. But for all this you need to know the tails, you need to know how heavy the tails are. You can estimate that, no problem, but there is another estimator that I want to mention because it's really cool: Tyler. Tyler, in the '80s, said: okay, I don't know how thick the tails are, and I don't want to know. So what he did is he took the samples, assumed zero mean, and normalized the samples by their length; by normalizing the samples, you get rid of the tail. It's beautiful. And he derived the PDF of the normalized samples, and in the PDF of the normalized samples, indeed, there is no tail: the shape of the tail, the g function, is not there anymore, it's always the same. Really beautiful. So now you can do maximum likelihood on this, really beautiful, and you just estimate the Sigma: this is the very famous Tyler estimator. Again, a fixed-point equation; again, you see, it's like a sample covariance matrix with weights, and the weights are one over this distance. Again, a fixed-point equation; again, you need an iterative algorithm; again, you need to make sure it converges, conditions under which it converges. Very nice; the '80s, everything done. What are we doing here? I don't know.
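A sketch of Tyler's fixed-point iteration as described, assuming zero-mean data; a real implementation would also check that T > N and that the data are well conditioned:

```python
import numpy as np

def tyler_estimator(X, n_iter=100, tol=1e-8):
    """Tyler's scatter estimator via its fixed-point equation.

    X: T x N matrix of (assumed zero-mean) samples, with T > N.
    The scale of Sigma is not identifiable, so we normalize the trace."""
    T, N = X.shape
    Sigma = np.eye(N)
    for _ in range(n_iter):
        inv = np.linalg.inv(Sigma)
        # Weights 1 / (x_t' Sigma^{-1} x_t): this is what removes the tails.
        d = np.einsum("ti,ij,tj->t", X, inv, X)
        Sigma_new = (N / T) * (X.T @ (X / d[:, None]))
        Sigma_new /= np.trace(Sigma_new) / N   # fix the scale ambiguity
        if np.linalg.norm(Sigma_new - Sigma, "fro") < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```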
Now, this is very important, by the way: robust estimation is a must in finance. Let me show you an example. Imagine these random points, the black points, generated from a heavy-tailed distribution whose shape is the black ellipse. Now, in this case we have four outliers, four points a little bit further away. If you use the sample covariance matrix, you get the red estimate: horrible, you're going to lose all your money. But if you use a robust estimator, you get the blue one: really nice. This guy loses the money, and this guy gets the money; very nice. Okay, now, something that was missing, perhaps, from the '80s was the combination of both: the combination of robust estimators for heavy tails and the low-sample regime. In a way, you want to have some kind of shrinkage as well. No big deal, right? You just take the fixed-point equation of Tyler and you add a regularization term. No big deal, yes, but this is very heuristic. So, already almost ten years ago, people realized this was a bit heuristic, and they were able to derive this diagonal loading from a formal perspective. There are different ways of doing that; one nice way is to formulate the maximum likelihood problem but with a regularization term. If you include a regularization term, then you get this diagonal loading nicely, and then you can develop the whole theory of existence and uniqueness, and then you can develop iterative algorithms, everything; very nice. I would say in the past eight years different groups of people have been working on that, and I think now we have a nice understanding; still some things left to be done, but that's it, I need to move on.

Okay, now, does it work? Yes, of course. This is the error in the estimation of the covariance matrix versus the number of samples; the more samples, the better, of course. But this is the sample covariance matrix: horrible, horrible. This one here is the Ledoit-Wolf estimator, the guy who became a millionaire, so it's here: a big improvement, of course, but still not very good, because it's still assuming Gaussian. And here I have Tyler, this is the Tyler one, really nice; however, Tyler only works when you have more samples than the dimension; in this case the dimension was 14, so you need at least 14 samples. But when you combine the shrinkage with Tyler, then you get these beautiful ones, these two. So beautiful, right? We can do nice things.

Okay, let's move on; yes, I really want to talk about this today: Kalman. Let's start with cointegration; I need to tell you about cointegration. Cointegration is one of those concepts that is specific to finance; I mean, we don't have these things in other areas, it's very specific to finance, and it's very interesting. The idea is the following: it may be difficult to predict the price of stock one, and it may be difficult to predict the price of stock two, but it may be easier to predict the relative pricing, and you can use that to make money; it's really beautiful. The analogy that people use to explain this is to imagine a drunk man with a dog. This guy is walking in the street; he's drunk, so he's doing a random walk, precisely. You cannot predict the path that he's going to follow; it's a random walk. Then the dog is there, peeing, moving around; you cannot predict the path of the dog either, it's a random walk. However, you know that the dog is going to stay together with the master, or vice versa; they're going to stay together in the long term. Even one hour later, they are still going to be together. That's cointegration, that's it; isn't that a beautiful idea? So let's take a look; let me tell you formally what the definition is. Correlation, okay, we know correlation in signal processing: the correlation of two random variables is high when they co-move, and it's zero when they move independently; everybody knows that. Now, if you read the definition of cointegration, it looks like this: cointegration is high when the two quantities move together; okay, sounds the same. Cointegration is nonexistent if the two quantities do not stay together; sounds pretty much the same. So, is the difference clear or not? Well, it depends; it took me a while to understand the difference, but I can show you some plots where you will see the difference really clearly.
But let me tell you a very simple way to understand the difference. Basically, you can think of correlation as telling you about the short-term variations of the two stocks: if the variations are similar, then you have high correlation, but it's a short-term thing. Whereas cointegration is more about the long term: you don't care so much about the small variations, you care about the long term. So it's very different; let me illustrate. Look at the plot, look at the two curves, the blue and the red; they represent the log prices of two stocks, so they are random. Now, if you look at the variations, they are highly correlated: you see, when the blue line goes up, the red goes up; when it goes down, it goes down. The variations are highly correlated, and I'll show you later a scatter plot; they are pretty much totally correlated. These two guys are highly correlated. However, look at the long-term behavior: they started together, and after a while they drift apart. In fact, if you take the difference between the two curves, you get the black one, and you see the black one is drifting away from zero; that means there is no cointegration. One line is the guy and the other line is the dog, and they don't stay together: no cointegration. Indeed, if I show you the variations, the log returns of stocks 1 and 2, they are highly correlated; so that was an example of high correlation with no cointegration.

And I can show you the opposite example. Look at the two plots, the blue and the red: if you look at the small variations, they are uncorrelated; when the blue goes up, the red goes down, they're uncorrelated. However, in the long term they stay together: it's the dog and the master, there is cointegration. If you take the difference between the two, you get this black curve, which stays around zero. So now we are going to talk about this black line. The black line is very important; it's called the spread. That's the one you're going to use to make money, to trade: you are not going to trade the blue stock, you are not going to trade the red stock; you're going to trade the black line. You can think of the black line as an artificial asset that you have created, and then you're going to trade the artificial one. That's the idea of pairs trading; it's called pairs trading, very interesting. From a signal processing perspective, it cannot get more beautiful than this, right? And the variations are totally uncorrelated, and yet they stay together. So hopefully the difference between correlation and cointegration is a bit clearer now; luckily for you, it took me like two years to understand the difference.

Okay, now, I don't want to get into the mathematics, but just one small thing, so that you can appreciate the beauty. Look, this is a simple model for the log price of stock 1 and the log price of stock 2. They have a common trend: this x_t is a common trend, it's like a random walk; and then each has an individual residual, another component that is zero-mean. Now, as I said, it's difficult to predict these prices, because they have this common trend, which is a random walk. However, the beauty is that the trend is common; that's why it's called a common trend. And you can see that you can multiply y_2 by gamma and subtract that from y_1, and then you get rid of the common trend.
That's how you get the spread; in fact, this is what we call the spread: in this case it is y_1 minus gamma times y_2. And you get this: there is no common trend anymore; all that you have left is just the residual, which is zero-mean. That's the spread, and you are going to trade on that; it's really beautiful. Now, again, it's beautiful, but you need to know the gamma. First of all, you need to find the two stocks that are cointegrated, which is not easy, because you have hundreds, thousands of stocks, and you need to find two that are cointegrated; and then you need to find the gamma. The gamma is a secret thing; we have two secret things today, right: the Pi matrix and the gamma. And in fact they are related: inside the Pi matrix there is the gamma; really beautiful.

So anyway, that's it. The important thing about the spread is that, because it's zero-mean, it has this property: it's mean-reverting. Mean-reverting means that it can go high, but you know it's going to come back eventually, because it has zero mean. You use that to trade, and let me show you how you trade a spread, very simple. This is the idea: the black line is the spread; you know it's mean-reverting, so you set two thresholds. For example, when it goes down and crosses the threshold, you know it's undervalued, and you know that eventually it's going to go up, because it's mean-reverting; so you buy at that point, and when it recovers and goes back to zero, you sell, and you make money: the difference between this price and this one. That's it; it's like money for free. What can go wrong? So that's it: every time you do a trade, you are making that amount of money, determined by the threshold you choose. And again, when it goes up, it's overvalued, so you do short selling, and then when it comes back down, you buy to unwind the position. That's it; I'll show you later how this works with real data.

Okay, now, how do you estimate the secret gamma? It's very simple. Well, to discover pairs you can just do brute force; there are more sophisticated methods, but you can do brute force. And to estimate the gamma, just least squares; we're going to see it in the next slide. Least squares, very sophisticated! Of course, you can do more sophisticated things: actually, from the VECM that I mentioned before, the secret Pi matrix, which was low rank; if it's low rank, you can write it as alpha times beta, and beta contains the gamma inside; really beautiful. So let's do least squares. Okay, I cannot believe I'm giving a plenary telling you how to do least squares, but look: remember, this is the spread, and because it's a spread, we know it's mean-reverting; so it's basically a mean plus some zero-mean residual. Then you can move this term to the other side and write it like this; it's the same thing, but here you can see you can do least squares: basically, you can regress y_1 on y_2 and estimate the gamma and the mu by least squares. You have T observations; you take the difference, you take the square, and that's least squares. You do it, done.
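The least-squares step he describes, as a short sketch (the function name and interface are mine):

```python
import numpy as np

def ls_hedge_ratio(y1, y2):
    """Estimate mu and gamma by least squares: y1_t ~ mu + gamma * y2_t.

    y1, y2: log-price series of the two (candidate cointegrated) assets."""
    A = np.column_stack([np.ones_like(y2), y2])      # regressors [1, y2_t]
    (mu, gamma), *_ = np.linalg.lstsq(A, y1, rcond=None)
    spread = y1 - gamma * y2 - mu    # should be mean-reverting around zero
    return mu, gamma, spread
```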
Let's see how it works; this is a real example, I did it last week. Here I take two ETFs. Let me tell you what we are seeing: basically, the black curve is the spread. What do you want to see? You want to see the black curve going up and down as many times as possible, because every time it goes up and down, you are making money. How do you make money? As I said, when it crosses the threshold, you buy; that's actually the red curve: the red curve is the signal. The signal can take three values: +1 means you buy, 0 means you unwind, and -1 means you do short selling. So again, you make money when the signal is going up and down; every time it goes up and down, you are making money. And here I'm plotting the wealth. You see, every trade gives you a tiny amount of money; of course, there is no magic, every trade gives you a tiny amount of money, but you are accumulating a lot of these trades, bam bam bam bam bam. So basically it looks beautiful, right? It really looks beautiful; you're accumulating all this money. By the way, what is this blue line here? This blue line refers to the training phase: I use all this data for training, and what you see here is that, of course, during the training window you make a lot of money, but after the training phase you're still making money. Good, everything looks nice.

Now, the next example is the bad one; something happens. What happens is that, because things are not stationary, the regime has changed and the cointegration has been lost: the gamma has changed. And if you are using least squares and you don't adapt, look what happens: you train, the spread looks fantastic in the training phase, but after that it drifts; you are losing all this money, and this is like one year. You are losing money for one year: you design your fantastic gamma with least squares, you put in your money, and the whole year you're losing money. So yes, it is not easy to implement this in practice.

So that's when Kalman comes in: it tries to adapt to the variations of gamma and mu. Very simple; the solution is Kalman. Let's take a look: it's the same thing, but now the mu and the gamma, we assume that they are time-varying, that's it, and we introduce some model on the variation of mu and gamma. This model is nothing, really: just, from one time to the next, there is some additional noise, whatever. So, for those of you familiar with Kalman: this is the state-transition equation, and this is the observation equation. That's it, Kalman, the typical Kalman setup: the state transition and the observation, and in this case the hidden state is the mu and the gamma. That's it; I'm not going to tell you anything more. Kalman, that's it.
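A minimal Kalman filter for this state-space model, with the hidden state [mu_t, gamma_t] following a random walk; the noise variances q and r are illustrative tuning knobs, not values from the talk:

```python
import numpy as np

def kalman_hedge(y1, y2, q=1e-5, r=1e-3):
    """Track a time-varying (mu_t, gamma_t) in y1_t = mu_t + gamma_t * y2_t + v_t.

    State transition: x_t = x_{t-1} + noise (covariance q*I).
    Observation:      y1_t = [1, y2_t] x_t + v_t (variance r)."""
    T = len(y1)
    x = np.zeros(2)               # state [mu, gamma]
    P = np.eye(2)                 # state covariance
    Q, R = q * np.eye(2), r
    states = np.zeros((T, 2))
    for t in range(T):
        P = P + Q                 # predict (random walk: mean unchanged)
        H = np.array([1.0, y2[t]])
        S = H @ P @ H + R         # innovation variance
        K = P @ H / S             # Kalman gain
        x = x + K * (y1[t] - H @ x)
        P = P - np.outer(K, H @ P)
        states[t] = x
    return states                 # states[:, 0] = mu_t, states[:, 1] = gamma_t
```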
So let's compare; let's compare with real data. These are the log prices of two ETFs. I chose this example because you can see that in the first part the two curves maintain the same distance, so they are cointegrated; but here things have changed, and the cointegration is lost; later it recovers again. Let's see what happens with least squares and Kalman and all that. This is the tracking of the mu and the gamma. Kalman tracks, of course, so it's actually tracking. Least squares uses the data up to this blue line for training and then it doesn't change; you see that the black line, the estimate of mu using least squares, doesn't change. Now, the green one is Kalman: in the initial period it coincides with least squares, of course, as it should; but then at some point, when the cointegration changes, it adapts. Okay, good, it adapts. Now, what is the red one? The red one is where I tried to use rolling least squares. Why do you need Kalman? You could just use least squares on a rolling-window basis; we don't need to get too sophisticated. Well, guess what: it doesn't work well. What happens is that there are too many parameters to fit: if you use a rolling window that is too large, it doesn't adapt fast enough, and then you lose a lot of money; if you use a window that is too small, it's really noisy, and when I say really noisy, I mean unacceptable: the red line would be oscillating like crazy, off the charts. Now you say: okay, instead of a rectangular window, I can use some exponential or triangular window. Okay, yes, but then you are overfitting: for every pair you need to choose the right window, the right length; it really doesn't work. With Kalman, you don't need to worry; it just works.

Good. So which one works better in practice? Let's take a look at the spreads again. What do you want to see here? These are spreads; you want to see something that oscillates around zero. For example, the black one, least squares: in the training phase, yes, fine; in the initial phase, where the cointegration is nice, nice; but here it gets lost, and you see the black line goes up, it doesn't stay around zero: you are losing money for like two years, and then it comes down, and you lose money again for two years. Now look at the green one: the green one stays around zero, that's it. So let's see the trading with the Kalman one, the green one. Look at it, it's beautiful: the spread is really beautiful all the way; it adapts, it can adapt. And if you look at the cumulative P&L, the wealth, it goes up; yeah, maybe some periods are a little bit down, but overall, nice. Now, the final comparison: let's compare least squares, rolling least squares, and Kalman, just to prove the point that indeed Kalman works well. The black one is least squares: it works well up to some point and then really badly. Then rolling least squares improves a little bit, but this is cheating; I cheated here, as I told you, because I did overfitting: I chose just the right window length and window shape so that rolling least squares works fine. In practice, I'm never going to use rolling least squares; you can use it if you want. And then Kalman: fantastic, fantastic; really beautiful.

I don't have time to talk about the discrete-state hidden Markov model, but basically this is used to detect the regime. I've seen this used to detect the state of a channel, whether a channel is in a good state or a bad state; I've seen papers using HMMs to detect whether you are in a good channel or a bad channel. Same here in finance: the investment strategy is going to change depending on whether you are in a bull market or a bear market, so you want to know what type of market you are in at this particular moment. This is regime detection; you can use HMMs for it. No time, no time, sorry; but basically you do it and you can detect different regimes, and it actually works surprisingly well.
Okay, so let me talk about portfolio optimization. God, I want to talk about this; again, I could talk about this for hours. I like this topic because, you know, my topic is optimization, so I like this, and you will see: this is like beamforming. People who know beamforming, people who know linear filtering (everybody knows linear filtering): this is portfolio optimization, you will see, it's really beautiful.

So first, let me tell you the notation. What is a portfolio? A portfolio is actually a vector of weights; we call it w, and we normalize the weights. Basically, each element of w tells you how much money you put in a stock, and remember, maybe we are investing in a universe of 500 stocks, so this vector is 500-dimensional. So it's normalized, but in real life you have a budget B, so the actual amount in dollars is B times w; anyway, no matter, we just use the normalized one. It turns out it's easy to derive that if you use this portfolio w, and r_t is the vector of returns of all the stocks, the 500 stocks, then the inner product between w and r_t gives you the return of the portfolio; just believe me, it's easy to derive. So it's quite nice, and this model looks very familiar: it's the typical model that we use in beamforming and linear filtering.

Oh, by the way, let me mention one thing. Given that this is the return of the portfolio, and that r_t is a random vector with mean mu and covariance Sigma, the expected value of the portfolio return is this, and the variance of the portfolio return is this. These two quantities are key; we are going to use them later to design portfolios. Remember these two quantities: they are the two quantities that Markowitz used in 1952 to write his very seminal paper. Very important. Anyway, let's look at the signal model; let's compare with beamforming. In beamforming, we have an array of antenna elements and we receive the signal, and we call that a snapshot: x_t is a vector containing the signal from all the antenna elements, we call that a snapshot. Then we multiply each element by a weight and add them together; that's beamforming, and in fact w is called the beamvector. It's the same for linear filtering, where in x_t you would put the samples from past times. Good. So look at the model for the portfolio: it's the same, the same. The only difference is that here we use transpose, because everything is real-valued, and here we use Hermitian, because it could be complex-valued or real-valued, depending on the application. Here, well, we don't have complex-valued numbers in finance, as far as I know; I haven't been able to find a reason to have complex-valued numbers, but maybe somebody has an idea; that would be very interesting.

But anyway, now, okay, you do this filtering or beamforming, and the good signal comes from some particular direction, let's call it a. Then you can write this as the good signal power, and you can write this as the interference-plus-noise power; we do that all the time. Now, in finance, if you take the expected value of this, this is the good stuff: the portfolio return; and this is the volatility (the volatility is actually the square root of the variance). You can see the similarities. Now, in communications and signal processing, we use the ratio of these two, and we call that the signal-to-noise ratio; if you buy a router for your home and you open the technical specifications of the router, you're going to see SNR all the time: "3 dB SNR." We use the SNR a lot.
Guess what: in finance they also take the ratio of these two quantities, and they give it a different name: it's called the Sharpe ratio. Sharpe got the Nobel Prize as well; Sharpe proposed this ratio, it's called the Sharpe ratio, and it's pretty much the signal-to-noise ratio. Basically, the only difference is that one is like the squared version of the other; that's it, but it's the same thing. And again, if you take a brochure of an investment fund, they're going to tell you the Sharpe ratio; if you didn't notice, then the next time somebody comes to you trying to sell you some investment fund, take a look: they will show you the Sharpe ratio. They really use the Sharpe ratio; it's fascinating.

Anyway, so this is the Markowitz portfolio: he wants to maximize the expected return and minimize the variance. A very simple convex problem, a QP; a Nobel Prize for this, but it was revolutionary at the time, really, because he was the first guy who introduced the variance there; it was really revolutionary. Now, if you just want to minimize the variance, you get what is called the minimum variance portfolio. Guess what: in beamforming and linear filtering there is something called the minimum variance distortionless response (MVDR) beamformer, and it's exactly the same; it's really the same. So all this, in the beginning, with Markowitz, is really the same.
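A sketch of the mean-variance (Markowitz) portfolio with only a budget constraint, where a closed form exists; the minimum variance portfolio appears as a special case. The notation is mine, and adding realistic constraints (e.g., no short selling) requires a QP solver:

```python
import numpy as np

def markowitz_portfolio(mu, Sigma, lam):
    """Maximize w'mu - lam * w'Sigma w  subject to  1'w = 1.

    Closed form from the KKT conditions; lam is the risk aversion.
    With lam -> infinity this reduces to the minimum variance portfolio."""
    N = len(mu)
    ones = np.ones(N)
    inv = np.linalg.inv(Sigma)
    w_unc = inv @ mu / (2 * lam)                  # unconstrained part
    w_gmvp = inv @ ones / (ones @ inv @ ones)     # minimum variance portfolio
    # Shift along the minimum-variance direction to satisfy the budget 1'w = 1.
    return w_unc + (1 - ones @ w_unc) * w_gmvp
```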
And then portfolio optimization can get more complicated later. I don't have too much time, but I want to tell you a little bit. Basically, two big problems with Markowitz; in fact, Markowitz is not used in practice, nobody uses Markowitz. He got the Nobel Prize, but nobody uses it in practice. Why? Many reasons, but one of them is that people don't like to use the variance as a measure of risk, and they have proposed more sophisticated measures; no time to talk about that, but let me just tell you that people in signal processing working on topics like stochastic optimization and chance constraints (I know one of the organizers works on chance constraints) can do something here. And the other reason why Markowitz is not used is that it's very sensitive to the mu and the Sigma: remember, mu and Sigma are estimated, and there is a lot of estimation noise, so it's very sensitive; it doesn't work in real life. So then you do whatever you can; you can use robust optimization, and again, a lot of people here do robust optimization, robust beamforming; we can do a lot of stuff. Good.

Now, with the little time that I have, I want to talk about two different portfolios. Index tracking, quick: what is index tracking? Very simple. An index like the S&P 500 is a definition: it says, okay, I'm going to include in this index 10% of this stock, 30% of that one, 50% of that other one; but it doesn't exist, the index is just a definition. What if you want to create the index? You can do that: just buy 10% of this, 30% of that, 50% of the other, and you create the index. But then, for the S&P 500, you need to buy 500 stocks and keep rebalancing every now and then, 500 stocks, paying the transaction costs; nobody does that in practice. What do they do in practice? Sparsity: instead of buying the 500 stocks, they want to track the index using, say, just 40 stocks. Sparsity, that's it; this is sparse regression, I don't need to tell you more. We know these things in signal processing; we do this all the time: you define a measure of error, in this case the l2 norm, and you want sparsity; it's non-convex, but we know how to deal with this, no problem, and it works really well in practice. These are benchmarks that people use, and this is the tracking error; the tracking error using our signal processing methods, the results are fantastic. Great; a lot of room for things to do for people working on sparsity. And it tracks very well: look, just to show you, the blue line is the index, and the red line is the one we replicate with only 40 stocks; so we track the S&P 500 with only 40 stocks. And the green one is an improvement that we proposed: basically, in the tracking error, instead of penalizing both when you do worse and when you do better than the index, you only penalize when you do worse, and you don't penalize when you do better; then you get the green one, so you can actually do better. It's very nice.
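A sketch of sparse index tracking as an l1-regularized least-squares problem, solved with a plain ISTA loop; the formulation in the talk uses nonconvex sparsity penalties and portfolio constraints, so this convex version is only a simplification:

```python
import numpy as np

def sparse_index_tracking(X, b, lam, n_iter=500):
    """Sparse tracking portfolio via ISTA on (1/T)||X w - b||^2 + lam * ||w||_1.

    X: T x N matrix of stock returns, b: T-vector of index returns,
    lam: sparsity penalty (larger lam -> fewer active stocks).
    A simplification: a real design would also impose w >= 0 and 1'w = 1."""
    T, N = X.shape
    w = np.zeros(N)
    step = T / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w - b) / T
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return w
```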
Let me finish with pairs trading, the pairs-trading portfolio. I am not going to go into the details because there is no time, but remember that magical matrix Pi in the VECM. First let me tell you one thing: when we were doing pairs trading, we were trading two stocks, right? We were buying one stock and minus gamma of the other stock. That's like having a portfolio with two components, with weights one and minus gamma. That's an important realization: when you do pairs trading, you are actually using a portfolio. So what stops you from having more than two stocks? Can you do pairs trading with more than two? Yes, but it's not called pairs trading anymore; it's called statistical arbitrage. It gets more complicated, because then you need to find more gammas, but we can do that with the VECM. It starts getting complicated, but it's really beautiful, and that's how we come to the Pi matrix. The Pi matrix is low rank, so there is a beta matrix, and the beta is the key, the magical one. Beta is beautiful: beta gives you a subspace of mean-reverting spreads. Let me repeat: imagine you have 500 stocks, and they are all random walks, all of them. If you fit a VECM, you are going to get a small subspace of mean-reverting spreads, and now you can do pairs trading on all of those. But it's a subspace, so some directions are better than others. Can you find the best direction? On Wednesday my student is going to talk about exactly that: indeed, you can optimize a portfolio within that subspace and get the best direction. By the way, the formulations start getting complicated; you start getting non-convex problems. But we are signal processing people, we know how to deal with these things, right? We can do it, no problem, but everything is non-convex at this point. So let me just show you the final result: the blue line is the wealth that you can make just using the magical beta, and the red one is when you optimize the direction within the subspace given by beta. Clearly you can do much better. It's really beautiful, and it's all signal processing, right? All signal processing.
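For the two-stock case, here is a toy sketch in Python: estimate the hedge ratio gamma by least squares, form the spread, and trade its z-score with simple thresholds. The synthetic cointegrated data and the thresholds are assumptions (and the z-score uses in-sample statistics purely for illustration); the multi-stock version described above would instead take its directions from the beta of a fitted VECM.

```python
import numpy as np

# Synthetic cointegrated pair: both prices share one random-walk trend,
# so p1 - gamma * p2 is mean-reverting (the market trend cancels out).
rng = np.random.default_rng(1)
T = 1000
trend = np.cumsum(rng.normal(0.0, 0.01, T))   # common random walk
p1 = trend + rng.normal(0.0, 0.02, T)
p2 = 0.5 * trend + rng.normal(0.0, 0.02, T)

gamma = np.polyfit(p2, p1, 1)[0]              # least-squares hedge ratio
spread = p1 - gamma * p2                      # the portfolio (1, -gamma)
z = (spread - spread.mean()) / spread.std()   # normalized spread

# Threshold rule: short the spread when rich, long when cheap, else flat.
position = np.where(z > 1.0, -1, np.where(z < -1.0, 1, 0))
pnl = np.sum(position[:-1] * np.diff(spread)) # profit from mean reversion
print("gamma:", round(float(gamma), 3), " toy P&L:", round(float(pnl), 3))
```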
So let me summarize. I have talked about different topics, and different people are going to work on different things: some of you on random matrix theory, some of you on robust estimation, on Kalman and particle filtering, on shrinkage, on robust optimization. All these topics fit perfectly in financial engineering, really; it just takes a while to get into the new area. But now some publicity: this is a book that we wrote in 2016, written for signal processing people. Look at the title: A Signal Processing Perspective on Financial Engineering. This is the book that I would have liked to have had ten years ago when I started; ten years ago I didn't know what to read. This book is an entry point: you can easily read it and see what you can do. Oh, and by the way, graph signal processing, which is very popular now, can also be applied in finance, and of course deep learning and all that. Anyway, they told me that you can get the book for free: if you go to the web page of the book, you can get it for free during a few days, because they know I am giving this plenary. Okay, so thank you very much. These are all my students who have been working with me, and this is my beautiful campus in Hong Kong. Thank you very much. [Applause]

Thank you, Daniel, for a very engaging talk. Do we have questions from the audience? You mentioned that Kalman filtering and heavy tails are both important in this problem; how do you help the Kalman filter in a heavy-tailed environment? That's a very good point, because I talked about Kalman, which assumes Gaussian noise, and then I talked about heavy tails. Indeed, if you think that in that particular model the distribution of the residual is not Gaussian, then you should use some extension of Kalman filtering, or you can just use particle filtering. I know some people who work on particle filtering and try to apply these things in finance, so there is definitely room for all of this.
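To make the particle-filtering answer concrete, here is a minimal bootstrap particle filter for a toy local-level model with Student-t observation noise; this is a generic textbook sketch under assumed dynamics, not the specific model from the talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def particle_filter(y, n_p=2000, q=0.05, dof=3.0):
    """Track x_t in: x_t = x_{t-1} + N(0, q^2), y_t = x_t + Student-t noise.
    The heavy-tailed observation likelihood is what plain Kalman lacks."""
    particles = rng.normal(y[0], 1.0, n_p)        # crude initialization
    est = np.empty(len(y))
    for t, yt in enumerate(y):
        particles = particles + rng.normal(0.0, q, n_p)  # propagate state
        w = stats.t.pdf(yt - particles, df=dof)   # heavy-tailed likelihood
        w = w / w.sum()
        est[t] = w @ particles                    # posterior-mean estimate
        idx = rng.choice(n_p, n_p, p=w)           # multinomial resampling
        particles = particles[idx]
    return est

# Synthetic state with heavy-tailed observation outliers.
x = np.cumsum(rng.normal(0.0, 0.05, 300))
y = x + 0.1 * stats.t.rvs(df=3.0, size=300, random_state=rng)
print("RMSE:", round(float(np.sqrt(np.mean((particle_filter(y) - x) ** 2))), 4))
```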
There are a lot of things here, but what would you actually trade on? In this particular area, at the end you trade on the spread, but there are other ways to trade; this is just one particular topic. The other portfolios, like index tracking, are different; Markowitz is different; there are different approaches. In fact, let me tell you something I think is very illustrative. There are two families of investment strategies. Imagine the prices: they have noise, but they also have a trend. One family of strategies tries to make money based on the trend, and treats the small variations as noise, as something bad; they want the trend. That's like Markowitz. Pairs trading does the opposite: it doesn't care about the trend, because when you form the spread you remove the market, you remove the trend, and you make money from the noise. Really beautiful: some strategies make their money from the trend, and others from the noise. Okay, thank you.

Thank you, Daniel, for a very energizing talk. I was wondering: the major difference between learning and estimation is the generalization capability, and in financial engineering suppose you want not only to estimate but to go beyond the data. What have people taken from machine learning that works in financial engineering? Why not use support vector regression and the like, for example? Yes, of course, fantastic question. The traditional approach is to have a model from statistics; it can explain things about the phenomenon, but maybe the prediction is not good. Then you have machine learning: they don't care about the modeling, it's a black box, but the prediction is probably better. So people now try to use deep learning in finance, but everything is very secretive in that area. Unlike other areas, where people get a result, publish it, and everything moves very fast, here nobody is going to tell you whether they have succeeded at something. And there is a problem with using deep learning in finance: the lack of data. For deep learning you need lots of data; people working on images have lots of data, but in finance you generally don't have that much, it's very limited, so you need to be very careful. Of course, you can always find some specific area of finance where you do have a lot of data, especially if you go to intraday data; there maybe you have enough. So it's not easy. I know many people who are trying; you never know whether they have succeeded, because they are not going to tell you. But yes, definitely everybody is trying; my answer is that the activity is very strong, but nobody is telling you about the successes. Thank you.

In your talk today you defined the notion of cointegration between two time series, and I can imagine people may be interested in checking whether three or more time series cointegrate together. How can one generalize the notion to more than two series? Yes, this is what I mentioned, sorry, I went through it too fast, but the idea is precisely this: pairs trading is only for two stocks, so you have weights one and minus gamma; but if you want multiple stocks, w is going to be a long vector, maybe of dimension 500. How do you estimate it? Well, one way is from the VECM modeling, from that magical Pi matrix, because the beta contained in Pi gives you the subspace, and then you can use that subspace and play around. That's one way of doing it. Thank you. It's called statistical arbitrage when it's more than two.

A follow-up question: if you are really successful, you will fail, because you will move the market. Okay, let me process that: if I am really successful, I will fail because I will move the market. In fact, yes, this is true. Big companies have so much money that when they invest, they drive the market; that's a big issue. It's not my case; individual investors are nothing. But it's true, and there is a whole topic in finance called order execution: how to execute an order, because if you have a huge order and you execute it all at once, you are going to kill yourself, indeed. So there is a whole branch on this; basically, what they do is chop the order into small pieces and send the orders slowly, so that they don't drive the market. There is a lot of optimization there, because you want to know how to chop it up and over how long. It's a really beautiful topic; in control theory there are many papers on it, and it can get very mathematical. Very nice.
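As a cartoon of that order-chopping idea (deliberately simplistic; real execution algorithms optimize the schedule against market-impact models, and nothing here comes from the talk):

```python
def twap_schedule(total_shares: int, n_slices: int) -> list[int]:
    """Chop a parent order into equal child orders sent over time,
    a TWAP-style schedule; any remainder goes into the last slice."""
    base = total_shares // n_slices
    schedule = [base] * n_slices
    schedule[-1] += total_shares - base * n_slices
    return schedule

# 100,000 shares over 8 slices -> eight child orders of 12,500 shares.
print(twap_schedule(100_000, 8))
```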
Okay, I would suggest that we take further questions into the coffee break. There are two coffee stations over in the poster area; you can go through here or right in front of the foyer. But first, thank you again for the excellent talk. [Applause]

Info
Channel: Daniel Palomar
Views: 6,867
Rating: 4.9543724 out of 5
Keywords: finance, financial engineering, portfolio optimization
Id: yyvIcdYkb9o
Length: 66min 11sec (3971 seconds)
Published: Thu Oct 31 2019