The 7 Reasons Most Machine Learning Funds Fail: Marcos López de Prado at QuantCon 2018

Video Statistics and Information

Reddit Comments
  1. They have no edge

  2. They have no edge

  3. They have no edge

  4. They have no edge

  5. They have no edge

  6. They have no edge

  7. They have no edge

Saved you some time

👍︎︎ 61 👤︎︎ u/notferengi 📅︎︎ Apr 25 2019 🗫︎ replies

Funny from a guy who has repeatedly tried and failed to get an ML effort going at multiple funds.

👍︎︎ 16 👤︎︎ u/EvilGeniusPanda 📅︎︎ Apr 25 2019 🗫︎ replies

Look like the cover from Scarface.

👍︎︎ 6 👤︎︎ u/NormalAndy 📅︎︎ Apr 26 2019 🗫︎ replies

More information in this presentation and especially his book than this sub will ever provide.

👍︎︎ 1 👤︎︎ u/codefather_pl 📅︎︎ Apr 29 2019 🗫︎ replies
Captions
Hi, we're going to get started, if I can have your attention. I hope everybody's had a great day. I've had a chance to talk to a lot of you in the hallways between sessions, which is my favorite part, and it sounds like it's been very engaging, with a lot of exciting talks. I'm really excited about our afternoon keynote with Marcos López de Prado, who is a singular figure in the field. He's extremely well published, and my favorite thing about Marcos is that he's incredibly generous with whatever he learns. The field of quantitative finance has historically been extremely secretive, and it is very difficult to make contact with experts; for his entire career Marcos has gone against that grain and been very open with his findings. He's always challenging the status quo in the field, including academic publishers, and demanding scientific rigor, which is exactly what the field needs. Before I tell you about his background, which is remarkable: a bunch of you were able to pick up his new book, Advances in Financial Machine Learning. It's an incredible book, very near and dear to my heart, and to the rest of Quantopian. In the early chapters he basically lays out the vision, business plan, and scientific method for our asset management business, which is incredibly inspiring and extremely helpful, so I highly recommend it. A few things about Marcos: he's the chief executive officer of True Positive Technologies. He's had a remarkable year: he published his book, and he also bought and spun out his business, True Positive, from Guggenheim Partners' quantitative investment strategies business, where he had been running, no big surprise, machine learning strategies. This year he was able to spin out after managing up to 13 billion dollars in assets, with a track record to go with it. Marcos is also a research fellow at Lawrence Berkeley National Laboratory, a US Department of Energy national lab, and if you read his book he talks a lot about borrowing from the national-lab model in the search for alpha, which I think is extraordinarily timely for the field. He's a top-10 most-read author in finance according to SSRN, he has published dozens of scientific articles on machine learning and supercomputing, and he holds multiple international patent applications on algorithmic trading. He earned his first PhD, in financial economics, in 2003 and, why only get one, his second PhD, in mathematical finance, in 2011 from the Universidad Complutense de Madrid (you can correct my pronunciation when you get up here, Marcos). He is a recipient of Spain's National Award for Academic Excellence in 1999, and he did his postdoctoral research at Harvard and Cornell. He's also a great guy, so please welcome Marcos.

Thank you for having me. You guys are my favorite group of people, and let me tell you why. If there is an inefficiency in financial markets, anybody should be able, and allowed, to arbitrage it away and correct it. These are our markets; this is where people transact and exchange information. And the sad reality today is that people are not allowed to do that.
In order to exploit and correct these inefficiencies, you typically have to join a firm and play by its rules, and if you are not part of the right clubs or you don't have the right background, they may not like you and they may not want you. That sounds wrong in the 21st century. And here comes Quantopian, which is a wonderful experiment, not only in the quantitative space but also in the social space, where the goal is to disintermediate these old structures. That's why I'm very optimistic about Quantopian: not only about what Quantopian can do for investors, but about what it can do for modernizing financial markets and breaking with the old paradigms. We are all, I believe, very interested in new technologies and in how these new technologies can modernize finance.

Today's presentation is about machine learning. There is a lot of hype about machine learning, and not all of it is realistic at this point; the point of this presentation is to highlight a number of areas where machine learning can go wrong. I think it is quite clear that machine learning can be an incredible, irreplaceable tool for identifying inefficiencies in financial markets, but with this power come certain responsibilities: as a researcher, you have to make sure this power doesn't end up working against you. So let me go straight to seven issues that I find frequently dangerous as they relate to machine learning.

The first one is a little bit philosophical: how should we organize our work? What is the best way to organize work when you are conducting research applying machine learning? The old paradigm goes back to the 1980s: a hedge fund hires a bunch of portfolio managers, many of them at that time discretionary, and these discretionary portfolio managers work in silos. Why do they work in silos? Well, it makes sense. A discretionary portfolio manager is someone who doesn't really know why he is making his decisions; if he knew why, he would have a system, and therefore he would be a systematic portfolio manager. Since the decisions are not systematic, and are based in many cases on hunches or gut feeling, these portfolio managers often cannot rationalize why they are making a decision, and therefore it is very hard for them to work together. That's why it makes sense for hedge funds to have these portfolio managers work in silos: the diversification potential is exploited by having different portfolio managers make decisions without influencing each other or trying to convince each other.

Now, when you bring this to the quant space, it doesn't really make sense. The whole point of science is to build a dialogue in which we can reach objective conclusions through discussion and testing. So why would anybody ask scientists to work in silos? It is not how laboratories work; it is not how universities work. So why do hedge funds do this with quants? One answer is: because that is what was successful with discretionary portfolio managers, and they simply repeat that protocol with quants. It also has to do, of course, with secrecy:
with not allowing some quants to take ideas from other quants, so that the secrets don't walk away. Secrets are very important in this business, especially when they are automatable, when you can take the secret out of the building. So to some extent it has to do with secrecy, but very often I find that the argument goes more like: everybody should stand on their own feet and come up with their own ideas, and they should not cooperate.

Why is this disadvantageous? Because building a quantitative strategy is much more difficult than making discretionary decisions: it involves a tremendous amount of effort. And when you put these quants under a lot of pressure to come up with a strategy within six months, what I have very often seen happen is that they come up with strategies that are either the result of backtest overfitting (running millions of simulations and reporting to the investment board whatever seems to work best) or based on some research paper, very often a factor model that has a lot of support in the academic community but may in reality be an overcrowded strategy with a low Sharpe ratio. What happens then is that the results are not what the backtests suggested, investment boards become disillusioned, and eventually they shut down the operation. That happens very often, and it is what I call the Sisyphean quant. If you remember, in Greek mythology Sisyphus was condemned by the gods to the daily routine of rolling a boulder up a mountain only to see it roll back down at the end of the day; he was never able to reach the summit. The promise was that if he ever reached the summit he would finally be liberated, but for all eternity Sisyphus was condemned to this futile effort with no hope of succeeding. I'm not saying there is no hope of succeeding when you are on your own, but without support, without access to clean datasets or computing power, it is becoming increasingly difficult to succeed in such an environment.

It would be equivalent to taking a worker at the BMW factory and asking that worker to be a welder one day, a mechanical engineer another day, an electrical engineer the day after, and eventually to build the whole car alone. That is not how car production works. Typically someone is at a station, specializes, and becomes the best that he or she can be at that job. The same approach seems to work quite well in finance. You can have people who specialize in supercomputing; they will know a lot about parallel processing and how to vectorize algorithms, and there is so much to learn in that respect alone. There will be people who specialize in production software; people who specialize in data cleaning and curation; people who specialize in research, in figuring out which features are important for explaining a phenomenon; and people who specialize in backtesting, with everything that involves: transaction costs, whether a trade could have been executed at all, the probability of getting a fill and at what price. All these people specialize and work together as a team, and that seems to be more efficient than asking everybody to be the best they can be at everything.
So that is what I have seen work in the new kind of quant firm, where people are truly the best they can be at performing well-divided tasks. Let me tell you one anecdote about the Manhattan Project. The Manhattan Project involved tens of thousands of scientists, and even though it was a very complicated project involving cutting-edge science at the time, they were able to divide the tasks to such a level of granularity that even high school students were able to contribute. That was Richard Feynman's job at the Manhattan Project: to coordinate the work of high school students. And what was their work? To calculate the yield of the bomb. They were performing the arithmetic; there were no supercomputers, so a bunch of kids solved computations that, in aggregate, would determine the yield of the bomb. That is the kind of granularity and sub-task partitioning achieved on that project. There is no task so complicated that it cannot be divided into tasks that are much, much simpler, and that seems especially appealing when you have access to a large community of people collaborating.

Pitfall number two is integer differentiation. Take any paper in financial economics written over the past 80 years, and the first thing the authors do is compute returns. If they are trying to price securities, say the value of stocks, the first thing they do is compute the cross-section of returns. If it is a fixed-income research paper, they will work with yields to maturity, option-adjusted spreads, you name it, some sort of changes in spreads (the yield to maturity is essentially a duration-adjusted price for a bond), and they will make the series stationary by computing differences of the yield. The same with a paper dealing with option pricing: they will compute differences of the implied volatilities. It is always about computing integer differences, and when you think about it, it doesn't really make sense. I know everybody does it, and every paper starts like that, but it doesn't really make sense. The level information (prices, implied volatilities, yields) contains memory, and when you are trying to make a prediction, memory sounds like something you want to preserve: you want to know how you reached that level in order to understand what the price is going to do in the future. But the first thing people do when they compute a difference is eradicate that memory. The memory is gone: you compute a return by dividing the price today by the price yesterday, so whatever happened before yesterday is entirely removed. Of course, this is done because you need to work with stationary series: if the series is not stationary, the algorithm will not be able to learn from comparable examples; you will not be able to derive a stable distribution; and if the distribution is not stable, the algorithm will not be able to identify patterns comparable to the features you observe today. So there is, of course, a reason for integer differentiation.
The problem is that when people compute a difference, and that difference is an integer difference, they achieve stationarity at the cost of wiping out all memory. So the question I asked myself a few years ago is: is there an optimal level of differentiation that allows you to preserve the maximum amount of memory, rather than differentiating with an integer? For instance, instead of the return between yesterday and today, is there some fractional differentiation, some order of differentiation between zero and one? When you differentiate with order zero you are working with levels: the series has full memory, but it is not stationary, so that's bad. When you differentiate with order one, the series is stationary but has no memory, so that's bad too. There must be something in between: a degree of differentiation that achieves stationarity at the minimum cost in terms of memory.

That's what you see in this plot. The green line is the price of the S&P between 1998 and 2015, and the blue line is the fractionally differentiated series with order of differentiation 0.4, somehow a mix of white noise and levels. The blue line is stationary and still incorporates a lot of memory: you can observe that after 2008, when the S&P plummets and loses something like forty-odd percent, the blue line's values remain depressed, with a lot of noise, for a long period of time, until eventually the memory of that immediate history wanes.

Let me present the same idea with a different plot, with two lines, orange and blue. The blue line is the statistic from an augmented Dickey-Fuller (ADF) test. Its value goes from around zero, when there is no differentiation and the series is non-stationary, to something like minus 45 or minus 50 by the time you differentiate with order one, that is, when you compute returns. At that point the series is overwhelmingly stationary, yet you only needed to reach about minus 2.86: when the ADF statistic is around minus 2.86, you already have 95% confidence that the series is stationary, that you have rejected the null. The orange line is the correlation between the differentiated series and the original series. When the series is not differentiated, the correlation is 1, because it is the correlation of prices with prices; when you differentiate with integer order one, that correlation drops to about 0.05. What happens in between? At an order of differentiation of around 0.35, you achieve an ADF statistic of about minus 2.86 while the fractionally differentiated series has a correlation of 0.995 with the original series. So you achieve two things at once: the fractionally differentiated series is very close to the original series, and yet it is stationary. Is this a unique phenomenon? Not at all: there is not a single futures contract for which you need to compute full returns, meaning that for the past 80 years people have been publishing papers on series that were massively over-differentiated.
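As a rough illustration of the idea, here is a minimal sketch (not the book's implementation) of fixed-width-window fractional differencing: the binomial weights of (1-B)^d are truncated once they fall below a small threshold, and the series is convolved with them. The threshold value and the use of log prices in the usage note are assumptions for the example.

```python
import numpy as np
import pandas as pd

def frac_diff_weights(d, threshold=1e-4):
    """Binomial-expansion weights for fractional differencing of order d.
    Weights are truncated once they fall below `threshold` (fixed-width window)."""
    w = [1.0]
    k = 1
    while abs(w[-1]) >= threshold:
        w.append(-w[-1] * (d - k + 1) / k)
        k += 1
    return np.array(w[:-1])  # drop the last weight, which fell below the threshold

def frac_diff(series, d, threshold=1e-4):
    """Fractionally difference a pandas Series with a fixed-width window."""
    w = frac_diff_weights(d, threshold)
    width = len(w)
    values = series.to_numpy(dtype=float)
    out = {}
    for i in range(width - 1, len(values)):
        # dot product of the weights with the window, newest observation first
        out[series.index[i]] = np.dot(w, values[i - width + 1:i + 1][::-1])
    return pd.Series(out)

# Usage (hypothetical): log prices differenced with d = 0.4 can retain memory
# yet pass an ADF test.
# prices = pd.Series(...)                 # e.g. S&P 500 closes indexed by date
# x_tilde = frac_diff(np.log(prices), d=0.4)
```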
You can see it here: across the returns of the 100 most liquid futures contracts worldwide, the maximum order of differentiation that is needed is about 0.5, never 1, and in fact the median is around 0.2 to 0.3. So think about it: the level of differentiation that was needed, that was justifiable in order to achieve stationarity, was around 0.2 or 0.3; everything beyond that, from 0.3 up to 1, was for the sake of removing memory, not for the sake of achieving stationarity. Is it therefore surprising that most academics conclude that markets are perfect? No. Markets appear to be perfect by construction: once you remove all memory, markets become unpredictable. It's not that markets are unpredictable; markets are predictable, we should be able to predict them, but they become unpredictable once you remove all memory. So it appears to me that the efficient market hypothesis may be the wrong conclusion drawn from the wrong technique.

Pitfall number three is inefficient sampling. In many papers you see that people, academics especially, take samples based on a chronological clock: a sample every day, or every hour, or every minute, at a constant time-based pace. That doesn't make a lot of sense, because markets clearly do not receive information at a constant rate. Markets receive a lot of transactions and volume at the open, then there is a valley around noon, and then volume spikes tremendously around 3 p.m. until the 4 p.m. close, New York time. I'm talking about equity markets; in oil, volume spikes around 1 p.m., and in fixed income it depends on the market, but this valley-shaped profile is very characteristic. When you sample, it makes sense to sample when there is information to be extracted. Why would you sample when there is no information to be extracted? That's a bad sample: it just leads to redundant observations, observations that bring no information. It is in fact counterproductive to sample when there is nothing to extract. Alternative approaches are to compute trade bars, volume bars, dollar bars, volatility bars, order-imbalance bars, entropy bars; the book describes many of these, so let me give just one example. You could form dollar bars where you close a bar whenever a certain dollar amount has been traded on a particular side. Whenever the market is balanced between buys and sells, you will produce bars at roughly the rate at which that dollar amount is traded. Now suppose that all of a sudden an informed trader arrives, thinks the price will go up, and becomes a very aggressive buyer. The flow becomes very imbalanced, and this sampling mechanism will sample more, because it is now easier to accumulate that amount of trading on one side of the market, in this case the buy side. And that is exactly what you want: someone has arrived with asymmetric information and is using it to press one side of the market; the algorithm detects that and samples more, because you want to learn what that person knows. He is acting this way because he has asymmetric information, and you want to extract the information he is incorporating into the price.
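To make the sampling idea concrete, here is a hedged sketch of plain dollar bars, the simplest of the bar types mentioned, assuming a hypothetical tick table with 'price' and 'size' columns. The imbalance variant described in the talk would additionally track which side initiated each trade.

```python
import pandas as pd

def dollar_bars(ticks: pd.DataFrame, bar_size: float) -> pd.DataFrame:
    """Group a tick stream into bars of (roughly) equal traded dollar value.

    `ticks` is assumed to have columns 'price' and 'size' (shares/contracts),
    indexed by timestamp. A new bar is closed each time the cumulative
    price * size traded since the last bar crosses `bar_size`.
    """
    bars = []
    cum_dollars = 0.0
    o = h = l = c = None
    for ts, row in ticks.iterrows():
        p, v = float(row["price"]), float(row["size"])
        if o is None:
            o, h, l = p, p, p
        h, l, c = max(h, p), min(l, p), p
        cum_dollars += p * v
        if cum_dollars >= bar_size:
            bars.append({"end": ts, "open": o, "high": h, "low": l,
                         "close": c, "dollars": cum_dollars})
            cum_dollars, o = 0.0, None
    return pd.DataFrame(bars).set_index("end")

# Usage (hypothetical tick data): bars = dollar_bars(ticks, bar_size=5e6)
# The imbalance variant sketched in the talk closes a bar when the signed
# (buy minus sell) dollar flow, not the total, exceeds a threshold.
```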
Let me give you an illustration of why these sampling methods seem to work well. In this chart there are three lines: blue, green, and black. The black line is the number of bars per day, of a size chosen to give, say, 50 bars per day on average, that you would obtain if you used tick bars, that is, if you sampled one bar every time a certain number of ticks is exchanged. As you can see, this line is far from flat: it starts around 25, goes all the way up to 100, and then comes back down to around 50 over time. Why? Because tick bars are affected by high-frequency trading. As high-frequency trading became a prevalent technology in financial markets, order sizes decreased, so there are more ticks for any given amount of volume traded: beginning in 2007 the line goes up, it peaks around 2012, and then a change in the FIX protocol reduced the fragmentation of trades, so the line decreases. It is really a function of changes to the FIX protocol, and we would like to avoid that: we do not want the sampling frequency to be a function of new technologies arriving or of changes to protocols. That's where the green line comes in: volume bars, where you take a sample every time a particular volume (a number of contracts, not a number of ticks) changes hands. As you can see, that series is much more stable. The blue line is even more stable than the green line, and those are dollar bars: now we take into account information about the price as well, through the dollar amount exchanged, not just the volume. That is a much more stable way of sampling, less dependent on technological features or on price action.

Pitfall number four: wrong labeling. Very often in the financial literature people label observations from financial markets in a rather automatic way, as if they were dealing with faces from Facebook or some other non-financial dataset. Essentially, they take observations and label them according to the return over a fixed horizon, say over the next 10 bars or 10 days. That doesn't really make sense. Number one, time bars do not exhibit good statistical properties: returns computed on time bars will typically be heteroscedastic (they won't have constant variance), they will exhibit substantial serial correlation, because many observations are made overnight when there is very little volatility and then bursts of observations are made during the day, and the distribution will be non-normal, with skewness and kurtosis. Sampling with time bars is a bad idea, as we have just seen. Number two, labeling on a fixed time horizon incorporates the wrong information: you should label not by what happens over the next 10 bars, but by whether you would have achieved the objective of a position under those circumstances. Let me give you an example: what I call the triple-barrier method, which is the alternative to what is being done in most papers out there.
The idea is that you label an observation according to the outcome you would have experienced had you held a position from that point in time. You form three barriers: two horizontal barriers, one up and one down, and one vertical barrier. This incorporates the information you would actually follow if you were holding a position: the upper barrier is a profit-taking level, the lower barrier is a stop-loss level, and the vertical barrier simply says that you held the position for a particular horizon and now it's time to move on. What is the difference from the traditional method? A constant time horizon is equivalent to having only the vertical barrier, without the horizontal barriers: if you hold a position for the next ten days and don't care what happens in between, you label based on the outcome at the end, with no horizontal barriers at all. When you incorporate the two horizontal barriers, you are incorporating information about the path: labeling becomes path-dependent. You label an observation as +1 if you would have taken a profit on that position, and as -1 if you would have been stopped out. And who has not been stopped out at some point in time? It's very common, so not incorporating that information into your machine learning algorithm is probably a bad idea. You have to hand it to the academics: it takes an academic to label observations without taking into account the possibility of a stop-out. Anybody managing a portfolio for real knows that you have to take into account the possibility of being stopped out: it doesn't matter where the price is after ten days if you were managing a position during the flash crash, because stop-outs can kick in at any time.

Now, how is this relevant? There is a lot of interest these days in what people call quantamental firms, and let me tell you one natural way such firms can incorporate machine learning methods into their daily decision-making: add a machine learning layer to your traditional fundamental decision-making. Suppose you work for a firm where people make decisions based on book information, accounting data, or traditional economic models, and what you would like to do is not to challenge the decisions made by those models, but to determine the bet size: how much should we bet on the forecasts made by the fundamental model? That is what I call meta-labeling. Meta-labeling is a very nice way of adding a machine learning layer on top of any sort of primary model, whether that model is econometric, economic, or fundamental; it doesn't matter. The role of the machine learning algorithm is to figure out how much we should bet, if anything at all, on the prediction that the primary model has made.
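Here is a minimal sketch of triple-barrier labeling, under the simplifying assumptions of fixed, symmetric profit-taking and stop-loss thresholds (a fuller treatment would typically scale them with volatility) and a label of 0 when only the vertical barrier is hit. The function and parameter names are illustrative, not from the book.

```python
import pandas as pd

def triple_barrier_label(prices: pd.Series, t0, pt: float, sl: float, max_hold: int) -> int:
    """Label the observation made at time t0 by the first barrier touched.

    +1 if the price path first rises by `pt` (profit-taking barrier),
    -1 if it first falls by `sl` (stop-loss barrier),
     0 if neither barrier is touched within `max_hold` bars (vertical barrier).
    Barriers are expressed as simple returns relative to the entry price.
    """
    path = prices.loc[t0:].iloc[: max_hold + 1]
    ret = path / path.iloc[0] - 1.0            # running return since entry
    hit_pt = ret[ret >= pt].index.min()        # first touch of the upper barrier
    hit_sl = ret[ret <= -sl].index.min()       # first touch of the lower barrier
    if pd.isna(hit_pt) and pd.isna(hit_sl):
        return 0                               # vertical barrier: time ran out
    if pd.isna(hit_sl) or (not pd.isna(hit_pt) and hit_pt < hit_sl):
        return 1
    return -1

# Usage (hypothetical): labels = pd.Series(
#     {t: triple_barrier_label(close, t, pt=0.02, sl=0.02, max_hold=10)
#      for t in event_times})
```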
This is very helpful in the following sense. Suppose you have a primary model with low precision but high recall: an algorithm that identifies almost all positives but produces a large number of false positives, like most economists, who have predicted twenty of the last ten recessions. Economists are a great example of high recall and low precision. Now we put a machine learning algorithm on top of that economist, and the machine learning algorithm figures out how much to bet on the forecasts made by that economist, given previous examples of his predictions. Why is this helpful? Because if you have a model with high recall, it is not so important that it has low precision: the machine learning layer will give up some recall in exchange for higher precision, and there is an optimal point where you maximize the harmonic average of precision and recall. That is what the meta-labeling algorithm does.

All right, pitfall number five: weighting of non-IID samples. Finance is very funny as it relates to machine learning. In a traditional machine learning application, say biomedical research, suppose you are a doctor and someone asks you which features are conducive to high cholesterol. You would take blood samples from everybody in this audience and label them, then ask questions about diet, exercise routines, genetic predisposition to cholesterol, and so on, and you would try to identify which features are predictive of the observation in each tube: one tube per person, independent samples. Finance is not like that. In finance, essentially, someone takes all those tubes and mixes them up, so that the blood in tube number 19 contains information from subjects 8, 7, 6, and 5, and tube number 20 contains partly a sample from subject 20 but also partly samples from subjects 19, 18, 17, and 16. That's finance: we do not have access to clean observations, because financial series are multicollinear, they exhibit serial correlation, they are non-normal; they are all contaminated. There is a lot going on in financial markets at any point in time, and we can never isolate the effect of individual features; everything is combined. There is no such thing in finance as collecting a sample from one individual and then collecting an independent sample, because the cross-correlations are tremendous, and to make matters worse, the serial correlation is substantial too.

Why is this relevant? Think about some of the major breakthroughs in machine learning in recent years: a couple of years ago, algorithms became able to identify faces very effectively, or to drive cars, even though that track record has been tarnished recently. Here is my friend Luna. My dog Luna can recognize faces very well; I know, because she doesn't bark at me, she barks at the neighbor. But even though Luna can recognize faces as well as Google's algorithms, she cannot manage my investment portfolio.
Well, to be fair, she can manage my investment portfolio more or less as well as most portfolio managers out there on Wall Street, but that's not exactly the standard you would expect, and anyway I don't pay her anywhere close to what even a poor portfolio manager would expect or be satisfied with, unless they just want a cold meal at the end of the day. So this is the problem: what appear to be breakthroughs elsewhere, like driving cars and recognizing faces, concern tasks that plenty of people can do, but there are not many people who can manage portfolios. It happens to be a very complex problem, and part of the reason is not that the system is chaotic. There are chaotic systems out there that can be predicted using machine learning: if you read the press over the past week, a group of scientists was able to predict a chaotic system, the spreading of a wildfire, over multiple Lyapunov times, which are essentially full cycles of uncertainty, something like eight of them. It was incredible: a machine learning algorithm predicted a chaotic system, something nobody thought would be possible. So machine learning truly is an incredible tool that can handle chaotic systems like the economy. The problem with finance is that the datasets are not as good as the datasets you can form in physics, and part of the difficulty is exactly what we are discussing on this slide: observations are non-unique; they are not independent and identically distributed.

One thing we can do is control for that lack of uniqueness: determine to what extent each observation is similar to other observations we have made. Suppose, for instance, that you observe some features and you label them based on the outcome over the next 20 bars, or using the triple-barrier method until one of the barriers is hit. The next observation contains overlapping information: it looks at, say, the next 21 or 23 bars, and there is an overlap. You can control for that overlap: you can determine the amount of shared information between consecutive observations, and once you do, something very interesting happens. When you conduct your cross-validation and you sample observations taking that redundancy into account, sampling more of the observations that are more unique, your out-of-sample accuracy increases dramatically. If you conduct cross-validation without taking the uniqueness of the observations into account, your accuracy will not be as good as when you introduce that additional information into your classifier. That is something not many people do, but it definitely has tremendous benefits.
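As an illustration of that overlap bookkeeping (a simplified rendition, not necessarily the book's exact weighting scheme), the sketch below computes the average uniqueness of each label from the concurrency of overlapping label spans.

```python
import pandas as pd

def average_uniqueness(label_spans: pd.DataFrame, bar_index: pd.Index) -> pd.Series:
    """Average uniqueness of overlapping labels.

    `label_spans` is assumed to have columns 't0' and 't1' (the first and last
    bar each label draws information from). For every bar we count how many
    labels span it (concurrency); a label's uniqueness is the average of
    1/concurrency over the bars it spans.
    """
    concurrency = pd.Series(0, index=bar_index)
    for _, span in label_spans.iterrows():
        concurrency.loc[span["t0"]:span["t1"]] += 1
    uniq = {}
    for i, span in label_spans.iterrows():
        uniq[i] = (1.0 / concurrency.loc[span["t0"]:span["t1"]]).mean()
    return pd.Series(uniq)

# Labels with low average uniqueness share most of their information with
# their neighbours; one option is to use these values as sample weights, or as
# sampling probabilities, when fitting the classifier.
```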
Now, since we are already talking about cross-validation, let's discuss one essential problem with financial datasets: leakage. This is one of the reasons many backtests look wonderful in sample and then the strategy doesn't work once deployed out of sample: the machine learning algorithm learned something in sample, and the fitted parameters were then evaluated on a testing set that had leaked information into the training set. How does this happen? In finance it happens very often, because the series exhibit serial correlation and the labels are overlapping, and when you combine those two features, serial correlation and overlapping labels, there will very often be examples in the testing set that share information with examples in the training set. What can you do about it? One thing, as we said before, is to control for the uniqueness of the observations, but that doesn't suffice. You need to perform two operations: one is called purging, and the other is called embargo. Purging has to do with removing from the training set those observations that share information with observations in the testing set; that is something we can do because we know the degree of overlap between the labels in the two sets. Embargo is necessary when part of your training set comes after the testing set: for instance, here the testing set sits between two training sets, one before it and one after it. You need to remove the overlap, which is what purging does, but in addition, on the right-hand side, there are observations in the training set that come immediately after the testing set, and those should be removed as well. Once you do that, Sharpe ratios in the backtest typically decline, and that is what should happen: you are removing the effects of overfitting, of leakage. By the way, one way to determine when you have purged and embargoed enough is when the Sharpe ratio stabilizes and ceases to decline, hopefully above zero.

All right, pitfall number seven, the last one and my favorite: backtest overfitting. This is what I call the most important plot in finance. I call it that because it has helped me a lot during my career, and the way this plot is generated is as follows. Suppose you perform a number of experiments, say a thousand backtests, trying to determine the profitability of various strategies: mean reversion, trend following, calendar patterns, machine learning, you name it. You are trying to identify a strategy and compute its performance. Now, where do the series used to develop these strategies come from? Suppose they are actually random walks: they are unpredictable. So we are computing investment strategies on series where, by definition, there is no investment strategy to find. What is the maximum Sharpe ratio you expect to obtain? Surprisingly, the answer is not zero. When you conduct a thousand experiments on a single series that is unpredictable, the expected maximum Sharpe ratio is around three. Isn't that astonishing? Remember, the series is unpredictable, there is no strategy, and yet the expected maximum Sharpe ratio is three. And if you conduct a million simulations, which is trivial with today's computers, the expected maximum Sharpe ratio is around five.
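Figures of that order can be reproduced with the expected-maximum-Sharpe approximation behind the False Strategy Theorem discussed below. A minimal sketch, assuming independent trials and a cross-trial Sharpe-ratio variance of 1 purely for illustration:

```python
import numpy as np
from scipy.stats import norm

EULER_MASCHERONI = 0.5772156649

def expected_max_sharpe(n_trials: int, var_sharpe: float, mean_sharpe: float = 0.0) -> float:
    """Approximate E[max Sharpe] across n_trials independent backtests whose
    Sharpe ratios are drawn from a distribution with the given mean and variance."""
    g = EULER_MASCHERONI
    emax = (1 - g) * norm.ppf(1 - 1.0 / n_trials) + g * norm.ppf(1 - 1.0 / (n_trials * np.e))
    return mean_sharpe + np.sqrt(var_sharpe) * emax

# With unskilled strategies (mean 0) and unit Sharpe variance across trials:
print(expected_max_sharpe(1_000, var_sharpe=1.0))      # roughly 3.3
print(expected_max_sharpe(1_000_000, var_sharpe=1.0))  # roughly 4.9
```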
What this tells you is the following: whenever you read a paper in the Journal of Finance, or in any of the best journals anywhere, ask how many trials the authors conducted. Nobody knows. Most likely the author himself doesn't know, because he didn't keep track of them; some of these papers are written over many years, the referees have no clue, and everybody assumed in their statistical tests that a single trial had taken place. That is what happens when we use a confidence level of 95%, a significance level of 5%: we set the threshold for rejecting the null at a 5% probability of a false positive under the assumption that a single trial has taken place. This has been known for a long time: within four or five years of the current framework for statistical testing being established, people realized that the threshold for rejecting the null cannot remain constant as you conduct multiple tests. Yet for the past 90 years, people in finance have been conducting multiple tests without adjusting the rejection threshold. This doesn't happen anywhere else: in medical research and in physics, people conduct multiple tests and they adjust for this phenomenon, the fact that false positives appear more frequently as a result of multiple testing. This is why a couple of years ago a friend of mine, Cam Harvey, who has also served as president of the American Finance Association, published a paper with the shocking conclusion that most discoveries in finance are false, simply because nobody is controlling for the number of trials; as I said, nobody even knows how many trials have taken place. When it comes to strategy development, this is something we always need to keep in mind: we will always find profitable strategies, even where there are none; it's just a matter of trying and trying. Some people express the thought this way: if you torture the data hard enough, eventually it will confess to whatever you want. This is the application of that idea to strategy development.

What can we do about it? This is the False Strategy Theorem, a theorem that some colleagues and I proved in 2014. The dashed line is the result of the theorem, and the heat map is its experimental verification: if you generate random strategies and compute the maximum Sharpe ratio actually attained, you obtain this distribution, and the dashed line is what the theorem predicted. As you can see, the theorem is very accurate over a very wide range of numbers of trials, and it becomes even more accurate as the number of trials increases. The conclusion of the theorem is that there are two variables, never reported in academic journals, that are critical to determining whether a discovery is true or false. Number one is the number of trials: if you ever find a paper in any academic journal in finance that says "and I conducted two hundred and twenty-three trials," please tell me, because it will be the first time anybody has ever reported the number of trials. Number two, the second variable that is critical to discounting the probability of a false positive, is the variance of the outcomes. Why is this relevant? Well, if you are conducting research on strategies coming from, say, technical analysis, or counting sunspots, or any sort of variable or intuition used to make a prediction, the variance of the outcomes is likely to be very wide.
So it suddenly becomes very likely that you will obtain a strategy with a very high Sharpe ratio. When, instead, you conduct strategy research where you stick to a model, to some pre-ordained notion of how the world works, the variance is going to be much smaller, because your degrees of freedom are shrunk. These are the two variables that are critical to determining whether a discovery is a true discovery or a false discovery, and they are never reported.

When you conduct your own research, though, you can always keep track of these variables, and that is something my team does: whenever someone runs a backtest, the system automatically logs information about it, and research is penalized for accessing a dataset multiple times. Eventually someone arrives at my office and says, "Marcos, I've found a strategy with a Sharpe ratio of two," and I look at the logs and say, "But you accessed this dataset about five hundred times; I was expecting a Sharpe ratio of five. Come back when you have a Sharpe of five; two doesn't excite me." Then he comes back with a Sharpe of five, and by then he has probably conducted 10,000 experiments, and at some point it is good to give up. At some point people should actually give up, because, remember, you can achieve any Sharpe ratio you want: ten, twenty, a hundred, no problem; it's just a matter of trials and patience, even when there is no strategy there.

So what can we do about it? We can compute the deflated Sharpe ratio. The deflated Sharpe ratio is a metric that takes into account the number of trials, whether the distribution of returns is skewed or exhibits kurtosis, and the sample length, all features that are very relevant, but in particular it takes into account SR-zero, the variable that incorporates the expected maximum Sharpe ratio under multiple trials, which in turn depends on the variance across the experiments. A couple of days ago a colleague and I published a paper, on the detection of false strategies, that explains a procedure for estimating these two variables very accurately. Essentially, the paper shows how you can determine whether the strategy you have identified is a true positive or a false positive, and as you can see in the paper, we estimated the efficacy of the method by running millions of Monte Carlo simulations, and it appears to be very precise. Say you come up with a strategy with a Sharpe ratio of two: the paper teaches you how to estimate the number of independent trials that have taken place, which in most cases is much smaller than the number of actual trials (that is the difference between an actual trial and an independent trial), and it also teaches you how to cluster your trials so that you can compute the variance across those independent trials. Once you plug in these two variables, you have an accurate assessment of whether this strategy is something you should be investing in or not.
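As a rough companion to that discussion, here is a hedged sketch of a deflated Sharpe ratio computation, under the assumption that it is the probabilistic Sharpe ratio evaluated at the expected maximum Sharpe across trials; per-period (non-annualized) returns and externally supplied estimates of the number of trials and of the cross-trial Sharpe variance are assumed.

```python
import numpy as np
from scipy.stats import norm, skew, kurtosis

EULER_MASCHERONI = 0.5772156649

def expected_max_sr(n_trials: int, var_trial_sr: float) -> float:
    """Expected maximum Sharpe ratio across n_trials unskilled trials (SR0)."""
    g = EULER_MASCHERONI
    return np.sqrt(var_trial_sr) * (
        (1 - g) * norm.ppf(1 - 1.0 / n_trials)
        + g * norm.ppf(1 - 1.0 / (n_trials * np.e)))

def deflated_sharpe_ratio(returns: np.ndarray, n_trials: int, var_trial_sr: float) -> float:
    """Probability that the observed per-period Sharpe ratio exceeds SR0,
    accounting for sample length, skewness, kurtosis and multiple trials."""
    sr = returns.mean() / returns.std(ddof=1)
    t = len(returns)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)       # equals 3 for a normal distribution
    sr0 = expected_max_sr(n_trials, var_trial_sr)
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4.0 * sr ** 2)
    return norm.cdf((sr - sr0) * np.sqrt(t - 1) / denom)

# Usage (hypothetical): a value close to 1 suggests the Sharpe ratio is unlikely
# to be a by-product of multiple testing; a value near 0.5 or below suggests luck.
# dsr = deflated_sharpe_ratio(daily_strategy_returns, n_trials=500, var_trial_sr=0.1)
```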
Is it a statistical fluke, the result of luck, or is it a strategy that performs much better than anybody would have expected? And with that I conclude my remarks. Thank you again for your attention; please go ahead and ask me any questions. [Applause]

Yes, definitely, they are doing a great job, and in fact for the past three years we have been writing papers in parallel; the work they are doing and the work we are doing are on the same page. Look, I think we both recognize that the rate of withdrawal of papers in finance is very suspicious: the rate at which papers are retracted is essentially zero. Whenever a paper in finance is published, most likely it will never be revisited, and that is unheard of in science. In medical research the retraction rate can be as high as 20% in some journals, meaning that a paper comes out and ten years later someone realizes that new evidence challenges what was published; the paper is withdrawn, it disappears from the journal, it is no longer listed there. Finance and theology are probably the last two fields where papers are never retracted, and hopefully, with Cam having been a thought leader in this respect, this will change. Thanks.

Yes. So suppose you have a series that is serially correlated; that is what you can see in the middle of the slide: X_t is approximately the same as X_{t+1}. You make an observation, and the next observation is very similar, and at the same time the labels are overlapping: the label associated with observation t shares a lot of information with the label associated with t+1. Now, if t and t+1 fall in different sets, say t belongs to the training set and t+1 falls in the testing set, the algorithm will learn from the training set information that is shared with the testing set. What should happen is that observation t should be removed from the training set, so that the algorithm is not allowed to learn from something that is going to be used to evaluate the efficacy of the fit. Yep, go ahead.

Yes, so the question is, what is the key attribute of a successful machine learning strategy? I think it has to do, number one, with the quality and uniqueness of your data: you want to model strategies on data that is very hard to get, that not many people have, and that is perhaps very hard to manipulate. I know I'm onto something when my data team complains. When my data team complains that a dataset is unbearable, that is probably like striking gold in Alaska: why is there still gold in Alaska? Because nobody wanted to go to Alaska. A dataset that is intractable, that is very hard to process, is one thing. Another thing is to come up with good feature-importance methods, and there is a full chapter in the book dedicated to feature importance. Why? Because when it comes to research on financial markets, backtesting is not a research tool, and I know that is a polemic statement; many people believe that a backtest is what should confirm or disconfirm a particular statement or strategy. But once you are in backtesting mode, the research is already finished. You cannot learn something from a backtest and then reuse it, and if you do reuse it, you had better keep careful control of the number of trials,
and then you will have to deflate the Sharpe ratio by the probability that the result is a lucky discovery. Typically, what you want to do is spend a lot of time identifying the features that are important for that strategy; once you have important features, you build the strategy, and only at the very last moment do you backtest it. You really do not want to fall into the cycle of backtesting over and over until the data eventually confesses whatever you wanted it to confess; you want to spend most of your time on feature importance. One more. [Music]

Yep. Well, let's think a little bit about that. It has a lot to do with how you process the data: you can distill a lot of information from datasets that other people also have access to. In the ideal scenario you have access to a dataset that is unmatched, that nobody else has, but that is not the only way of developing strategies. You can work with datasets that people have been working on for a long time and still uncover strategies. Why? Because you are not going to use returns; instead of returns you are going to use fractional differentiation, and you are going to come up with a theory before applying machine learning. Let me give you a brief example. Many people in finance think that machine learning is a black box, and every time I hear that I roll my eyes, because machine learning is not a black box. You can use it as a black box (you can use anything as a black box), but there is a way of using machine learning that is not a black box; it is a white box, and the way is the following. Suppose you are given some variables and you are put in the same position as Sir Isaac Newton when he had to come up with his equation for the gravitational force. You are given these series: some related to the mass of the objects, their density, their reflectivity, their dimensions, their shape, the distance between the objects, a whole bunch of variables. You run all of them through a machine learning algorithm, and the algorithm comes back with the following answer: there are really only three variables that matter here, the mass of object one, the mass of object two, and the distance between the objects, and all the rest is irrelevant. Sir Isaac Newton would tell you: that's very interesting, I would never have figured that out on my own; now, what is the equation that relates these three magnitudes? And the answer is: well, that's the funny thing about machine learning, it doesn't tell us the equation, it doesn't tell us the theory; it only tells us that these three variables are relevant. The way I like using machine learning is as a research tool: it tells me which variables are relevant, and then I have to come up with a theory that combines that information in order to make a prediction. So, to come back to your question, I am not persuaded that the only way is to use rare datasets. Sometimes you can use the datasets everyone uses, but you are actually going to learn from them, and you are not going to treat these machine learning methods as a black box; you are going to try to make sense of the data in order to come up with an explanation of what is going on. That is one example, and in addition to using fractional differentiation,
using better sampling methods, better labeling methods, and so on and so forth, there is a lot you can do with standard datasets. Yep.

[Question] You have this deflated Sharpe ratio. Can you use it to tell which machine learning algorithm is better than another? And can you tell us intuitively how you use the deflated Sharpe ratio to tell which one is a false positive? Yes. The idea of the deflated Sharpe ratio is to take into account whether a result is a lucky discovery. As you saw in that heat map, the plot that looks like a volcanic eruption but isn't, you can target any Sharpe ratio you want; the question is whether, once you have achieved a particular Sharpe ratio, it is something you should expect purely out of luck, out of randomness, and that is what the deflated Sharpe ratio measures. So, in the course of your research, you should keep track of the number of experiments. Essentially, you collect the time series of returns you obtain from every backtest. Say you have conducted a hundred backtests: now you have a T-by-100 matrix, where T is the number of returns per backtest, and you use that information to determine the number of independent trials you have actually conducted. That is something you can read about in the paper I published last Monday; that is the way to extract the number that appears in this equation as K. The way I use this, and my team uses this, is that people run all these backtests and we know that eventually we will, routinely, come up with false positives. What we need to be very careful about, when we are developing a strategy on any particular dataset, is whether we have exhausted our luck, to the point that it has become likely that what we found is just a lucky discovery, and that is what we need to keep track of. And that is one way this community could cooperate: you don't know how many experiments someone else is running on the same dataset you are working on, but here is Quantopian, which is able to keep track of all the experiments. That would be one possible application. Thank you very much. Thank you, that's great.
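The talk does not spell out the paper's estimation procedure for the number of independent trials. Purely as an illustration of the bookkeeping involved, here is a crude sketch that clusters the correlation matrix of backtest returns and uses the cluster count as a stand-in for K; the cutoff, the clustering method, and the function name are assumptions, not the method from the paper.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def effective_num_trials(backtest_returns: pd.DataFrame, corr_cutoff: float = 0.5) -> int:
    """Rough estimate of the number of effectively independent trials.

    `backtest_returns` is a T x N matrix (one column of returns per backtest).
    Trials whose return series are highly correlated are grouped together by
    hierarchical clustering on a correlation-based distance; the number of
    resulting clusters is used as a crude stand-in for K.
    """
    corr = backtest_returns.corr().to_numpy()
    dist = np.sqrt(0.5 * (1.0 - corr))            # correlation-based distance
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    clusters = fcluster(z, t=np.sqrt(0.5 * (1.0 - corr_cutoff)), criterion="distance")
    return int(len(set(clusters)))

# The variance of cluster-level Sharpe ratios can then be fed, together with K,
# into the expected-maximum-Sharpe and deflated-Sharpe calculations sketched above.
```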
Info
Channel: Quantopian
Views: 49,172
Rating: 4.9624252 out of 5
Keywords: finance, quantitative finance, risk, risk analysis, math, statistics, algorithms, algorithmic trading, machine learning, quant finance, 7 reasons, machine learning fails, quantcon 2018
Id: BRUlSm4gdQ4
Length: 73min 36sec (4416 seconds)
Published: Thu Apr 25 2019