Applications of AI in Finance: Using NLP to Analyze Market-Moving Language

Captions
All right — so it's been a long time since I was an academic, so forgive me for that. I'm actually a practitioner in the field at this point; I largely spend my days analyzing not just central bank communications but corporate communications as well. What I'm hoping to do with this talk — I'll try to keep it short, because I'm keeping you from lunch — is spend a few minutes talking you through some of the modern methodology being used not just by the startup community or technologists, but by hedge funds, asset managers, and investment banks, and some of the things we actually sell to those folks.

So I'll take a few minutes to talk about my background, how this came to pass, and what exactly we're doing, and then go into more detail about the technology we've built. There have been a few allusions to natural language processing and levels of sophistication in that space, and I'll talk a little about why there's a need for greater and greater sophistication: we're dealing with increasingly complex language. We're not just dealing with this good buzzword or that bad buzzword; things are getting significantly more complex with regard to syntax, sentence structure, and different speakers as well. Then I'll walk through some of the insights from our central bank data as well as our equities data, looking predominantly at earnings calls, and hopefully take a minute or two to talk about some of the applications in thematic research that this technology can offer — with, hopefully, some follow-on discussion touching on things alluded to in earlier presentations: applications of this type of technology in the legal space, the risk space, and even insurance.

Very quickly, on my background: as I alluded to, I'm a — I'll say — recovering academic. I spent the first part of my career, oddly enough, as a game theorist before backing into macroeconomics during the financial crisis, when I was modeling small-group decision-making and decided that the Fed was a pretty interesting small group of decision makers. What was challenging for me at that point was that I was attempting to model how markets respond to central banks, and what I found was that it's all about communication — but there was no systematic way for me to model language. So I spent an unreasonable amount of my time delving into computational linguistics, only to find out that the state of the art in sentiment analysis was really not sufficient to analyze something as complex, as nuanced, or as veiled as Fedspeak. That caused me to step back and say: there has to be something out there, there has to be a methodology that yields comprehensive, unbiased, quantitative insights.

So I founded Prattle, and our mission is to automate financial research broadly — to streamline that process, creating comprehensive, unbiased, cost-effective research on both central bank and corporate communications, primarily for investment professionals. We do actually work with some academics, several in this room — thank you, Harry — but obviously our core market tends to be people who can pay a little better than academics; I do have to run a business.

On our coverage universe: we cover 15 central banks around the world — or at least we advertise that we cover 15. We actually cover 20, but there are a few where I don't feel we have high confidence in the data, based on the lack of transparency in those markets, so we withhold some of that data for ourselves. We also cover every publicly traded company in the United States, processing every publicly available primary-source communication for each and every one of those companies. That means we are processing just shy of five million documents a day, in near real time. So we have to
sift through that information and identify that an article about cooking apples is not the same thing as an article about Tim Cook from Apple — believe it or not, that's a big problem when you're automating this analysis — but also identify that an article about Apple is not the same thing as an article by Apple. So we're processing all of that, dealing with attribution, sifting through that information, and honing in specifically on primary sources. In the case of central banks this is fairly straightforward: it's speeches by policymakers, it's press releases directly from the bank, and of course it's policy statements, minutes, transcripts, et cetera. With corporate communications there's more of a gray area. Does a tweet by a corporate officer count? In the case of Elon Musk it's largely market-moving, but in the case of most corporate officers it's actually not, so there's some ambiguity there. Right now we have withheld social media as a separate data set that does not count toward the five million documents a day we're processing. But we do process every speech by a corporate officer, every press release, everything released on the corporate website — thank you very much, Reg FD: companies no longer need to put things on the wire services, which means you actually have to go to their website to get the same information.

Now, a lot of that gets reproduced, and then you have a near-duplicates problem. What happens if a news source reproduces something that is effectively a press release but changes the title — is that a new document? What if they put a new header on it, or a new intro paragraph? So there's a lot of breaking apart these communications so you don't end up with redundant information flows. Human beings are very good at this; machines, not so much. We had to build a lot of technology to solve those problems.

That's a nice segue into some of the technology here — and I think Ana did a nice job setting me up for hopefully not totally disappointing you about our methodology. Methodologically, what Prattle is doing — and this image is very much a caricature — is mathematically mapping linguistic patterns. If you think about how the human brain processes language, we don't process language one word or one phrase at a time; we don't take a 500-word communication and say, well, this word, and then this word, and then this word. That's not how our brains read. But traditional sentiment analysis was actually even more rudimentary than that: it would identify a couple dozen buzzwords that were relevant, maybe a couple dozen phrases; some would be sorted into positive, some into negative; you would add up the positive scores, add up the negative scores, subtract the negative from the positive, and there's your aggregate score. That's adorable, but it doesn't work. You may glean something useful out of a tweet, but you're really not going to glean anything useful out of something as complex or nuanced as a Fed statement.

So we had to build a methodology that more closely mimics the way human brains process language — the idea being that if we could identify the interrelatedness of language, those nodes of connectivity between words, phrases, sentences, and paragraphs, we could get much closer. That 500-word communication doesn't turn into a couple dozen terms; it turns into, on average, about 84 billion nodes of connectivity. Now you have an interesting data set off of a fairly small number of communications. The major innovation for us was being able to scale not off of hundreds or thousands or tens of thousands of communications, but off of a fairly small number — because in a deep learning context, where you would require a huge corpus in order to map to some latent dimension, we're not mapping to a latent dimension. We're mapping to a known dimension: price movement. I don't care
if the Fed is hawkish or dovish in absolute terms; I care if the market is going to react as though the Fed is hawkish or dovish. So all I have to do is identify what the market reaction is. Sounds easy; it's not. In the central banking context, this meant we had to build out a complex model — actually twelve models — essentially looking across time horizons and asset classes within each economy, and there's some variation there. The equities context is much, much simpler: we're able to control for common quantitative factors that affect price movement. So in the context of an individual corporate communication — an earnings call, in this case — what you can do is control for things like: did they beat or miss? What's guidance? What's the industrial sector doing? What's the broader market doing? What are ROI, ROA, ROIC, et cetera — run your way down that list. If you control for those common quantitative factors and isolate the residual price movement unexplained by those factors, you're able to tie the language patterns to that residual price movement, the goal being to predict the price movement that is directly attributable to the language itself, not to the financials. Now, I'm not going to claim that's a perfect system — we're making a pretty big assumption that there's not some other exotic factor we're not identifying — but in aggregate that seems to be the case. The data performs extremely well out-of-sample, so we're doing something right.

I did allude to some of the underlying technology necessary to make this happen; unfortunately, there's a lot of it. One piece is our named entity recognition platform — this is that Tim Cook from Apple versus cooking apples problem. We need to be able to identify not just that Tim Cook is Tim Cook, but that Tim Cook, Timothy Cook, Timmy Cook, or any other variation of those names are all the same person — or, in this case, "Tim Apple." We actually had to have that alias added because it started showing up in news articles. So — thank you, Mr. President. The other complication here — and by the way, the way we do NER is fairly complex; for those of you who are technically inclined, this is a fleet of bidirectional long short-term memory models with an ensemble voting methodology to select the correct model, down to the individual character level, in order to resolve spelling errors, nicknames, abbreviations, et cetera. So that alone is a huge amount of technology just to determine whether we're looking at the right communication; then, once we have that communication, we can actually process the language.

To underpin all this, we've had to build a back-end data science platform. For those of you who are stuck in the world of having to deploy your own models, you're probably familiar with this problem: Spark and Hadoop are kind of a pain to use, and if you look at the Python-based tools like Airflow or Luigi, you really run into some scalability problems. So we built our own platform to solve for this, for both ease of use and scalability.

All right — so, without any further ado on methodology, what the hell does the data say? I pulled this on, I think, Sunday night — we had to deliver these presentations by Monday — so it doesn't have the last couple of days. What you see here is our trend in US Federal Reserve sentiment. This is aggregate: all communications from all speakers — the 12 regional bank presidents and every member of the Federal Reserve Board — as well as the official press releases, statements, et cetera. What you see here is pretty straightforward: look at that nice little drop in sentiment recently. What's notable is that sentiment started to fall before they hiked rates in December; you can go back here, and that downtrend is actually significant. This is direct output from our front-end system, but what you can also do is then pick out which
speakers you care about. What if you only looked at the Chairman, the FOMC statement, and the minutes? The trend is almost identical right now — the central tendency is pretty much dominating at the moment. That's not true for all central banks; at the BOJ right now, for instance, we're actually seeing pretty substantial dissent from two particular policymakers.

So let's jump to a more concrete example, where we know what happened in policy as a result. Go back and look at 2016 — everybody remembers the Brexit vote; it's sort of still in the news, kind of, a little bit. So, 2016: the Brexit vote happens, and the rest of the world was doing what the rest of the world did after we elected Donald Trump, which was saying, "What happened there?" The entire financial world was saying, oh my god, this is a disaster, they're going to cut rates immediately. So, first week of July — the Brexit vote was in mid-June — first week of July, you have Governor Carney come out and give a speech. This speech was the greatest example of confirmation bias I have seen in years, because everybody came in saying: Brexit is a disaster, they're going to cut rates, period. That was what they were saying. So everybody came into that speech and said: disaster — he's going to cut rates, and he's going to talk about it. Moment one, he comes out, walks on stage, and gives an introduction: here are all the challenges we face. Then he gives the whole middle of the speech, which is: here are all the things we've already done to combat those challenges. And he concludes the speech with: here are the things we may have to do if it gets worse. How does every smart, busy person read? You read the intro, you skim the middle, and you read the conclusion. What were the two most dovish parts of the speech? The intro and the conclusion. So in addition to confirming everyone's biases, you also had the problem of how human beings consume information in order to be efficient.

What ended up happening was widespread market consensus — futures pricing put the odds of a rate cut at 97, 98 percent — because everybody read the speech the same way. Now, market pricing reversed a little by the time of the policy meeting two weeks later; the odds were around 79, 80 percent, which is still quite high. Depending on how you're dealing with futures pricing, it's actually very rare for futures pricing to get above 91, 92 percent for anything. What's interesting is that our data — as you see here, this is immediately after that speech, but before they ultimately held rates in July — was pointing to the fact that, yes, sentiment was falling, but it hadn't fallen as far as it did the last time they cut rates. That's a really important indicator: where was sentiment, and how far did it have to fall before they cut rates previously? What we found was that if sentiment continued to follow the trajectory it was on, they were likely to cut rates in August, not July. That's precisely what happened. We were wildly out of consensus on that call, but it was a demonstration to me of the value of having a check on our own cognitive bias. I fell into that camp personally — I fell into the camp of thinking, surely they'll cut rates — but we had to be faithful to our model in predicting what was going to happen. So we were on the record saying: they're not going to do it; they're likely to sound dovish, they're likely to cut rates in the near future, but they're not likely to do it yet. It was an out-of-consensus call, but one that, in my opinion, demonstrates the value of having comprehensive, unbiased data that is truly looking at everything, without our human problems.

So, jumping over to the corporate sphere a little bit, looking at some sector-based information: this is the tech
sector on top and the industrial sector on the bottom, looking at all of 2018. The orange trend here is our sentiment for earnings calls only, relative to price, which is the blue trend. In both cases you see interesting features. Obviously the tech sector had a nice run-up through Q2 and Q3 last year — a little before that, but mostly Q2, Q3 — and then a pretty sizable sell-off. I will note that our data tends to be somewhat leading in that case. The industrial sector is probably the more interesting one here. For those of you who may or may not recall, the narrative around industrials last year was entirely about trade: is Caterpillar going to be able to sell farm equipment in China — period. That was the entire narrative. What's interesting is that Q1, Q2, Q3 were record earnings, because ahead of any trade barriers these companies had really great sales in China — all these potential buyers were out there saying, well, if there's going to be a tariff next year, I might as well move forward my need to buy a combine; buy now and you don't have to eat a higher fee later. What's really interesting is that this bolstered their earnings calls, but not their corporate communications — I'll come back to that in just a minute. What ended up happening at the end of the year, because of that phenomenon, is that the market collectively thought industrials were overpriced, so they tumbled further during the sell-off in December. Of course they rebounded nicely in January, but what's interesting here is that sentiment didn't fall — it went the other direction, because there was more certainty about trade. This tells you a story about market momentum more than it tells you anything about those individual companies. More than anything, this is a story about indexing, a story about passive investing, and about flight out of assets by investors who may or may not be making decisions based on fundamental information. We're seeing more and more of this, and our clients are complaining about it more and more.

So here are a couple of hopefully timely examples. The Boeing one, believe it or not, I put in there as the whole story was breaking, so forgive me for not including more information on it yet. The one on top here is Caterpillar: the orange trend line is earnings calls, the green trend line is other corporate communications, and the blue line, again, is price. What you're seeing in the top graph, as I said before, is that earnings are pretty stable and actually above average — a pretty nice couple of quarters, actually eight quarters in a row. What you also see is a slow decline in the other corporate communications: speeches by corporate officers, press releases, things of that nature. That trend is going down for two reasons. One, those communications are not as tightly controlled by the investor relations team, so it tends to be a little easier for negative sentiment to leak out. But two, there was a lot more concern being expressed about tariffs in those communications, because they were actually targeting different markets with them.

The one at the bottom — I assume we're all familiar with what's going on with Boeing; my wife was actually on a MAX last night, and she's fine — what's interesting is that Boeing has had a pretty good run recently. This is their earnings: the blue dot is consensus projections, and the light blue circle is the actual. They've been beating earnings pretty consistently for the last several quarters. So what's going to be really interesting is to see what happens, because we haven't seen people canceling
orders in droves yet. What we've seen is regulators pushing back; we've seen airlines saying, we're not going to fly these things; but we haven't seen the airlines say, we're going to stop buying them. Last I checked yesterday, Boeing was down six and a half percent or thereabouts. I haven't checked where it's at today — it's up? Pre-market they were down — so that's interesting. It'll be very interesting to see, because the fundamentals right now still look solid. Sentiment-wise they've been rising, the fundamentals have looked good, so it will be very interesting to see when those things break apart, or if they break apart, as a result of this scandal.

I wanted to key in on a more specific, precise example here — in this case an actual earnings call. Apologies for the watermark; this is output from our system. This was Apple's Q1 earnings call. I don't know how many of you remember it, but this was an interesting earnings call with two narratives to it. One was that they're selling fewer phones — the core business falling off, especially in China. The second was that their revenue from services was at record levels. What was interesting was that the initial market reaction was down, because everybody saw fewer iPhone sales; the immediate reaction to the reaction was way up — services are doing great, and that's a long-term sustainable business. What we were able to do here was key in on what was positive and what was negative in that earnings call. You'll note on the left side that we scored it just a little bit positive: 65th percentile, a little above zero, with a neutral band. On the right, you'll see a treemap identifying who spoke. You'll note some of the speakers are listed twice, because we break out the prepared remarks from the Q&A — they're fundamentally different types of communication, so we break those
who wants access to my slides, just shoot me an email and I'll be glad to provide them. This was a quick study we did — there's actually a whole bunch more that goes along with it — breaking down, by sector, how references to trade have trended over time, in this case over the last three years. The orange lines are 2018. Not surprisingly, there have been a lot more references to trade in the last year than in the years prior. Note that it was already elevated in 2017: despite a lot of angst about the President coming in and talking more and more about trade barriers, there was some elevation in that discussion during his first year in office, but it really didn't ramp up until last year. That's noteworthy, and it's noteworthy in a couple of sectors in particular: producer manufacturing is the huge line there, but also consumer durables and various process industries. Transportation is an interesting one, for a variety of reasons; it's actually been a point of some concern because of questions around where, as we move toward electric vehicles, we are getting the materials needed for batteries. We've done some work looking at this, and there are interesting questions about how you deal with cross-border trade for the raw materials necessary to build an electric car — they're very different from the raw materials necessary to build a conventional car.

I'll take a minute here and walk through some other examples of things we've worked on that we think are really interesting areas for analysis, and then I'd love to field some questions. As I said, we've done some thematic research using this technology, delving into what types of language are becoming more prevalent and whether they tie to positive or negative sentiment. In that tariff study, we were actually able to tie it to how it's affecting sentiment — which it largely isn't. The other thing that's become increasingly interesting to us at Prattle has been the investor relations world. Not surprisingly, investor relations officers want to be able to determine how the market is going to react to what they're crafting. If you have a system where you can input language and see how the market is likely to react, that sounds pretty appealing. What's become interesting to us is that it's not just themselves they want to know about — they want to know about all their competitors too. That's actually a pretty interesting insight into how the IR world works, because that's what really drives most corporate communication: it's not the C-suite, it's the IR world.

Among the other things we've worked on recently — I know there's been some talk about the legal piece, and unfortunately I think our previous speaker had to leave — we've done some work on activist investor risk and shareholder litigation. One project I think is really interesting: can you disaggregate price movements when you have a lousy earnings call coupled with a whole set of subpoenas issued by the SEC? So you have a shareholder lawsuit saying, you didn't disclose these subpoenas, and essentially the executives saying, no, we just had a bad earnings call — that's what caused the price drop. How do you break those two things apart? Well, it turns out it's all about language: matching those events to the appropriate comps, for when there was a similarly bad earnings call or a similar subpoena action. We've been able to disaggregate those things based on language patterns and various other similarities in the companies. The other applications here are really interesting, looking at risk more broadly — whether that be cybersecurity, identifying common language patterns for companies that have been hacked and therefore predicting which ones are
likely to have already been hacked but not disclosed it we're actually able to do that and the other one is directors an officer insurance turns out insurance companies don't like to insure don't like to insure corporations that are using the same language that the Enron executives did who knew and so were able to actually map that out and then of course there's things like corporate perception studies and a variety of other sort of interesting applications here and so I think we're you know barely scratching the surface from a technology perspective of what natural language processing can do so narrowly but also more broadly from the AI applications in finance being able to learn new language as it evolves identify context of that language relative to known entities and some of those things so we've been really I think very pleased with the performance of the system but I think we've really barely scratched the surface in terms of broader research automation opportunities here so with that I'll take questions [Applause] yep when people begin to use a new phrase that your algorithm doesn't yet recognize you say that it learns over time the question is how quick like what's the lag from when we first see Bragg's it until it begins to associate brexit with dovish you know Vav policy yeah so breaks it's a pretty easy one because it's a brand new word right so we can identify the context of it within the first document the harder ones are when you have strings of language that's existed before large-scale asset and purchases we're all words that we use pretty frequently in previous central bank communications but when they were strung together they had a totally different meaning that actually is a little harder so there's a little bit more of a lag to identify context they're pregs it's pretty straightforward really because it's just identifying how does this relate to known entities and we've already keyed in that's a new word and we already have identified sort of 
what the language around it means but the dealing with those new strings we have to actually disaggregate is this a truly new Stringer's this just the same language we've been seeing before just in a novel construct that may or may not actually have that so it takes there's a little bit more of a lag there typically by the third time it's been used ie third document not the third usage we actually have a very solid measure by the second time we have we have some idea we have not approached a project specifically about ESG although I will note I happen to think yes G is just research right people wouldn't people tended segregate it as its own thing but but nonetheless yeah yes because all we'd have to do is identify a comp to Walmart that's done it's engaged in a similar policy and then identify similar language and then we can walk through likely market response can't get precise but we can get pretty close maybe I've been spending too much time with the investor relations community lately but my view tends to skew towards the longer term is better and ya know not immediately in fact usually the Mart market is wrong immediately what we see is a reaction to the reaction so our core matter measure for corporate communications is actually predicated on 10-day cumulative abnormal return we've actually found that tend a car outperforms 110 so we've tested 110 30 60 and 90 day we've also tested intraday the 10-day model performs the best the 30 day is the second best dramatically outperformed both dramatically outperformed the one-day because the reaction to the reaction tends to be more sustained the other thing we find is that the decay rate if we actually look at a shoulder time horizon the decay rate on the data is rapid whereas if we look at even a medium term time horizon 10 or 30 days the the decay rate is very very slow so the the true sentiment of what's being expressed the true goal of that communication tends to be revealed over a medium term and then actually 
It can then have longer-term consequences. I think that's a reasonable policy, but I would say be careful on the twenty to forty days: we've seen tails on this data that last for several quarters, so you have to be really careful if you're looking at a time window localized around a month. The question is, what's the tail, right? Because the reality is, do we really want companies making financial decisions based on what's going to happen over the next month, or even the next three months? There's a not-crazy proposal out there to push earnings calls to only twice a year instead of four times a year. Now, that reduces information to the market and could have adverse effects on volatility; my suspicion is it will actually have positive effects on volatility, but there are some interesting questions about that.

Thank you for the talk, I really enjoyed it. Back on your slide that breaks down trade by sector: is that the percent of mentions, or is that the sentiment associated with trade itself?

That is percent of mentions — or, I should say, mentions relative to percent of total language.

Are there plans to develop that further, in terms of aspect sentiment?

The full study does some of that; I just pulled one of the graphs here. We actually identified this based on individual sentiment: how does elevated use of trade language affect the sentiment of the sector or the companies? Largely, the effects are basically non-existent.

But not necessarily sentiment on trade itself — whether the companies are positive or negative on trade?

What's interesting is that by and large they try not to take a view. However, their language skews a little bit negative, but it doesn't have a material effect on price movement or on sentiment.

Thank you. You mentioned you have your own proprietary named entity recognizer for finance.
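The "mentions relative to percent of total language" metric can be computed as a simple token share; a minimal sketch, assuming a hand-picked trade lexicon (the `TRADE_TERMS` set here is illustrative, not the speaker's actual dictionary).

```python
from collections import Counter

# Illustrative lexicon; a production system would use a curated dictionary.
TRADE_TERMS = {"trade", "tariff", "tariffs", "quota", "export", "import"}

def trade_mention_share(tokens):
    """Fraction of a document's tokens that are trade-related."""
    counts = Counter(t.lower() for t in tokens)
    trade = sum(counts[term] for term in TRADE_TERMS)
    total = sum(counts.values())
    return trade / total if total else 0.0
```

Aggregating this share per sector and comparing it against total language volume gives the kind of by-sector breakdown shown on the slide.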
Can you tell us a little bit more about the named entity types? Are they standard named entities, or did you come up with your own?

So, NER is a nuanced world unto itself. If you're familiar with it, you can think of ours as roughly a Stanford seven-class model, but we've added an eighth class to deal with products and brands, and to a great extent people. We've also built in a resolution engine that works on two fronts. One is to reconcile — and this is oddly a huge problem in financial services — the CUSIPs, ISINs, Bloomberg identifiers, SEDOLs, FactSet IDs, and all the different PermIDs; reconciling those across data sets is a huge problem, so we've reconciled across all of them, which turned out to be a huge project unto itself. In addition, the resolution engine searches for any mention of an individual entity that isn't in our database; we can comb the web to find out who that is and how they relate to the known entities we've already got. So the short answer is it's finance-specific, but it has some novel features to make life a little easier.

Some of your competitors use vocal tone in audio files. What's your view on that?

My view on vocal tone is pretty straightforward: listen to a German CEO and an Italian CEO deliver the exact same message. I don't really need to say a whole lot more. I will say, though, that we think there's real promise in facial analysis. Everybody's face exhibits the same seven core micro-expressions, whereas tone of voice is largely cultural: you need to build a huge baseline for, at minimum, that culture, if not that individual speaker, to get viable data. Micro-expressions are basically the same for everyone, so we think there's a little more promise in that technology. We've done a couple of studies on this, partnering with a company out of Hong Kong, working in particular on FOMC press conferences.
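The identifier-reconciliation problem can be illustrated with a small crosswalk lookup; a minimal sketch with hypothetical table entries, not the actual resolution engine (which also does web-based discovery of unknown entities).

```python
from typing import Optional

# Hypothetical crosswalk rows mapping (namespace, identifier) pairs to one
# internal canonical ID; a real table spans the CUSIP, ISIN, SEDOL, Bloomberg,
# FactSet, and PermID namespaces across millions of securities.
CROSSWALK = {
    ("cusip", "037833100"): "ENTITY_001",
    ("isin", "US0378331005"): "ENTITY_001",
    ("sedol", "2046251"): "ENTITY_001",
}

def resolve(namespace: str, identifier: str) -> Optional[str]:
    """Return the canonical entity ID for an external identifier, if known."""
    return CROSSWALK.get((namespace.lower(), identifier))
```

Unresolved identifiers (a `None` result) are exactly the cases the speaker's engine escalates to web-based entity discovery.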
So when Jay Powell gets up at a press conference, we analyze it not just based on the content of what's being said; we marry that content up with his micro-expressions. He's gotten a lot better in his last couple of press conferences. He had a lot of disgust and contempt on his face in his first couple of press conferences — maybe he didn't think so highly of the staff initially, either; there were some really odd ties there. So we've done a fair amount with facial micro-expressions; with tone of voice, the amount of data you need to build a baseline on each person makes it very challenging.

Hi — how do you think about tracking changes in which language is relevant to predicting these things, and what kinds of timescales are you looking at in the data? I can imagine for some of the firm-specific stuff, news comes out a few times a year, while for some of the macro stuff, a big financial crisis is going to drive your model for a long time unless you've got other ways to think about it.

That's actually an interesting question: how long does it take language to decay? We've dealt with this in a couple of different ways. The first is through machine learning, which is to say we're identifying how new language relates to known entities, but also how existing language is being used in new ways, so the system is adapting and improving continuously; it's an online learning methodology that picks up some of that. The more important way we've dealt with it is pretty simple: we residualize the data. A raw output from our system is relative to all prior communications from that entity — a raw score from a firm's earnings call is relative to perhaps a 15-year history of earnings calls from that company; we have data going back to 2003. So if their language has changed a lot, what we've done is actually
residualize that through a transformation. So the data I was showing before is actually a transformed version, compared against the last two years. Now, that's a somewhat arbitrary number, and I can tell you with complete confidence that at least one of our clients — I think several — use a different transformation there. It's a pretty simple back-of-the-envelope way to do it, but it works pretty well. [Applause]
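The residualization step described above — scoring each new communication against the firm's own trailing history — can be sketched as a rolling z-score. This assumes quarterly earnings calls, so a window of 8 calls approximates the two-year lookback the speaker mentions; the window length is exactly the "somewhat arbitrary" parameter that clients may vary.

```python
import pandas as pd

def residualize(raw_scores: pd.Series, window: int = 8) -> pd.Series:
    """Z-score each raw score against the entity's trailing `window`
    communications; shift(1) excludes the current call from its own
    baseline, and min_periods requires at least half a window of history."""
    history = raw_scores.shift(1).rolling(window, min_periods=4)
    return (raw_scores - history.mean()) / history.std()
```

The first few communications get no score (insufficient history), and each later score expresses how unusual that call's language is relative to the firm's own recent baseline rather than to the cross-section.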
Info
Channel: Columbia Business School
Views: 4,637
Id: 19YLohGBB5A
Length: 42min 17sec (2537 seconds)
Published: Tue Jun 11 2019