- Hi, good morning, everyone. My name is Olivier Toubia. I'm a faculty member in
the marketing division at the Business School. I also serve now as the
Chair of the Division. It's my pleasure today to welcome everyone and to have a chat with my co-author, now colleague, friend, Shawndra Hill, who is a
Principal Scientist at Facebook. She just recently joined Facebook, and she also recently became
a part-time senior lecturer at Columbia Business School. And so today we're going
to talk about data science in marketing, and Shawndra
is a wonderful person to tell us about this. She has a very unique and
interesting background. She actually was an electrical
engineer by training, and then she received her PhD from NYU in information systems. And then she spent a few years at Wharton on the faculty there. And after that, she moved to Microsoft Research here in New York City. She spent about five years there. She was a principal researcher in the computational social science group. And then just this, in
the last few months, she moved again to Facebook where she's a principal scientist. So welcome Shawndra, thank you for taking the
time to speak with everyone this morning. So today you're going to
tell us about, you know, your research, your work
and the tech industry in general, and data
science and marketing. But just before I have a
quick question for you. So you were in the computational
social science group at Microsoft Research. That's a term that is somewhat novel, so maybe some of the people here have never heard it before. So would you mind defining for us what computational social science means? - Sure. So I'm actually in the
computational social science part of the org at Facebook too. So I've been in
computational social science for a few years now, and as you mentioned, it's a new and burgeoning field, but at the core, we're doing research and really asking social
science questions, but using computation and statistics to help distill information from data, to be able to answer those questions. And I think the first
time, I could be wrong, but I think the first time that a computational social
science group was named, was at Yahoo, probably about 15 years ago now. So I mean, that's kind
of how new this area is, and there's now a computational social science conference. And the point being that
the area and the field is still to some extent being defined. But you can think of folks
that are doing research in this area as being, a lot of them, not all, computer scientists or statisticians, but they still care about answering why things are happening. So like not just predicting
outcomes for, you know, sort of marketing for instance, but trying to understand
sort of why people or groups, or societies behave the way that they do. - Wonderful. Thank you. So can you then maybe, describe for everyone a
little bit your position, what type of work do you do, you know, at this intersection
of theory and practice and research and industry. So you know, just what's
a typical day for you? What type of projects do you work on? - Oh, sure. So it's interesting. I'm now also called a data scientist, and I've been a data scientist
probably my whole career, but it wasn't called data
science when I started. So in today's terms, I'm a data scientist. And for the most part, I
don't have a typical day, except for, I am either
working on analysis towards some end, like to
solve some business problem, or scoping out and
mapping out new problems. So I've been pretty lucky
over the past few years in that, I've been able to take on roles where I can still do basic research, and what that means for, you know, people who might not be academics is, I can do academic style
research in companies, while still being close
to data and real problems. So like the barriers to
getting data at least, are broken down a bit. And also, you know getting access to practitioners
who have real problems. The nice thing is I get to
still work with academics like you, and as you mentioned, I now have an affiliation at Columbia. So it's very much an
academic style position but within a company, and, you know, we do have to deliver results in this role, but we take a longer term view of problems just like you
would expect an academic to do. - So what's really exciting
for us is that also you're going to be able
to bring this background into the classroom,
starting in the spring. So I know that you're
developing a new course. Would you like to maybe say a few words about this course that
you're putting together for the spring? - Sure. So first of all,
I'm really excited about teaching this course and the spirit of it is going to be learning the necessities for
managing data scientists, and data science projects
for marketers in particular. And I taught data mining, and sort of applied machine
learning for a really long time. So at NYU, when I was a doctoral student, and also at Wharton for many, many years, and frankly because I
had so many engagements with industry even then, like, so with different companies, I really thought I was pretty well-versed on sort of how to manage
projects in industry. But once I joined industry
and I was on the inside, my eyes were wide open
that I didn't really know as much as I thought about
like getting things done. So let me just kind of take a step back of what I mean by like I thought I knew. So in class, what we
would teach is, you know kind of know what your objective is, know how to measure your success based on, like, if you're thinking about a prediction problem
or a classification problem, like the cost of misclassification, like really understand your objective. And then what was missing, I think, are a number of
things like the fact that, one, it's not just like that you think these projects are cool, like you really have to do a sales job, and so your job becomes not
really that of a scientist but you have to wear multiple hats, because usually you
don't have a huge team. So you have to know how to
program manage a little bit. You have to know how to sell your results and communicate them in
a way that a lay person can understand them. You need to know how to build
prototypes along the way. So most of the time in my experience, you know, an idea is
not really worth much. Like you have to show people what you're telling them is going to work. And so you have to know
how to prototype things in a way that doesn't suck up your time on projects that, you know, sort of may not go to the end. So really thinking about
relationship building and selling is one aspect. And then another, maybe
more important aspect, is that, even if you have an amazing idea and leadership in your organization, whether it be big or
small, wants to buy in, there might be legal
policy, privacy issues that you can't overcome with your project, and knowing how to sort of
navigate the review process, not only for the things that
are, let's just say illegal, but also for the things
that might be in bad taste at the moment. So there are a lot of things we can do but that maybe we shouldn't do, if we're checking our moral compass. So knowing how to navigate those things. So this did kind of come up in sort of working with companies
before joining industry, but there's always this
tension in my experience working in industry now, of
like doing something right versus doing something fast. And so as someone who, you know, would teach my students that they want to do it right, not just fast, you have to really
manage the expectations of your major stakeholders up front, in terms of like what
you're willing to bend on, because almost always, and
this may be surprising, but you're going to have
to bend on something. So it's like, and then
how you explain that. So it's like, is this heuristic for what you're trying to do, or is this something that
is just an association, and we can't make any causal
claims or however it is that you need to explain it, you almost need a contract, because people that are trying
to deliver to customers, or you know, internally, care about meeting a deadline oftentimes. And so you have to figure
out how to navigate that. And then I think, you know, thinking about impact more broadly is, I think the number one thing
that maybe I was just lucky in the projects that I picked in the past, but, you know if you want
to have impact on a company, it needs to be something that, you know, if you're talking about users,
will impact a large number of users, or at least
impact the bottom line, or it's something that
is so extremely important that leadership believes they should invest. Just having an idea, and thinking that it would be cool to do or to connect data, like, people don't really
care about those ideas. So you have to really kind of
be able to scope these things out for yourself in advance, so you don't waste time on
things that people won't support. So those are the types
of things that we'll talk about in class, towards making sure, if you want to take on a role either managing data scientists or being a data scientist
yourself in an industry, that you kind of almost
have a checklist of things that you need to go through when you're thinking
through working on projects if they already exist, or scoping out new projects
that you have to sell and position within the
company for success. - Thank you, Shawndra. So this was, I think, a wonderful, like, you know high-level
introduction to your work and your world. Before we really get into more specifics, a couple of housekeeping items: I got a message that some people have some issues with the audio. So I tried to mute myself when
you were talking this time, but I don't know if the
audio is getting better, so if participants are
getting audio issues, please feel free to... Okay, sounds great. Okay. So apparently
things are getting better. So I will just mute myself when you speak to make sure there's no echo. - Can you hear me okay, when I'm talking? - I think it's fine. Yes. I think it's fine. Maybe I know somehow there
was some issue earlier. Another thing: of course, all of the participants should feel free to enter questions into the Q and A, and you know, we'll try to address as many of them as possible. So with that, Shawndra, let's try to drill a little bit deeper, more specifically and more concretely into your work. So I know that you prepared a
few slides about your research and your work. So I'm going to give you a chance maybe to give some examples, to maybe give people a
bit more tangible example of your work. So I'm gonna let you share
that if you would like. - Great. Thanks. So just for the audience,
I prepared a few slides just to kind of walk you
through a few examples. They're not meant to be comprehensive, but I think they'll help
you follow the discussion that we're going to have. So-- - And then we'll go back to a higher level after that, more in general, how data science is impacting marketing. But I think it's good now
to have some specifics to set the ideas a little bit. So go ahead, Shawndra, thank you. - Can you see my screen? - I can see your screen. I think everyone can. Thank you. - Great. So for most of my career, whether it be in marketing
or in other areas where I've applied data
science techniques, I've really tried to
connect data oftentimes for the first time, to
answer marketing problems in this context, or sometimes use data in a way that people haven't used it before. So if you could go back
in your head to 2004, back in 2004, it was the first time I actually got excited
about marketing problems. Like as Olivier mentioned, I got my PhD in Information Systems. I actually applied thinking
I was going to predict the stock market. I don't
know what I was thinking, but anyway, I quickly got interested
in marketing problems, and here's the reason why. I was doing an internship
where I got access to social network data. So connections between people, where those connections were phone calls. So on this slide, it's a network where the
nodes on these networks are supposed to represent people, and the connections between
them are phone calls, but, you know, the idea
is that birds of a feather flock together on these networks, right? And so these connections in
absence of having information on things like race, religion,
age, gender, geography, can be used as a proxy
for knowing that people are in some way similar. And so we used this idea in marketing, and the way that it worked was, we took existing customers of a service, we then looked at all of their friends, and we asked whether these connections could tell us something about their likelihood of adopting that service. And it turned out that they were five times more likely to purchase this particular product that we were advertising than people selected at random, and even after doing propensity matching, they were more likely to purchase the product.
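To make the targeting idea concrete, here is a minimal sketch in Python with pandas. The call graph, table columns, and numbers are all hypothetical toy data; the actual study also used random and propensity-matched comparison groups rather than this simple split.

```python
import pandas as pd

# Toy stand-ins for the call graph and customer table (all names hypothetical).
edges = pd.DataFrame({"caller": [1, 1, 2, 3, 4],
                      "callee": [2, 3, 4, 5, 6]})
users = pd.DataFrame({"user_id": [1, 2, 3, 4, 5, 6, 7],
                      "existing_customer": [1, 0, 0, 0, 0, 0, 0],
                      "adopted_later":     [0, 1, 1, 0, 0, 0, 0]})

# "Network neighbors": non-customers with at least one call edge to an existing customer.
adopters = set(users.loc[users.existing_customer == 1, "user_id"])
touching = pd.concat([edges.loc[edges.callee.isin(adopters), "caller"],
                      edges.loc[edges.caller.isin(adopters), "callee"]])
neighbors = set(touching) - adopters

targets = users[users.existing_customer == 0].copy()
targets["is_neighbor"] = targets.user_id.isin(neighbors)

# Compare later adoption rates for network neighbors vs. everyone else; the real
# work compared against random and propensity-matched groups instead.
rates = targets.groupby("is_neighbor")["adopted_later"].mean()
print(rates)
print("lift:", rates.get(True, 0) / max(rates.get(False, 0), 1e-9))
```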
So believe it or not, this was maybe the first work that connected social network data to real business outcomes for marketing. So again, remember, this is like 2004, pre everybody being on Facebook. Facebook existed, but there
wasn't a lot of data like this. So it turned out that companies
got wind of this idea. And there were startups
that were even started up on this basis of, you know, connections between
people having some value for predicting attributes of users. And so a lot of people ran
with this idea after this work, maybe they were thinking
about it in parallel, but certainly these
companies weren't growing kind of before this work came out. And then since then, you know,
sort of the rest is history, like social networks are used
to predict a lot of things, about people for various reasons. And so one thing that always
bothered me about companies just kind of running with this idea is, we showed that the connections
in the telecom setting mattered for this one product, right? So it, at the time it was voice over IP. Like people paid for it back then, right? Now we all get voiceover IP for free, but back then, they paid for
it and it was this one product, and it worked very well, and there could have been
reasons why that was the case. You know, like maybe
when people were talking to their friends, they
were talking about it, like, hey, do you hear my
new, you know, phone quality? Or it could have just
been this homophily idea or something else, but
it really bothered me that people ran with it without testing. So what I did was I collected a lot of social network data. These data weren't perfect, in the sense that I collected them externally
using the Twitter API, and Facebook APIs, and getting
as much data as I could, but what I did was I took TV show handles, and brand and product handles on Twitter. I got all of their connections. So all of the people that followed them, and then all of the
people that followed them. So I got the followers of
the brands and TV shows, then the followers of the followers, and for all of those followers, I got their tweets over time. So it was a lot of, you know, pinging the Twitter API. And what I was able to do with that data is pretty much build a
pseudo recommendation engine, where I say, okay, a person or a Twitter user followed Coca-Cola, say; let me take that as information for my recommendation model, and then predict what other brands they would follow. And I did that based on the social network, I did that based on text features, and I did that based on a more traditional sense of people being similar, based on the products that they either purchase or have in common.
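A rough sketch of the two kinds of recommenders being compared is below, with made-up follow and friendship matrices; the actual models were built on Twitter data at much larger scale, so this is only meant to show the contrast between a social-neighbor score and an item-based (product) score.

```python
import numpy as np

# Toy user-by-brand follow matrix and a user-by-user "social" adjacency matrix
# (in the study, edges came from Twitter follower relationships); all data hypothetical.
follows = np.array([[1, 0, 1, 0],
                    [1, 1, 0, 0],
                    [0, 1, 1, 0],
                    [1, 0, 0, 1]], dtype=float)          # rows: users, cols: brands
social  = np.array([[0, 1, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0]], dtype=float)          # symmetric friendship graph

def social_scores(u):
    # Score each brand by how many of u's social neighbors follow it.
    return social[u] @ follows

def product_scores(u):
    # Item-based collaborative filtering: cosine similarity between brands,
    # summed over the brands u already follows.
    norms = np.linalg.norm(follows, axis=0, keepdims=True) + 1e-9
    item_sim = (follows / norms).T @ (follows / norms)
    return follows[u] @ item_sim

u = 0
mask = follows[u] == 0                                    # only recommend new brands
print("social-based ranking:", np.argsort(-social_scores(u) * mask))
print("product-based ranking:", np.argsort(-product_scores(u) * mask))
```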
And so this image, let me just explain it to you. It's a little complicated because I don't have the whole top. But on the horizontal axis, so we built these recommendation engines, it's the number of recommendations that I was making to these users. And then on the vertical axis, think of it as the difference between the performance of the social-network-based recommendations and the product-based ones, the traditional approach of basing this on products. And so it turns out
that for some verticals, the product network
consistently did better, and for other verticals, the
social network did better. So for things like children's products, home products, media
entertainment, sports, where you would expect these advertisers or products to have a niche audience, like where homophily
would actually matter, then the social network
actually did better. And then for cases where
you would expect everybody to follow these products,
things like household products, or health related things, the
product network did better. So this was a first step at trying to understand when social network data would actually outperform, at least in this setting, the product network. And so we also collected text data and built recommendation
engines based on texts, where people were similar
based on the words that they used in their tweets. And we were able to
characterize the audiences of these brands, based on the tweets that
their followers post. And so we could connect those corpora of tweets to the attributes and characteristics of people who followed these brands. We got this from a different data source. And we could figure out which words were actually predictive of certain characteristics. So these are word clouds that show, on the left side, the types of things that are said in the tweets of audiences of products that skew female, versus the types of things that are said in the tweets for audiences of products that skew male. And so we did that for a bunch of categories, and we cross-tabbed these categories, and we can get these nice sort of word associations with certain audience types. And so from a marketing perspective, this could help brands and products, and advertisers, understand who their audiences are a little bit better.
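One simple way to get at "which words are predictive of which audience characteristics" is a text classifier whose largest coefficients play the role of the word clouds. This toy sketch with scikit-learn uses invented example tweets and labels; it is not the exact model from the work.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical toy data: concatenated follower tweets per brand, plus a label
# for whether that brand's audience skews female (1) or male (0).
docs = ["makeup skincare sale cute nails",
        "game score playoffs draft stats",
        "recipe kids crafts sale coupons",
        "engine horsepower playoffs scores"]
skews_female = np.array([1, 0, 1, 0])

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
clf = LogisticRegression().fit(X, skews_female)

# Terms with the largest positive / negative coefficients are the words most
# associated with each audience type, analogous to the word clouds described.
terms = np.array(vec.get_feature_names_out())
order = np.argsort(clf.coef_[0])
print("male-skew terms:  ", terms[order[:3]])
print("female-skew terms:", terms[order[-3:]])
```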
I will say, with all of this work, the difference I think in how academics approach the work, and maybe how it's traditionally approached in product groups, is this, right? So a lot of times, there are these overall
high level metrics or KPIs, that a company needs to report on, or a team needs to report on
for a particular solution. And very rarely in my experience, do people try to figure out if there's any heterogeneity
by user or by brand type, when you're talking about big systems like recommendation engines, that need to make predictions broadly. And so how I think about my role even now, is sort of digging into the details, to try to figure out
where heterogeneity lies. So that was kind of... Using social media data is something I did for quite a while, and it was exciting because it was new data at the time. And then I joined Microsoft, and I focused a lot on
sponsored search data, and sponsored search data were exciting, because it was new to me anyway, but also because you can do a lot with it, learning about the customer journey, like, so for any given customer, how they search over time for a particular category of product, or for a specific product. Once you're sort of embedded, you have access to a lot
of information about users, anonymized, obviously, but
you can see for instance, whether somebody already
owns a product or not, and look at how they
respond to advertising, whether they are an
existing customer or not. So this slide is just showing sort of kind of an experiment, how it worked, and I'll explain
a couple of ways we used it. So basically, people search for brands on search engines. So here there's a search for Edmunds. And usually when it's a product or brand, you'll see first a set of advertisements, followed then by the organic search results. And so the experiment that was running is that the advertisements were getting shuffled randomly above a certain threshold, and they were showing either zero, one, two, three, or four ads. And so what that enabled us to do is to see what happens when, for instance, in this case in step two, so this is just showing that sometimes in the experiment, no advertisements are at the top, right? So this would let us answer: what happens if a customer doesn't advertise on sponsored search? Like how much traffic do they still get from their organic link, for instance? We could also ask questions around what happens, because these things are now being shuffled, the advertisements, when a competitor is shown above the focal brand, or a complementary product, for instance, right? Because normally, what would happen is that the brand that is being searched for will show up at the top. So if you're studying this without an experiment, you'd only know sort of what happens when it's at the top or nothing. But because these things were being shuffled, we can ask interesting questions about competitors and complementary products. And also, as I mentioned, one of the things that we did was see whether these ordering effects that we wanted to study, like how much traffic stealing there is as the focal brand moves down the page, whether that is stronger or weaker when a person already owns the product.
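Here is a hedged sketch of the kind of analysis such a shuffling experiment enables, on a made-up impression log. The column names and numbers are hypothetical, and the published work uses far more careful estimation; this only shows the basic comparisons of traffic with and without the focal ad, and by ad position and ownership.

```python
import pandas as pd

# Hypothetical impression-level log from a position-shuffling experiment:
# how many ads were shown, the focal brand's ad position (if shown), whether
# the searcher already owns the product, and whether the focal brand got a click.
log = pd.DataFrame({
    "n_ads_shown":   [0, 1, 2, 4, 3, 2, 1, 0],
    "focal_ad_pos":  [None, 1, 2, 4, 1, None, 1, None],   # None: focal ad absent
    "owns_product":  [0, 0, 1, 0, 1, 1, 0, 1],
    "clicked_focal": [1, 1, 0, 0, 1, 1, 1, 0],            # paid or organic click on the brand
})

# Traffic to the focal brand when it does vs. doesn't advertise at all.
print(log.groupby(log.focal_ad_pos.notna())["clicked_focal"].mean())

# "Traffic stealing" as the focal brand moves down the page, split by ownership.
shown = log[log.focal_ad_pos.notna()]
print(shown.groupby(["owns_product", "focal_ad_pos"])["clicked_focal"].mean())
```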
So these data, again, in this case maybe weren't new, like these data were sitting around, but we were asking questions in new ways. And this is the work that I'm referencing actually, with another faculty member in the marketing department, Andrey Simonov. So again, a partnership with academia. So the search data was exciting
to me for another reason. I was really interested in TV research, specifically like TV advertising research. And when I joined Microsoft,
it wasn't a TV ad company. However, one thing that
we could look at is, how TV ads interact with
sponsored search ads. And then it became
interesting to both Bing and advertisers who
advertised across TV channels and sponsored search channels. So the general idea is that people are sitting in front of their TV, and they are responding
to what they see online, or on some device that's not their TV. And so we did a lot of work looking at how people respond, who responds and why,
trying to get at why. So the general ideas,
people see something on TV, they go to a search engine
like Bing, they search for it, they get the sponsored search results, and then they click on something, right? And maybe they're looking for
something related specifically to what was shown in the commercial, or maybe they're looking
for the brand more broadly, but those are things that
we could try to understand with this data. Historically, and in most
of the work that I've done, we treated TV ads as events. So they had a specific
time, in a specific place. Usually, we looked at ads
that were shown nationally in the US, and we can measure
the search spikes immediately after a TV ad. So this is a plot from real data, from when the Surface laptop ad aired in 2017. And so we'd see this orange line, where the ad was aired, and then the search spikes after. And we weren't the first to show that there are search spikes, but what we tried to understand, because we had access to more data, is who these people are, with respect to their demographics and which types of devices they were searching from, and even whether there are differences in how the attention shifted, with respect to the user characteristics.
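As a toy illustration of this event-style measurement, the sketch below compares query volume in the minutes after an airing to a pre-ad baseline. The airing times, query counts, and window lengths are all invented for the example.

```python
import pandas as pd

# Hypothetical data: timestamps of national airings and minute-level query counts
# for the advertised brand (both made up for illustration).
airings = pd.to_datetime(["2017-05-02 20:00", "2017-05-02 21:30"])
queries = pd.DataFrame({
    "minute": pd.date_range("2017-05-02 19:50", periods=120, freq="min"),
    "count":  [20] * 10 + [55, 48, 40, 33, 27] + [20] * 105,   # spike right after 20:00
})
queries = queries.set_index("minute")["count"]

# For each airing, compare the 5 minutes after the ad to a 10-minute pre-ad baseline.
for t in airings:
    baseline = queries[t - pd.Timedelta("10min"): t - pd.Timedelta("1min")].mean()
    post = queries[t: t + pd.Timedelta("4min")].mean()
    print(t, "lift over baseline:", round(post / baseline, 2))
```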
So another study that we did, and this is like a lot to digest, I know, in one slide, but it's a very exciting project, and one that I was really excited about, where we moved on from this, right? So before, most prior work, not all, really treated TV ads as events, where you have an ad shown at 8:00 PM, and you look at what happens
right in the minutes after. And the reason why you
couldn't really look long-term for any one advertisement campaign is because, once you look longer-term, there are just so many confounders that come into play, including that the same ad might be shown a few minutes later. And so that's when you're
looking at aggregate level data. So what happens in aggregate, all the people in the US, and what happens after 8:00 PM on search, and what happens for, you know, just knowing that a particular
ad was shown at 8:00 PM in the US. What we tried
to do more recently, was connect users to their TV viewership in a privacy friendly way, at the individual household level. So then we could say,
we know that at least, this box was tuned in to a TV commercial, what happened for them? So that enables us to
do two things at least. One is to look longer-term
at what happens to people who see TV ads, with respect
to like their search behavior. And it also enables us
to look at advertisers who advertise all the time. So for those of you who are marketers, or, you know, work on
advertising campaigns, you might know this, but there's some advertisers who like any minute of
the day, are always on. So for telecom, for things
like food and beverage, at certain times of the day, you couldn't even do
this event type analysis. So it enables us to be able
to look at more advertisers and measurements. A project that I worked on
with Olivier is related, but different, where instead
of looking at TV ads, we looked at TV show events, and we looked at how people
searched for information about TV shows over the course
of, for the most part a day, but like before and after shows. And what we wanted to
understand is those dynamics, but also could we jointly
model the search behavior and the click action, so what people click on, in such a way that would
help us to do better at predicting, one, what people would likely want to see, and also as a result, want to click on. So this is just showing over time, interest in the Super Bowl in 2016, so you see this huge spike,
when the Super Bowl starts. It stays high, the interest, while the Super Bowl is on. But what's interesting is, if we looked at what
people were searching for over the course of 24 hours
before during the game, and 24 hours after, we saw a lot of the same searches, so these are some of the top searches, but what you see as
indicated by these colors, is that some were searched more
before the Super Bowl was on and some were searched more after. So we wanted to look at both, these dynamics of what
people were interested in, with respect to their searches, but what might not be obvious to you, is not only were people
searching differently for these topics, but even for something
like the Super Bowl, they were clicking on
different things, right? So they were searching for Super Bowl, but maybe before the ad, sorry, before the TV show aired
or the Super Bowl aired, they were looking at, you know, what the time was or when it was on, versus after, they're looking at who won the MVP. So they could have searched for that specifically, but those types of things were also reflected in the clicks, even for a generic term like Super Bowl. So the way that we modeled it was using these dynamics in the clicks, as well as topic modeling, topic modeling the snippets and the text of the queries, in such a way that enabled us to do a better job at prediction than if we had not factored in these dynamics in the search.
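A minimal sketch of the topic-modeling ingredient is below, using LDA on a few invented queries. The actual model jointly captured topics, clicks, and their dynamics over time, which this toy example does not attempt.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical query/snippet texts from before and after an event (toy examples).
texts = ["what time does the super bowl start",
         "super bowl kickoff time channel",
         "super bowl halftime show review",
         "who won super bowl mvp",
         "super bowl final score highlights"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(texts)

# Two topics as a stand-in for "before-game" vs. "after-game" interests.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = comp.argsort()[-4:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])
```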
So that was a really fun and exciting project. And so finally, I'm going
to end really briefly on something that I worked on. The motivation was to understand whether showing diverse
characters in TV ads led to better outcomes for businesses. So basically what we were finding, and we know this to be true, is that more and more advertisers
were including characters in their ads, that reflected the diversity of the people in the United States. And so what we wanted to
understand is like, you know, is that good for business
basically, right? It seems like it was good
from a social perspective and the right thing to do, but we want to know if
it was good for business. So when we started, we didn't know how to get at these answers of which advertisers were more inclusive in their advertising, versus not because that data didn't exist. So we had to create it. So the way that we did it, was we used off the shelf tools for video extraction and image labeling. So we took a video, we
extracted information using this video indexer
that's available from Microsoft. We then let the tool automatically label the age and gender, but not race; it doesn't label race anymore. And also we did a de-duplication of the actors that were in the show. So like, you know, somebody might turn to the side, and then the image would show up twice in our data set for a particular advertiser. So basically, after we extracted all the images, in addition to automatically labeling them, we ran a study on Amazon Mechanical Turk, where we had people label the images based on who they thought was in the image with respect to their gender, age, and race. And we ended up with about 6,000 videos, and a bunch of different labels. We asked for two labels for every image. So what was interesting is that for gender, people were able to label the characters pretty clearly, in the sense that the two raters agreed most of the time, but when we asked them about race, only 69% of them agreed on White, 61% on Black, 10% on Hispanic, and 15% on Asian. So that's interesting just as an aside, because these models are trying to do a better job of labeling race, but even humans can't do a good job. And that's something that has to be considered when thinking about these models, and actually putting them out into the world.
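For the labeling step, agreement between two raters can be summarized as raw percent agreement, which is the statistic quoted above, and, as an extra check not mentioned in the talk, Cohen's kappa. The labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical two-rater labels for a handful of images (toy data).
labels = pd.DataFrame({
    "rater_1": ["female", "male", "female", "male", "female", "male"],
    "rater_2": ["female", "male", "female", "female", "female", "male"],
})

# Raw percent agreement between the two raters.
agreement = (labels.rater_1 == labels.rater_2).mean()

# Cohen's kappa corrects for agreement expected by chance.
p_o = agreement
marg_1 = labels.rater_1.value_counts(normalize=True)
marg_2 = labels.rater_2.value_counts(normalize=True)
classes = set(labels.rater_1) | set(labels.rater_2)
p_e = sum(marg_1.get(c, 0) * marg_2.get(c, 0) for c in classes)
kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```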
So we built a toolkit, and I want to be sensitive to time, so I'm just going to say very quickly: on top of this data, we basically came up with
a set of inclusivity scores for an advertiser, for the vertical, for the industry, and using these scores, we can measure the diversity in these different groups and plot them in such a way that an advertiser could go in and ask how they are comparing against other advertisers in their cohort, whether that be at the product type level or at the industry level.
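One simple way to think about such inclusivity scores is as group shares computed per advertiser and per vertical; this sketch uses invented labels and is only meant to illustrate the comparison against a cohort, not the actual scoring used in the toolkit.

```python
import pandas as pd

# Hypothetical character-level labels extracted from ad videos.
chars = pd.DataFrame({
    "advertiser": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "vertical":   ["retail", "retail", "retail", "retail", "retail", "auto", "auto", "auto"],
    "gender":     ["female", "male", "female", "male", "male", "male", "female", "male"],
})

# A simple "inclusivity score": share of characters from a given group, computed
# per advertiser and per vertical so an advertiser can compare against its cohort.
by_advertiser = chars.groupby("advertiser")["gender"].apply(lambda g: (g == "female").mean())
by_vertical = chars.groupby("vertical")["gender"].apply(lambda g: (g == "female").mean())
print(by_advertiser)
print(by_vertical)
```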
So this was the first step, and I didn't get to connecting this to business outcomes, but that would be the next step. But still, it's an example
of using computational tools to understand what's
going on in marketing. So I will just mention some obvious things, you know, a couple of points in terms of the results. Like, we saw things like
women were more likely to show up in retail stores
and health and beauty, but less likely in
electronics and communication and vehicles. Blacks were shown more for political, government organizations and education, but less so in home and real estate. And seniors were shown more
in insurance and pharma, and less for apparel and footwear. So there are some face
validity here, you know, to at least make us believe we were going in the right direction for these labels. But again, the connections
to business outcomes would be the next step. So I think that really is it. So I'm going to stop sharing my screen, and hopefully the slides helped. I kind of pulled out like pictures that I think would make the points, but yeah, these are examples of using
computation in marketing. - Thank you, Shawndra. Thank you, this is a lot of
information, very exciting work. And there's a few questions in the chat. Some of them are technical,
some of them are big picture, so maybe we can go through
them if you don't mind. So you talked about how
you do propensity matching in the social network. Someone wants to know whether you used the neighbors to do that. I'm guessing it's more for logistic regression probably, or... - Yeah. So at the time,
it wasn't widely accepted, and even now, like it's
been criticized to use this for marketing problems, but at the time we used logistic regression. - Now, someone also has some pretty technical questions: in your job, how much of your time do you spend building and coding the prototypes? - That's a great question. So in my prior role at Microsoft, I did not spend as much
time as I do now coding. So I'm saying that to say that I can't give a general answer, because I think each team is different and it depends on the resources. So I was lucky in my prior role to have engineers on our team that could support research and building prototypes, and I don't have that now. - Thank you. Now we go to a much deeper, maybe more philosophical question, which is how do you
balance ethics and outputs, you know, in the time when companies in the tech world are under
scrutiny for using analytics? - Yeah. So first of all,
that's a great question. And so let me just tell you, like what happened over the
course of working on the things that I showed you. So back in 2003, 2004, it was kind of like the wild, wild West. Like there were no rules really, you know, and the rule of thumb then, was to talk to lawyers, to make sure you're not doing anything that is violating any laws. Companies and researchers, and, you know like even academics, and the IRBs, like understanding this
data a little bit more, like we've all evolved, and I rely heavily on whatever processes are in place in the
organizations that I belong to, because they think a lot
deeper about the implications for policy, for privacy and for ethics. And so I rely on them, and you know, I don't ever want to do anything that violates anyone's privacy. And so I think about it
often, and I will say that, because of how things have evolved, there are a lot of changes
in what you can and can't do for data in companies. And a lot of it is policy, right? It's like the agreements that
these companies have made with their users and their consumers, and less so about what's allowed by law. And so your question is spot on, and I think times are different now, and some of the things that I showed you, maybe couldn't be done now. - Thank you, Shawndra. So we're going back to a
much more technical question. Someone wants to know
the format of the data that you work with, is it
CSV, Excel, XML, JSON? - Yeah. It depends. So... these days, it's rarely any of those, it's usually in some kind of data store, whether that be a
traditional SQL database, or something that, you know,
can handle much more scale. Historically, like with
the social media data, we would get it from the APIs in JSON, and it was easy to process that way. So I would say a mix, but the data are so large now, it's like really in a flat file. - Thank you. Now we're going back to, you know, a question that's more
I guess higher-level. So, you know, you've shown some mix of experiments and observational research, you know, more correlation versus causation. So someone wants to know how you think about using observational methods versus experimental methods, in the field or in market applications versus in academic research. So it's the-- - I think that's a great question. I mean, maybe this is,
Olivier is the expert on the academic research side, but like I can tell you, like, in terms of like what can get published. So I think it's really
hard to get work published, if you can't make causal claims, whatever method you're using. Whether it be like observational methods or running experiments. - Especially in marketing, in social science as you said, there's a big emphasis on
the mechanism and the why, so there's still research
that's more predictive, trying to predict or optimize, but it's true that there's
definitely a big taste for being able to understand
what's driving the results for sure. - Yeah. So it's really difficult
to get things published in top journals, if you
can't make causal claims. In practice, it's better. Like people get more excited about results where you can show causality. However, oftentimes
prediction is a fine answer. As long as it's repeatable
and reliable over time, like whatever you're doing. So those are two things. And then the final thing is, there are just some things where you can't run an experiment. Like you just can't. So it's like, even like, so there are people who
have run experiments on social networks, for example, gifting certain products and seeing how they spread, but you can't make somebody make new friends. I mean, it's really hard. So there are contexts where you have a wealth of data on user behavior, and you want to answer a
question that's really important, but an experiment just isn't possible. And I'll add something, actually; we didn't talk about surveys, maybe because that doesn't come up in the causal context. But the other thing is, sometimes something like search, for instance, can reveal a lot about what people are thinking or doing that they might not reveal otherwise. And not because it's like
private or sensitive data, just because like, if
you ask me for instance, what I think about the political
candidate that I voted for, maybe because of political cheerleading, I will say something positive, even if I don't believe it. But so my behavior might tell you more about like what I think
about the candidate, if that makes sense. So I think there's still a lot of value in using observational techniques and data that is generated without experiments. - Thank you. That's very interesting.
There are a couple of questions that I think hark back to your course a little bit. So someone asks whether, in these examples that you shared with us, was the person leading
the projects knowledgeable about the technicalities of
getting such research done? And then, you know, maybe
relatedly, someone else asks, you know, how do you relate the course to your experience at Facebook and Microsoft? And so, yeah, I think it's about getting these types of projects done and the management of data science projects. - Yeah. So I think it's
going to be directly related to answer the second question first. So it's based on like learning
how to do this well now, after also, you know doing things in a more of an ad hoc way, because like I wasn't
required to be thoughtful about timelines. I mean, maybe I should
have been as an academic, I don't know. I mean I'm saying this
like, but it's different. So I was forced to get better at this. And also I learned from teams that deliver over and
over again really well. So yes, it will be based on my experience, but also based on, you
know, sort of known ways to lead successful projects. So it's not just my opinion, it's gonna also be based on
project management skills and... And solutions for that. As far as like project,
I don't know exactly which project the first
question is asking about-- - I guess, I think maybe in general, I think the type of
projects that you described, does the person leading the
project need to be knowledgeable about the methodology
and the technicalities behind the research? Are these projects being managed, maybe, by people on the management side? - Is the question more like, can you be a data science manager if you're not technical? - I don't know. The question is: was the person leading the projects knowledgeable about the technicalities of getting such research done? - Yeah. So it depends on
who you're talking about. So like, if you're talking about somebody who is a technologist, so of course, like they're
going to be experts in their field. Like, that's pretty much
what they're hired for, like as data scientists and
they'll be good scientists. But that doesn't mean that you'll be able to ask the questions right. And again, like teaching students how to ask the questions right, is something that was part
of the technical version of the class. This is altogether different, it's not just like, are you
asking the question right so that you can get an answer? It's also like, who will care? Is your timeline aligned with, you know the team that you're working with who are your stakeholders? And like, are they going to support you? And support comes from
any number of things. Like it could be engineering support, it could be PM support,
it could be sales support. And so let me just answer the question just in case that was
wrapped in it of like, do you have to have a
technical background? I would say like, you need to understand at a high level, like what's happening, but there's nothing more
valuable than a great PM at getting to the bottom line of what a project should be doing, what their timelines are and who what they're delivering. So I've seen, you know, amazing PMs that were
not computer scientists, that really can lead in technical spaces. So you don't have to be, but you have to be willing to learn, because you do have to
understand what's going on. - Yeah. Thank you, Shawndra. So I think there's a
question that's specifically about the project on the search on TV ads. Was there any insight into who
specifically tended to search after first seeing the TV ads? For example, were the consumers already likely to have interest in the product prior to seeing the TV ad, or were they consumers who
mostly just became aware of the product in that moment? I guess maybe some attribution
maybe issues there. - Yeah. So in one of
the studies that we did, it does look like it's newer people that are searching in those
moments after the TV ad. I will say on a related
study, just to connect it to things that I do know for sure: without TV ads, we looked
at how people search for products and brands over time. And it turned out that, at least for a couple
of technology products that we looked at, the large majority of people searching after the product had been launched for a few months already owned it. So in the beginning, people are trying to get information about the new product, but after that, it was largely people who already owned the product. And therefore, you know, it's not clear that those are the people that you want to advertise to. They certainly responded differently to advertisements for the same product. However, they were much more likely to respond positively to complementary products. So knowing that information, to the point of the question asker, is like
really, really important. And it's hard to get, like it's not easy to get
that data in these contexts. - Yeah. Thank you. I think, you know, back
to the privacy issues, there's a question, how
can this function relate to kids' products, surveys, and the whole protection-of-privacy laws? I guess if you're trying to study products that are targeted to kids, for example, are there restrictions, is it possible to do,
or maybe there's a... - So there are restrictions,
I've never touched that. Like, as you know, like,
so I don't exactly know. There's somebody at Kellogg
who studies marketing to kids on the academic side, but I
don't know the answer to that. I do know that these companies
spend a lot of effort trying to identify who's a kid, so that they do not market to them. But as far as like doing studies on them, my guess is that it's like a big no-no, like that's my guess, but I don't know. - Thank you. So I think going back to
the, getting the job done, so someone asks what challenges you have, and/or what tools you need
to do your job better? - So... so two things usually, bandwidth, right? Because... you just go further
faster when more people are working on projects. And I would say the tools,
usually it comes down to more like data than, you know, sort of having more computational power or something like that. It's like you're missing
some piece of the data that would enable you to do things like understand like
who these searchers are, or something to get at the why. A lot of times you can find a connection between two things, even
if you run an experiment, but the why part is oftentimes
like really hard to get at, because you're missing
pieces of the puzzle. So I don't know if that
answers the question, because it's not a tool question, but it's usually like
limitations in the data. - Thank you. So we have only a couple of minutes left, so we touched a little bit, you touched on social networks,
search advertising, TV ads, so are there any other
areas of marketing that, in which you see data
science having an impact, that you'd like to
briefly maybe mention or, are these the three main
buckets in which you've seen most of the action? - Yeah. So I'll say
one that didn't come up but that maybe I'll work on,
is like customer journey. I think the data today, enable you to learn a lot
more about customer journey, and then maybe looking to the future, like I'm not working in this space yet, but it's exciting, things
like product placement, and even a product
placement in video games and virtual reality games. Like I think, you know, that's just a whole other space to explore where technology and computational methods will play a role, but pretty much like
anything and everything, you know, like we also didn't talk really about like brand lift, or
like really traditional things like customer lifetime value,
like all of those areas. Like when you have more data on consumers, like you can do more to
understand what's going on. - Yeah. So yes. I think, you
know, maybe a final word. I know one topic we haven't talked about, which is one thing that you care about: diversity in tech. And I know just, you know, what do you see being done, and what more could be done, to improve diversity in the tech industry? You know, we have a lot of students here, and maybe some younger alums as well; any advice for both job candidates and also for managers and
recruiters on that front? - Oh no, I have one
minute to say all that? (both laughing) Well, first of all, let me say that anything I say is my opinion, because I'm not a D&I, diversity and inclusion, expert. And also, even when
we talk about diversity like that can mean a lot of things, right? Like there are all kinds of dimensions where people may not be
well-represented in groups. But I will say in terms
of what companies do, I mean at least they're
talking about it more, in light of like a lot of things that have happened in this country. I mean, just a few things that I think companies can do better, and some of them are already doing this, is like move these roles
to be like more central in the company, so that they have a little bit more power, and make sure, like for the companies who don't have diversity
and inclusion roles, like have them, you know,
start to build them. Like sometimes in smaller companies, like the burden falls on people that are in underrepresented groups to like make suggestions, or even get involved to run programs. So, you know, the first thing is like just make sure there's somebody
for whom it's their job, and who are experts to
think about, you know, sort of how to improve things. The other thing, like if we're
talking about black women in particular, part of
the problem, I think, maybe even the biggest problem is like the lack of data
because of the small numbers. So you always have a small-n problem. This shows up in academia too. It's like, you never
know how they're feeling because like, if you survey two people, you can't report usually
for privacy reasons like what the answers
are for those two people. And so that makes it
really hard to do things. So the obvious things like
recruiting differently, you know, sort of training. Some things that we could do is make sure that people who want technical roles are prepared for technical interviews, because that's oftentimes
like the barrier to entry. And then, you know, for all of these companies, technology or otherwise, they just have to change the culture so that it's welcoming. But my advice is, you know, do it, there's space. Like one of the things, and I don't want to
trivialize, like, you know, people's experiences,
because they're valid, and obviously like if
you're in a minority group, it's hard sometimes, but I think also in technology,
at least in my experience it's like, you're rewarded
for being good at what you do. And so it's like just work really hard to be really good, and build networks that
can help you navigate, those times when it's, you know, maybe not as friendly as you would like, but there's space for everybody. Like there's really space for everybody who wants to participate in
data science in particular. - Thank you. That's a great note to end. We're already past 10 o'clock, but thank you very much Shawndra. We're getting lots of
compliments on the chat, people want to hear more from you. So, this is just the beginning. So have a good day everyone
and a good weekend after that, and a happy Thanksgiving
also, while we're there. And so thank you again,
Shawndra very much, and everyone. - Okay, bye.
- Take care. Bye.