- Hi, good morning, everyone. My name is Olivier Toubia. I'm a faculty member in
the marketing division at the Business School. I also serve now as the
Chair of the Division. It's my pleasure today to welcome everyone and to have a chat with my co-author, now colleague, friend, Shawndra Hill, who is a
Principal Scientist at Facebook. She just recently joined Facebook, and she also recently became
a part-time senior lecturer at Columbia Business School. And so today we're going
to talk about data science in marketing, and Shawndra
is a wonderful person to tell us about this. She has a very unique and
interesting background. She actually was an electrical
engineer by training, and then she received her PhD from NYU in information systems. And then she spent a few years at Wharton on the faculty there. And after that, she moved to Microsoft Research here in New York City. She spent about five years there. She was a principal researcher in the computational social science group. And then just this, in
the last few months, she moved again to Facebook where she's a principal scientist. So welcome Shawndra, thank you for taking the
time to speak with everyone this morning. So today you're going to
tell us about, you know, your research, your work
and the tech industry in general, and data
science and marketing. But just before I have a
quick question for you. So you were in the computational
social science group at Microsoft Research. That's a term that is somewhat novel, so maybe some of the people here have never heard it before. So would you mind defining for us what computational social science means? - Sure. So I'm actually in the
computational social science part of the org at Facebook too. So I've been in
computational social science for a few years now, and as you mentioned, it's a new and burgeoning field, but at the core, we're doing research and really asking social
science questions, but using computation and statistics to help distill information from data, to be able to answer those questions. And I think the first
time, I could be wrong, but I think the first time that a computational social
science group was named, was at Yahoo, probably about 15 years ago now. So I mean, that's kind
of how new this area is, and there's now a computational social science conference. And the point being that
the area and the field is still to some extent being defined. But you can think of folks
that are doing research in this area as being, a lot of them, not all, computer scientists or statisticians, but they still care about answering why things are happening. So like not just predicting
outcomes for, you know, sort of marketing for instance, but trying to understand
sort of why people or groups, or societies behave the way that they do. - Wonderful. Thank you. So can you then maybe, describe for everyone a
little bit your position, what type of work do you do, you know, at this intersection
of theory and practice and research and industry. So you know, just what's
a typical day for you? What type of projects do you work on? - Oh, sure. So it's interesting. I'm now also called a data scientist, and I've been a data scientist
probably my whole career, but it wasn't called data
science when I started. So in today's terms, I'm a data scientist. And for the most part, I
don't have a typical day, except for, I am either
working on analysis towards some end, like to
solve some business problem, or scoping out and
mapping out new problems. So I've been pretty lucky
over the past few years in that, I've been able to take on roles where I can still do basic research, and what that means for, you know, people who might not be academics is, I can do academic style
research in companies, while still being close
to data and real problems. So like the barriers to
getting data at least, are broken down a bit. And also, you know getting access to practitioners
who have real problems. The nice thing is I get to
still work with academics like you, and as you mentioned, I now have an affiliation at Columbia. So it's very much an
academic style position but within a company, and, you know, we do have to deliver results in this role, but we take a longer term view of problems just like you
would expect an academic to do. - So what's really exciting
for us is that also you're going to be able
to bring this background into the classroom,
starting in the spring. So I know that you're
developing a new course. Would you like to maybe say a few words about this course that
you're putting together for the spring? - Sure. So first of all,
I'm really excited about teaching this course and the spirit of it is going to be learning the necessities for
managing data scientists, and data science projects
for marketers in particular. And I taught data mining, and sort of applied machine
learning for a really long time. So at NYU, when I was a doctoral student, and also at Wharton for many, many years, and frankly because I
had so many engagements with industry even then, like, so with different companies, I really thought I was pretty well-versed on sort of how to manage
projects in industry. But once I joined industry
and I was on the inside, my eyes were wide open
that I didn't really know as much as I thought about
like getting things done. So let me just kind of take a step back of what I mean by like I thought I knew. So in class, what we
would teach is, you know kind of know what your objective is, know how to measure your success based on, like, if you're thinking about a prediction problem
or a classification problem, like the cost of misclassification, like really understand your objective. And then what was missing, I think, are a number of
things like the fact that, one, it's not just like that you think these projects are cool, like you really have to do a sales job, and so your job becomes not
really that of a scientist but you have to wear multiple hats, because usually you
don't have a huge team. So you have to know how to
program manage a little bit. You have to know how to sell your results and communicate them in
a way that a lay person can understand them. You need to know how to build
prototypes along the way. So most of the time in my experience, you know, an idea is
not really worth much. Like you have to show people what you're telling them is going to work. And so you have to know
how to prototype things in a way that doesn't suck up your time on projects that, you know, sort of may not go to the end. So really thinking about
relationship building and selling is one aspect. And then another, maybe
more important aspect, is that, even if you have an amazing idea and leadership in your organization, whether it be big or
small, wants to buy in, there might be legal
policy, privacy issues that you can't overcome with your project, and knowing how to sort of
navigate the review process, not only for the things that
are, let's just say illegal, but also for the things
that might be in bad taste at the moment. So there are a lot of things we can do but that maybe we shouldn't do, if we're checking our moral compass. So knowing how to navigate those things. So this did kind of come up in sort of working with companies
before joining industry, but there's always this
tension in my experience working in industry now, of
like doing something right versus doing something fast. And so as someone who, you know, would teach my students that they want to do it right, not just fast, you have to really
manage the expectations of your major stakeholders up front, in terms of like what
you're willing to bend on, because almost always, and
this may be surprising, but you're going to have
to bend on something. So it's like, and then
how you explain that. So it's like, is this heuristic for what you're trying to do, or is this something that
is just an association, and we can't make any causal
claims or however it is that you need to explain it, you almost need a contract, because people that are trying
to deliver to customers, or you know, internally, care about meeting a deadline oftentimes. And so you have to figure
out how to navigate that. And then I think, you know, thinking about impact more broadly is, I think the number one thing
that maybe I was just lucky in the projects that I picked in the past, but, you know if you want
to have impact on a company, it needs to be something that, you know, if you're talking about users,
will impact a large number of users, or at least
impact the bottom line, or it's something that
is so extremely important that leadership believes they should invest. Just having an idea, and thinking that it would be cool to do or to connect data, like, people don't really
care about those ideas. So you have to really kind of
be able to scope these things out for yourself in advance, so you don't waste time on
things that people won't support. So those are the types
of things that we'll talk about in class, towards making sure, if you want to take on a role either managing data scientists or being a data scientist
yourself in an industry, that you kind of almost
have a checklist of things that you need to go through when you're thinking
through working on projects if they already exist, or scoping out new projects
that you have to sell and position within the
company for success. - Thank you, Shawndra. So this was, I think, a wonderful, like, you know high-level
introduction to your work and your world. Before we really get into more specifics, a couple of housekeeping items: I got a message that some people have some issues with the audio. So I tried to mute myself when
you were talking this time, but I don't know if the
audio is getting better, so if participants are
getting audio issues, please feel free to... Okay, sounds great. Okay. So apparently
things are getting better. So I will just mute myself when you speak to make sure there's no echo. - Can you hear me okay, when I'm talking? - I think it's fine. Yes. I think it's fine. Maybe I know somehow there
was some issue earlier. Another thing: of course, all of the participants should feel free to enter questions into the Q and A, and you know, we'll try to address as many of them as possible. So with that, Shawndra, let's try to drill a little bit deeper, more specifically and more concretely into your work. So I know that you prepared a
few slides about your research and your work. So I'm going to give you a chance maybe to give some examples, to maybe give people a
bit more tangible example of your work. So I'm gonna let you share
that if you would like. - Great. Thanks. So just for the audience,
I prepared a few slides just to kind of walk you
through a few examples. They're not meant to be comprehensive, but I think they'll help
you follow the discussion that we're going to have. So-- - And then we'll go back to a higher level after that, more in general, how data science is impacting marketing. But I think it's good now
to have some specifics to set the ideas a little bit. So go ahead, Shawndra, thank you. - Can you see my screen? - I can see your screen. I think everyone can. Thank you. - Great. So for most of my career, whether it be in marketing
or in other areas where I've applied data
science techniques, I've really tried to
connect data oftentimes for the first time, to
answer marketing problems in this context, or sometimes use data in a way that people haven't used it before. So if you could go back
in your head to 2004, back in 2004, it was the first time I actually got excited
about marketing problems. Like as Olivier mentioned, I got my PhD in Information Systems. I actually applied thinking
I was going to predict the stock market. I don't
know what I was thinking, but anyway, I quickly got interested
in marketing problems, and here's the reason why. I was doing an internship
where I got access to social network data. So connections between people, where those connections were phone calls. So on this slide, it's a network where the
nodes on these networks are supposed to represent people, and the connections between
them are phone calls, but, you know, the idea
is that birds of a feather flock together on these networks, right? And so these connections in
absence of having information on things like race, religion,
age, gender, geography, can be used as a proxy
for knowing that people are in some way similar. And so we used this idea in marketing, and the way that it worked was, we took existing customers of a service, we then looked at all of their friends, and we asked whether these connections could tell us something about their likelihood of adopting that service. And it turned out that they were five times more likely to purchase this particular product that we were advertising than people selected at random, and even after doing propensity matching, they were more likely to purchase the product.
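To make the targeting idea concrete, here is a minimal sketch in Python with pandas. The call graph, table columns, and numbers are all hypothetical toy data; the actual study also used random and propensity-matched comparison groups rather than this simple split.

```python
import pandas as pd

# Toy stand-ins for the call graph and customer table (all names hypothetical).
edges = pd.DataFrame({"caller": [1, 1, 2, 3, 4],
                      "callee": [2, 3, 4, 5, 6]})
users = pd.DataFrame({"user_id": [1, 2, 3, 4, 5, 6, 7],
                      "existing_customer": [1, 0, 0, 0, 0, 0, 0],
                      "adopted_later":     [0, 1, 1, 0, 0, 0, 0]})

# "Network neighbors": non-customers with at least one call edge to an existing customer.
adopters = set(users.loc[users.existing_customer == 1, "user_id"])
touching = pd.concat([edges.loc[edges.callee.isin(adopters), "caller"],
                      edges.loc[edges.caller.isin(adopters), "callee"]])
neighbors = set(touching) - adopters

targets = users[users.existing_customer == 0].copy()
targets["is_neighbor"] = targets.user_id.isin(neighbors)

# Compare later adoption rates for network neighbors vs. everyone else; the real
# work compared against random and propensity-matched groups instead.
rates = targets.groupby("is_neighbor")["adopted_later"].mean()
print(rates)
print("lift:", rates.get(True, 0) / max(rates.get(False, 0), 1e-9))
```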
So believe it or not, this was maybe the first work that connected social network data to real business outcomes for marketing. So again, remember, this is like 2004, pre everybody being on Facebook. Facebook existed, but there
wasn't a lot of data like this. So it turned out that companies
got wind of this idea. And there were startups
that were even started up on this basis of, you know, connections between
people having some value for predicting attributes of users. And so a lot of people ran
with this idea after this work, maybe they were thinking
about it in parallel, but certainly these
companies weren't growing kind of before this work came out. And then since then, you know,
sort of the rest is history, like social networks are used
to predict a lot of things, about people for various reasons. And so one thing that always
bothered me about companies just kind of running with this idea is, we showed that the connections
in the telecom setting mattered for this one product, right? So it, at the time it was voice over IP. Like people paid for it back then, right? Now we all get voiceover IP for free, but back then, they paid for
it and it was this one product, and it worked very well, and there could have been
reasons why that was the case. You know, like maybe
when people were talking to their friends, they
were talking about it, like, hey, do you hear my
new, you know, phone quality? Or it could have just
been this homophily idea or something else, but
it really bothered me that people ran with it without testing. So what I did was I collected a lot of social network data. These data weren't perfect, in the sense that I collected them externally
using the Twitter API, and Facebook APIs, and getting
as much data as I could, but what I did was I took TV show handles, and brand and product handles on Twitter. I got all of their connections. So all of the people that followed them, and then all of the
people that followed them. So I got the followers of
the brands and TV shows, then the followers of the followers, and for all of those followers, I got their tweets over time. So it was a lot of, you know, pinging the Twitter API. And what I was able to do with that data is pretty much build a
pseudo recommendation engine, where I say, okay, a person or a Twitter user followed Coca-Cola, say; let me take that as information for my recommendation model, and then predict what other brands they would follow. And I did that based on the social network, I did that based on text features, and I did that based on a more traditional sense of people being similar, based on the products that they either purchase or have in common.
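A rough sketch of the two kinds of recommenders being compared is below, with made-up follow and friendship matrices; the actual models were built on Twitter data at much larger scale, so this is only meant to show the contrast between a social-neighbor score and an item-based (product) score.

```python
import numpy as np

# Toy user-by-brand follow matrix and a user-by-user "social" adjacency matrix
# (in the study, edges came from Twitter follower relationships); all data hypothetical.
follows = np.array([[1, 0, 1, 0],
                    [1, 1, 0, 0],
                    [0, 1, 1, 0],
                    [1, 0, 0, 1]], dtype=float)          # rows: users, cols: brands
social  = np.array([[0, 1, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0]], dtype=float)          # symmetric friendship graph

def social_scores(u):
    # Score each brand by how many of u's social neighbors follow it.
    return social[u] @ follows

def product_scores(u):
    # Item-based collaborative filtering: cosine similarity between brands,
    # summed over the brands u already follows.
    norms = np.linalg.norm(follows, axis=0, keepdims=True) + 1e-9
    item_sim = (follows / norms).T @ (follows / norms)
    return follows[u] @ item_sim

u = 0
mask = follows[u] == 0                                    # only recommend new brands
print("social-based ranking:", np.argsort(-social_scores(u) * mask))
print("product-based ranking:", np.argsort(-product_scores(u) * mask))
```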
And so this image, let me just explain it to you. It's a little complicated because I don't have the whole top. But on the horizontal axis, so we built these recommendation engines, it's the number of recommendations that I was making to these users. And then on the vertical axis, think of it as the difference between the performance of the social-network-based recommendations and the product-based ones, the traditional approach of basing this on products. And so it turns out
that for some verticals, the product network
consistently did better, and for other verticals, the
social network did better. So for things like children's products, home products, media
entertainment, sports, where you would expect these advertisers or products to have a niche audience, like where homophily
would actually matter, then the social network
actually did better. And then for cases where
you would expect everybody to follow these products,
things like household products, or health related things, the
product network did better. So this was a first step at trying to understand when social network data would actually outperform, at least in this setting, the product network. And so we also collected text data and built recommendation
engines based on texts, where people were similar
based on the words that they used in their tweets. And we were able to
characterize the audiences of these brands, based on the tweets that
their followers post. And so we could connect those corpora of tweets to the attributes and characteristics of people who followed these brands. We got this from a different data source. And we could figure out which words were actually predictive of certain characteristics. So these are word clouds that show, on the left side, the types of things that are said in the tweets of audiences of products that skew female, versus the types of things that are said in the tweets for audiences of products that skew male. And so we did that for a bunch of categories, and we cross-tabbed these categories, and we can get these nice sort of word associations with certain audience types. And so from a marketing perspective, this could help brands and products, and advertisers, understand who their audiences are a little bit better.
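One simple way to get at "which words are predictive of which audience characteristics" is a text classifier whose largest coefficients play the role of the word clouds. This toy sketch with scikit-learn uses invented example tweets and labels; it is not the exact model from the work.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical toy data: concatenated follower tweets per brand, plus a label
# for whether that brand's audience skews female (1) or male (0).
docs = ["makeup skincare sale cute nails",
        "game score playoffs draft stats",
        "recipe kids crafts sale coupons",
        "engine horsepower playoffs scores"]
skews_female = np.array([1, 0, 1, 0])

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
clf = LogisticRegression().fit(X, skews_female)

# Terms with the largest positive / negative coefficients are the words most
# associated with each audience type, analogous to the word clouds described.
terms = np.array(vec.get_feature_names_out())
order = np.argsort(clf.coef_[0])
print("male-skew terms:  ", terms[order[:3]])
print("female-skew terms:", terms[order[-3:]])
```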
I will say, with all of this work, the difference I think in how academics approach the work, and maybe how it's traditionally approached in product groups, is this, right? So a lot of times, there are these overall
high level metrics or KPIs, that a company needs to report on, or a team needs to report on
for a particular solution. And very rarely in my experience, do people try to figure out if there's any heterogeneity
by user or by brand type, when you're talking about big systems like recommendation engines, that need to make predictions broadly. And so how I think about my role even now, is sort of digging into the details, to try to figure out
where heterogeneity lies. So that was kind of... Using social media data is something I did for quite a while, and it was exciting because it was new data at the time. And then I joined Microsoft, and I focused a lot on
sponsored search data, and sponsored search data were exciting, because it was new to me anyway, but also because you can do a lot with it, learning about the customer journey, like, so for any given customer, how they search over time for a particular category of product, or for a specific product. Once you're sort of embedded, you have access to a lot
of information about users, anonymized, obviously, but
you can see for instance, whether somebody already
owns a product or not, and look at how they
respond to advertising, whether they are an
existing customer or not. So this slide is just showing sort of kind of an experiment, how it worked, and I'll explain
a couple of ways we used it. So basically, people search for brands on search engines. So here there's a search for Edmunds. And usually when it's a product or brand, you'll see first a set of advertisements, followed then by the organic search results. And so the experiment that was running is that the advertisements were getting shuffled randomly above a certain threshold, and they were showing either zero, one, two, three, or four ads. And so what that enabled us to do is to see what happens when, for instance, in this case in step two, so this is just showing that sometimes in the experiment, no advertisements are at the top, right? So this would let us answer: what happens if a customer doesn't advertise on sponsored search? Like how much traffic do they still get from their organic link, for instance? We could also ask questions around what happens, because these things are now being shuffled, the advertisements, when a competitor is shown above the focal brand, or a complementary product, for instance, right? Because normally, what would happen is that the brand that is being searched for will show up at the top. So if you're studying this without an experiment, you'd only know sort of what happens when it's at the top or nothing. But because these things were being shuffled, we can ask interesting questions about competitors and complementary products. And also, as I mentioned, one of the things that we did was see whether these ordering effects that we wanted to study, like how much traffic stealing there is as the focal brand moves down the page, whether that is stronger or weaker when a person already owns the product.
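Here is a hedged sketch of the kind of analysis such a shuffling experiment enables, on a made-up impression log. The column names and numbers are hypothetical, and the published work uses far more careful estimation; this only shows the basic comparisons of traffic with and without the focal ad, and by ad position and ownership.

```python
import pandas as pd

# Hypothetical impression-level log from a position-shuffling experiment:
# how many ads were shown, the focal brand's ad position (if shown), whether
# the searcher already owns the product, and whether the focal brand got a click.
log = pd.DataFrame({
    "n_ads_shown":   [0, 1, 2, 4, 3, 2, 1, 0],
    "focal_ad_pos":  [None, 1, 2, 4, 1, None, 1, None],   # None: focal ad absent
    "owns_product":  [0, 0, 1, 0, 1, 1, 0, 1],
    "clicked_focal": [1, 1, 0, 0, 1, 1, 1, 0],            # paid or organic click on the brand
})

# Traffic to the focal brand when it does vs. doesn't advertise at all.
print(log.groupby(log.focal_ad_pos.notna())["clicked_focal"].mean())

# "Traffic stealing" as the focal brand moves down the page, split by ownership.
shown = log[log.focal_ad_pos.notna()]
print(shown.groupby(["owns_product", "focal_ad_pos"])["clicked_focal"].mean())
```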
So these data, again, in this case maybe weren't new, like these data were sitting around, but we were asking questions in new ways. And this is the work that I'm referencing actually, with another faculty member in the marketing department, Andrey Simonov. So again, a partnership with academia. So the search data was exciting
to me for another reason. I was really interested in TV research, specifically like TV advertising research. And when I joined Microsoft,
it wasn't a TV ad company. However, one thing that
we could look at is, how TV ads interact with
sponsored search ads. And then it became
interesting to both Bing and advertisers who
advertised across TV channels and sponsored search channels. So the general idea is that people are sitting in front of their TV, and they are responding
to what they see online, or on some device that's not their TV. And so we did a lot of work looking at how people respond, who responds and why,
trying to get at why. So the general ideas,
people see something on TV, they go to a search engine
like Bing, they search for it, they get the sponsored search results, and then they click on something, right? And maybe they're looking for
something related specifically to what was shown in the commercial, or maybe they're looking
for the brand more broadly, but those are things that
we could try to understand with this data. Historically, and in most
of the work that I've done, we treated TV ads as events. So they had a specific
time, in a specific place. Usually, we looked at ads
that were shown nationally in the US, and we can measure
the search spikes immediately after a TV ad. So this is a plot from real data, from when the Surface laptop ad aired in 2017. And so we'd see this orange line, where the ad was aired, and then the search spikes after. And we weren't the first to show that there are search spikes, but what we tried to understand, because we had access to more data, is who these people are, with respect to their demographics and which types of devices they were searching from, and even whether there are differences in how the attention shifted, with respect to the user characteristics.
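As a toy illustration of this event-style measurement, the sketch below compares query volume in the minutes after an airing to a pre-ad baseline. The airing times, query counts, and window lengths are all invented for the example.

```python
import pandas as pd

# Hypothetical data: timestamps of national airings and minute-level query counts
# for the advertised brand (both made up for illustration).
airings = pd.to_datetime(["2017-05-02 20:00", "2017-05-02 21:30"])
queries = pd.DataFrame({
    "minute": pd.date_range("2017-05-02 19:50", periods=120, freq="min"),
    "count":  [20] * 10 + [55, 48, 40, 33, 27] + [20] * 105,   # spike right after 20:00
})
queries = queries.set_index("minute")["count"]

# For each airing, compare the 5 minutes after the ad to a 10-minute pre-ad baseline.
for t in airings:
    baseline = queries[t - pd.Timedelta("10min"): t - pd.Timedelta("1min")].mean()
    post = queries[t: t + pd.Timedelta("4min")].mean()
    print(t, "lift over baseline:", round(post / baseline, 2))
```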
So another study that we did, and this is like a lot to digest, I know, in one slide, but it's a very exciting project, and one that I was really excited about, where we moved on from this, right? So before, most prior work, not all, really treated TV ads as events, where you have an ad shown at 8:00 PM, and you look at what happens
right in the minutes after. And the reason why you
couldn't really look long-term for any one advertisement campaign is because, once you look longer-term, there are just so many confounders that come into play, including that the same ad might be shown a few minutes later. And so that's when you're
looking at aggregate level data. So what happens in aggregate, all the people in the US, and what happens after 8:00 PM on search, and what happens for, you know, just knowing that a particular
ad was shown at 8:00 PM in the US. What we tried
to do more recently, was connect users to their TV viewership in a privacy friendly way, at the individual household level. So then we could say,
we know that at least, this box was tuned in to a TV commercial, what happened for them? So that enables us to
do two things at least. One is to look longer-term
at what happens to people who see TV ads, with respect
to like their search behavior. And it also enables us
to look at advertisers who advertise all the time. So for those of you who are marketers, or, you know, work on
advertising campaigns, you might know this, but there's some advertisers who like any minute of
the day, are always on. So for telecom, for things
like food and beverage, at certain times of the day, you couldn't even do
this event type analysis. So it enables us to be able
to look at more advertisers and measurements. A project that I worked on
with Olivier is related, but different, where instead
of looking at TV ads, we looked at TV show events, and we looked at how people
searched for information about TV shows over the course
of, for the most part a day, but like before and after shows. And what we wanted to
understand is those dynamics, but also could we jointly
model the search behavior and the click action, so what people click on, in such a way that would
help us to do better at predicting, one, what people would likely want to see, and also as a result, want to click on. So this is just showing over time, interest in the Super Bowl in 2016, so you see this huge spike,
when the Super Bowl starts. It stays high, the interest, while the Super Bowl is on. But what's interesting is, if we looked at what
people were searching for over the course of 24 hours
before during the game, and 24 hours after, we saw a lot of the same searches, so these are some of the top searches, but what you see as
indicated by these colors, is that some were searched more
before the Super Bowl was on and some were searched more after. So we wanted to look at both, these dynamics of what
people were interested in, with respect to their searches, but what might not be obvious to you, is not only were people
searching differently for these topics, but even for something
like the Super Bowl, they were clicking on
different things, right? So they were searching for Super Bowl, but maybe before the ad, sorry, before the TV show aired
or the Super Bowl aired, they were looking at, you know, what the time was or when it was on, versus after, they're looking at who won the MVP. So they could have searched for that specifically, but those types of things were also reflected in the clicks, even for a generic term like Super Bowl. So the way that we modeled it was using these dynamics in the clicks, as well as topic modeling, topic modeling the snippets and the text of the queries, in such a way that enabled us to do a better job at prediction than if we had not factored in these dynamics in the search.
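A minimal sketch of the topic-modeling ingredient is below, using LDA on a few invented queries. The actual model jointly captured topics, clicks, and their dynamics over time, which this toy example does not attempt.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical query/snippet texts from before and after an event (toy examples).
texts = ["what time does the super bowl start",
         "super bowl kickoff time channel",
         "super bowl halftime show review",
         "who won super bowl mvp",
         "super bowl final score highlights"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(texts)

# Two topics as a stand-in for "before-game" vs. "after-game" interests.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = comp.argsort()[-4:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])
```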
So that was a really fun and exciting project. And so finally, I'm going
to end really briefly on something that I worked on. The motivation was to understand whether showing diverse
characters in TV ads led to better outcomes for businesses. So basically what we were finding, and we know this to be true, is that more and more advertisers
were including characters in their ads, that reflected the diversity of the people in the United States. And so what we wanted to
understand is like, you know, is that good for business
basically, right? It seems like it was good
from a social perspective and the right thing to do, but we want to know if
it was good for business. So when we started, we didn't know how to get at these answers of which advertisers were more inclusive in their advertising, versus not because that data didn't exist. So we had to create it. So the way that we did it, was we used off the shelf tools for video extraction and image labeling. So we took a video, we
extracted information using this video indexer
that's available from Microsoft. We then let the tool automatically label the age and gender, but not race; it doesn't label race anymore. And also we did a de-duplication of the actors that were in the show. So like, you know, somebody might turn to the side, and then the image would show up twice in our data set for a particular advertiser. So basically, after we extracted all the images, in addition to automatically labeling them, we ran a study on Amazon Mechanical Turk, where we had people label the images based on who they thought was in the image with respect to their gender, age, and race. And we ended up with about 6,000 videos, and a bunch of different labels. We asked for two labels for every image. So what was interesting is that for gender, people were able to label the characters pretty clearly, in the sense that the two raters agreed most of the time, but when we asked them about race, only 69% of them agreed on White, 61% on Black, 10% on Hispanic, and 15% on Asian. So that's interesting just as an aside, because these models are trying to do a better job of labeling race, but even humans can't do a good job. And that's something that has to be considered when thinking about these models, and actually putting them out into the world.
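For the labeling step, agreement between two raters can be summarized as raw percent agreement, which is the statistic quoted above, and, as an extra check not mentioned in the talk, Cohen's kappa. The labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical two-rater labels for a handful of images (toy data).
labels = pd.DataFrame({
    "rater_1": ["female", "male", "female", "male", "female", "male"],
    "rater_2": ["female", "male", "female", "female", "female", "male"],
})

# Raw percent agreement between the two raters.
agreement = (labels.rater_1 == labels.rater_2).mean()

# Cohen's kappa corrects for agreement expected by chance.
p_o = agreement
marg_1 = labels.rater_1.value_counts(normalize=True)
marg_2 = labels.rater_2.value_counts(normalize=True)
classes = set(labels.rater_1) | set(labels.rater_2)
p_e = sum(marg_1.get(c, 0) * marg_2.get(c, 0) for c in classes)
kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```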
So we built a toolkit, and I want to be sensitive to time, so I'm just going to say very quickly: on top of this data, we basically came up with
a set of inclusivity scores for an advertiser, for the vertical, for the industry, and using these scores, we can measure the diversity in these different groups and plot them in such a way that an advertiser could go in and ask how they are comparing against other advertisers in their cohort, whether that be at the product type level or at the industry level.
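One simple way to think about such inclusivity scores is as group shares computed per advertiser and per vertical; this sketch uses invented labels and is only meant to illustrate the comparison against a cohort, not the actual scoring used in the toolkit.

```python
import pandas as pd

# Hypothetical character-level labels extracted from ad videos.
chars = pd.DataFrame({
    "advertiser": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "vertical":   ["retail", "retail", "retail", "retail", "retail", "auto", "auto", "auto"],
    "gender":     ["female", "male", "female", "male", "male", "male", "female", "male"],
})

# A simple "inclusivity score": share of characters from a given group, computed
# per advertiser and per vertical so an advertiser can compare against its cohort.
by_advertiser = chars.groupby("advertiser")["gender"].apply(lambda g: (g == "female").mean())
by_vertical = chars.groupby("vertical")["gender"].apply(lambda g: (g == "female").mean())
print(by_advertiser)
print(by_vertical)
```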
So this was the first step, and I didn't get to connecting this to business outcomes, but that would be the next step. But still, it's an example
of using computational tools to understand what's
going on in marketing. So I will just mention some obvious things, you know, a couple of points in terms of the results. Like, we saw things like
women were more likely to show up in retail stores
and health and beauty, but less likely in
electronics and communication and vehicles. Blacks were shown more for political, government organizations and education, but less so in home and real estate. And seniors were shown more
in insurance and pharma, and less for apparel and footwear. So there are some face
validity here, you know, to at least make us believe we were going in the right direction for these labels. But again, the connections
to business outcomes would be the next step. So I think that really is it. So I'm going to stop sharing my screen, and hopefully the slides helped. I kind of pulled out like pictures that I think would make the points, but yeah, these are examples of using
computation in marketing. - Thank you, Shawndra. Thank you, this is a lot of
information, very exciting work. And there's a few questions in the chat. Some of them are technical,
some of them are big picture, so maybe we can go through
them if you don't mind. So you talked about how
you do propensity matching in the social network. Someone wants to know whether you used the neighbors to do that. I'm guessing it's more for logistic regression probably, or... - Yeah. So at the time,
it wasn't widely accepted, and even now, like it's
been criticized to use this for marketing problems, but at the time we used logistic regression. - Now, someone also has some pretty technical questions: in your job, how much of your time do you spend building and coding the prototypes? - That's a great question. So in my prior role at Microsoft, I did not spend as much
time as I do now coding. So I'm saying that to say that I can't give a general answer, because I think each team is different and it depends on the resources. So I was lucky in my prior role to have engineers on our team that could support research and building prototypes, and I don't have that now. - Thank you. Now we go to a much deeper, maybe more philosophical question, which is how do you
balance ethics and outputs, you know, in the time when companies in the tech world are under
scrutiny for using analytics? - Yeah. So first of all,
that's a great question. And so let me just tell you, like what happened over the
course of working on the things that I showed you. So back in 2003, 2004, it was kind of like the wild, wild West. Like there were no rules really, you know, and the rule of thumb then, was to talk to lawyers, to make sure you're not doing anything that is violating any laws. Companies and researchers, and, you know like even academics, and the IRBs, like understanding this
data a little bit more, like we've all evolved, and I rely heavily on whatever processes are in place in the
organizations that I belong to, because they think a lot
deeper about the implications for policy, for privacy and for ethics. And so I rely on them, and you know, I don't ever want to do anything that violates anyone's privacy. And so I think about it
often, and I will say that, because of how things have evolved, there are a lot of changes
in what you can and can't do for data in companies. And a lot of it is policy, right? It's like the agreements that
these companies have made with their users and their consumers, and less so about what's allowed by law. And so your question is spot on, and I think times are different now, and some of the things that I showed you, maybe couldn't be done now. - Thank you, Shawndra. So we're going back to a
much more technical question. Someone wants to know
the format of the data that you work with, is it
CSV, Excel, XML, JSON? - Yeah. It depends. So... these days, it's rarely any of those, it's usually in some kind of data store, whether that be a
traditional SQL database, or something that, you know,
can handle much more scale. Historically, like with
the social media data, we would get it from the APIs in JSON, and it was easy to process that way. So I would say a mix, but the data are so large now, it's like really in a flat file. - Thank you. Now we're going back to, you know, a question that's more
I guess higher-level. So, you know, you've shown some mix of experiments and observational research, you know, more correlation versus causation. So someone wants to know how you think about using observational methods versus experimental methods, in the field or in market applications versus in academic research. So it's the-- - I think that's a great question. I mean, maybe this is,
Olivier is the expert on the academic research side, but like I can tell you, like, in terms of like what can get published. So I think it's really
hard to get work published, if you can't make causal claims, whatever method you're using. Whether it be like observational methods or running experiments. - Especially in marketing, in social science as you said, there's a big emphasis on
the mechanism and the why, so there's still research
that's more predictive, trying to predict or optimize, but it's true that there's
definitely a big taste for being able to understand
what's driving the results for sure. - Yeah. So it's really difficult
to get things published in top journals, if you
can't make causal claims. In practice, it's better. Like people get more excited about results where you can show causality. However, oftentimes
prediction is a fine answer. As long as it's repeatable
and reliable over time, like whatever you're doing. So those are two things. And then the final thing is, there are just some things where you can't run an experiment. Like you just can't. So it's like, even like, so there are people who
have run experiments on social networks, for example, gifting certain products and seeing how they spread, but you can't make somebody make new friends. I mean, it's really hard. So there are contexts where you have a wealth of data on user behavior, and you want to answer a
question that's really important, but an experiment just isn't possible. And I'll add something, actually; we didn't talk about surveys, maybe because that doesn't come up in the causal context. But the other thing is, sometimes something like search, for instance, can reveal a lot about what people are thinking or doing that they might not reveal otherwise. And not because it's like
private or sensitive data, just because like, if
you ask me for instance, what I think about the political
candidate that I voted for, maybe because of political cheerleading, I will say something positive, even if I don't believe it. But so my behavior might tell you more about like what I think
about the candidate, if that makes sense. So I think there's still a lot of value in using observational techniques and data that is generated without experiments. - Thank you. That's very interesting.
There are a couple of questions that I think hark back to your course a little bit. So someone asks whether, in these examples that you shared with us, was the person leading
the projects knowledgeable about the technicalities of
getting such research done? And then, you know, maybe
relatedly, someone else asks, you know, how do you relate the course to your experience at Facebook and Microsoft? And so, yeah, I think it's about getting these types of projects done and the management of data science projects. - Yeah. So I think it's
going to be directly related to answer the second question first. So it's based on like learning
how to do this well now, after also, you know doing things in a more of an ad hoc way, because like I wasn't
required to be thoughtful about timelines. I mean, maybe I should
have been as an academic, I don't know. I mean I'm saying this
like, but it's different. So I was forced to get better at this. And also I learned from teams that deliver over and
over again really well. So yes, it will be based on my experience, but also based on, you
know, sort of known ways to lead successful projects. So it's not just my opinion, it's gonna also be based on
project management skills and... And solutions for that. As far as like project,
I don't know exactly which project the first
question is asking about-- - I guess, I think maybe in general, I think the type of
projects that you described, does the person leading the
project need to be knowledgeable about the methodology
and the technicalities behind the research? Are these projects being managed, maybe, by people on the management side? - Is the question more like, can you be a data science manager if you're not technical? - I don't know. The question is: was the person leading the projects knowledgeable about the technicalities of getting such research done? - Yeah. So it depends on
who you're talking about. So like, if you're talking about somebody who is a technologist, so of course, like they're
going to be experts in their field. Like, that's pretty much
what they're hired for, like as data scientists and
they'll be good scientists. But that doesn't mean that you'll be able to ask the questions right. And again, like teaching students how to ask the questions right, is something that was part
of the technical version of the class. This is altogether different, it's not just like, are you
asking the question right so that you can get an answer? It's also like, who will care? Is your timeline aligned with, you know the team that you're working with who are your stakeholders? And like, are they going to support you? And support comes from
any number of things. Like it could be engineering support, it could be PM support,
it could be sales support. And so let me just answer the question just in case that was
wrapped in it of like, do you have to have a
technical background? I would say like, you need to understand at a high level, like what's happening, but there's nothing more
valuable than a great PM at getting to the bottom line of what a project should be doing, what their timelines are and who what they're delivering. So I've seen, you know, amazing PMs that were
not computer scientists, that really can lead in technical spaces. So you don't have to be, but you have to be willing to learn, because you do have to
understand what's going on. - Yeah. Thank you, Shawndra. So I think there's a
question that's specifically about the project on the search on TV ads. Was there any insight into who
specifically tended to search after first seeing the TV ads? For example, were the consumers already likely to have interest in the product prior to seeing the TV ad, or were they consumers who
mostly just became aware of the product in that moment? I guess maybe some attribution
maybe issues there. - Yeah. So in one of
the studies that we did, it does look like it's newer people that are searching in those
moments after the TV ad. I will say on a related
study, just to connect it to things that I do know for sure: without TV ads, we looked
at how people search for products and brands over time. And it turned out that, at least for a couple
of technology products that we looked at, the large majority of people searching after the product had been launched for a few months already owned it. So in the beginning, people are trying to get information about the new product, but after that, it was largely people who already owned the product. And therefore, you know, it's not clear that those are the people that you want to advertise to. They certainly responded differently to advertisements for the same product. However, they were much more likely to respond positively to complementary products. So knowing that information, to the point of the question asker, is like
really, really important. And it's hard to get, like it's not easy to get
that data in these contexts. - Yeah. Thank you. I think, you know, back
to the privacy issues, there's a question, how
can this function relate to kids' products, surveys, and the whole protection-of-privacy laws? I guess if you're trying to study products that are targeted to kids, for example, are there restrictions, is it possible to do,
or maybe there's a... - So there are restrictions,
I've never touched that. Like, as you know, like,
so I don't exactly know. There's somebody at Kellogg
who studies marketing to kids on the academic side, but I
don't know the answer to that. I do know that these companies
spend a lot of effort trying to identify who's a kid, so that they do not market to them. But as far as like doing studies on them, my guess is that it's like a big no-no, like that's my guess, but I don't know. - Thank you. So I think going back to
the, getting the job done, so someone asks what challenges you have, and/or what tools you need
to do your job better? - So... so two things usually, bandwidth, right? Because... you just go further
faster when more people are working on projects. And I would say the tools,
usually it comes down to more like data than, you know, sort of having more computational power or something like that. It's like you're missing
some piece of the data that would enable you to do things like understand like
who these searchers are, or something to get at the why. A lot of times you can find a connection between two things, even
if you run an experiment, but the why part is oftentimes
like really hard to get at, because you're missing
pieces of the puzzle. So I don't know if that
answers the question, because it's not a tool question, but it's usually like
limitations in the data. - Thank you. So we have only a couple of minutes left, so we touched a little bit, you touched on social networks,
search advertising, TV ads, so are there any other
areas of marketing that, in which you see data
science having an impact, that you'd like to
briefly maybe mention or, are these the three main
buckets in which you've seen most of the action? - Yeah. So I'll say
one that didn't come up but that maybe I'll work on,
is like customer journey. I think the data today, enable you to learn a lot
more about customer journey, and then maybe looking to the future, like I'm not working in this space yet, but it's exciting, things
like product placement, and even a product
placement in video games and virtual reality games. Like I think, you know, that's just a whole other space to explore where technology and computational methods will play a role, but pretty much like
anything and everything, you know, like we also didn't talk really about like brand lift, or
like really traditional things like customer lifetime value,
like all of those areas. Like when you have more data on consumers, like you can do more to
understand what's going on. - Yeah. So yes. I think, you
know, maybe a final word. I know one topic we haven't talked about, which is one thing that you care about: diversity in tech. And I know just, you know, what do you see being done, and what more could be done, to improve diversity in the tech industry? You know, we have a lot of students here, and maybe some younger alums as well; any advice for both job candidates and also for managers and
recruiters on that front? - Oh no, I have one
minute to say all that? (both laughing) Well, first of all, let me say that anything I say is my opinion, because I'm not a D&I, diversity and inclusion, expert. And also, even when
we talk about diversity like that can mean a lot of things, right? Like there are all kinds of dimensions where people may not be
well-represented in groups. But I will say in terms
of what companies do, I mean at least they're
talking about it more, in light of like a lot of things that have happened in this country. I mean, just a few things that I think companies can do better, and some of them are already doing this, is like move these roles
to be like more central in the company, so that they have a little bit more power, and make sure, like for the companies who don't have diversity
and inclusion roles, like have them, you know,
start to build them. Like sometimes in smaller companies, like the burden falls on people that are in underrepresented groups to like make suggestions, or even get involved to run programs. So, you know, the first thing is like just make sure there's somebody
for whom it's their job, and who are experts to
think about, you know, sort of how to improve things. The other thing, like if we're
talking about black women in particular, part of
the problem, I think, maybe even the biggest problem is like the lack of data
because of the small numbers. So you always have a small-n problem. This shows up in academia too. It's like, you never
know how they're feeling because like, if you survey two people, you can't report usually
for privacy reasons like what the answers
are for those two people. And so that makes it
really hard to do things. So the obvious things like
recruiting differently, you know, sort of training. Some things that we could do is make sure that people who want technical roles are prepared for technical interviews, because that's oftentimes
like the barrier to entry. And then, you know, for all of these companies, technology or otherwise, they just have to change the culture so that it's welcoming. But my advice is, you know, do it, there's space. Like one of the things, and I don't want to
trivialize, like, you know, people's experiences,
because they're valid, and obviously like if
you're in a minority group, it's hard sometimes, but I think also in technology,
at least in my experience it's like, you're rewarded
for being good at what you do. And so it's like just work really hard to be really good, and build networks that
can help you navigate, those times when it's, you know, maybe not as friendly as you would like, but there's space for everybody. Like there's really space for everybody who wants to participate in
data science in particular. - Thank you. That's a great note to end. We're already past 10 o'clock, but thank you very much Shawndra. We're getting lots of
compliments on the chat, people want to hear more from you. So, this is just the beginning. So have a good day everyone
and a good weekend after that, and a happy Thanksgiving
also, while we're there. And so thank you again,
Shawndra very much, and everyone. - Okay, bye.
- Take care. Bye.