YUFENG GUO: Welcome. Today on "AI Adventures,"
we're joined in the studio by Justin Zhao, a Google
research engineer. Hi, Justin. JUSTIN ZHAO: Hi. YUFENG GUO: Thanks for joining
me in the studio today. JUSTIN ZHAO: Yeah,
it's great to be here. YUFENG GUO: We're going
to be talking today about natural
language interfaces and how computers and humans
can talk to each other in ways that are natural
and not awkward. JUSTIN ZHAO: Yep, sounds good. YUFENG GUO: Awesome. So I want to start by talking
a little bit about your team's area of research and the general
natural language processing field. And then we'll delve into
your area of research and see where our
conversation takes us. JUSTIN ZHAO: Yeah,
that sounds great. So broadly speaking,
the area of my research is natural language
processing, or NLP. YUFENG GUO: OK. JUSTIN ZHAO: And
what that is, NLP is all about trying
to understand how humans communicate
with each other and how to get a computer
to replicate that behavior so that we can
interact with computers in a more natural manner. YUFENG GUO: Wow. You guys really picked a
small field to target there. JUSTIN ZHAO: [LAUGHS] YUFENG GUO: Yeah, NLP
sounds super broad. JUSTIN ZHAO: Yeah. YUFENG GUO: It's
like everything. JUSTIN ZHAO: Yeah,
it's pretty broad. So in fact, I have some slides
that we can pull up just to try to focus it a little bit. YUFENG GUO: Yeah,
that'd be great. Yeah. JUSTIN ZHAO: Yeah,
so first, I think it's important to talk about the
conversational user interface. And for something like
the Google Assistant, there are two big domains of NLP
problems that come into play. On one side, you
have the problem of understanding, which
is, what did the user say? What is the user's intent? And on the other side,
you have the problem of generation, which is, what
should we say to the user? And how do we respond
in a way that's intelligent and conversational? YUFENG GUO: Right,
that makes sense. JUSTIN ZHAO: So I work
on the generation side. And the ultimate goal of
natural language generation is to teach computers to turn
some kind of structured data into natural language,
which we can use to respond to the user in a conversation. YUFENG GUO: Wow. And this is definitely
something that I feel like conventionally, NLP
has really been broadly thought about as a field where it's
all about processing the words and understanding
what text means. But you are working
on the generation side, which, in a lot of
ways, often get overlooked. And so it's really
great that you're able to tell us more
about this side of things. JUSTIN ZHAO: Yeah,
that's what I'm here for. [LAUGHTER] YUFENG GUO: So how do
you then teach a computer to generate natural language,
rather than just understand it? JUSTIN ZHAO: Right. So for now, let's set aside
the structured data part of natural language generation. And we can focus on the natural
part of natural language generation. So what makes a
conversation like the one we're having feel human? YUFENG GUO: Speaking
of the one we're having, it's a little
meta that we're having a conversation about what
makes something conversational. JUSTIN ZHAO: So that's a
common remark on our team. YUFENG GUO: Yeah, we
have to not be too robotic in our conversation. JUSTIN ZHAO: Yeah. [LAUGHS] So I think this breaks down
into two kinds of requirements. First of all, the content
of what we have to say has to make sense in the
context of the conversation. So is what I'm saying
an appropriate response to what you're saying? Or is it out of the blue? YUFENG GUO: Hey, what are
you having for dinner? JUSTIN ZHAO: [LAUGHS] So-- YUFENG GUO: That's kind
of out of the blue. JUSTIN ZHAO: That's kind of out
of the blue, yeah, definitely. So yeah, exactly. And then I have to think
about if what I'm going to say is actually going to
answer your question. So if you were going to ask me
where we want to go for dinner, it would be weird to suggest
a coffee shop or a clothing store. YUFENG GUO: Right. JUSTIN ZHAO: Yeah. YUFENG GUO: Yeah,
unless you really wanted to get some coffee stains
on your clothes for dinner. JUSTIN ZHAO: Yeah, I guess so. The second requirement
is that you actually have to use the language correctly. So this is like,
how's my grammar? Do my verbs agree? Or if I'm using a
pronoun, is it ambiguous? YUFENG GUO: That makes sense. So it's basically
what do you say, and then how do you say it? JUSTIN ZHAO: Exactly, yeah. YUFENG GUO: OK. And you also mentioned earlier,
there is this structured data that we put aside. Where does that come into play? JUSTIN ZHAO: That's
a great question. So structured data
primarily helps us figure out the first
requirement, which is what we want to say. For example, let's say a user
asked us about the weather next week in Santa Clara. In Google Search
results, we see a box filled with all this
information about the weather for the next week. And somewhere within
this data, hopefully, is the answer to the user's question. And we just have
to figure out how to turn all this data into
a response to the user. That's the problem
that we're focusing on in natural language generation.
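Just to make that concrete, hypothetically the data might look something like this. The field names here are made up for illustration; it's not the actual format:

```python
# Hypothetical structured weather data; the field names and the values
# for Thursday onward are invented for illustration, not a real payload.
forecast = [
    {"day": "Sunday",    "high_f": 66, "condition": "partly cloudy"},
    {"day": "Monday",    "high_f": 63, "condition": "cloudy"},
    {"day": "Tuesday",   "high_f": 66, "condition": "partly cloudy"},
    {"day": "Wednesday", "high_f": 68, "condition": "cloudy"},
    {"day": "Thursday",  "high_f": 59, "condition": "showers"},
    {"day": "Friday",    "high_f": 55, "condition": "showers"},
    {"day": "Saturday",  "high_f": 49, "condition": "showers"},
]
```

YUFENG GUO: And that's because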
we're talking about a situation where we're going
to say our answer and not just show
them a box to look at. JUSTIN ZHAO: That's correct. YUFENG GUO: OK, so it's
like an audio interface. Gotcha. JUSTIN ZHAO: Right. YUFENG GUO: And in
that case, I guess I can imagine a naive solution
for this sort of problem. We already have the data. Right? JUSTIN ZHAO: Yeah. YUFENG GUO: But I don't know
if it would be sufficient. JUSTIN ZHAO: Well, you
know, that depends. By all means, go for it. YUFENG GUO: All
right, so let's say we make some kind of a template. Right? And we can say, on
blank day, it will be blank temperature, and then
some blank weather condition. Like on Tuesday. It will be 72 degrees
and partly cloudy. And then you could
build a full forecast by just iterating through all
the days of the week like that.
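Just as a rough sketch in code, assuming the kind of per-day records we looked at earlier, it might be something like this:

```python
def template_forecast(forecast):
    """Naive template-based generation: one fixed sentence per day."""
    return " ".join(
        f"{day['day']}, it will be {day['high_f']} degrees and {day['condition']}."
        for day in forecast
    )
```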
JUSTIN ZHAO: So I will say that that is a very straightforward approach. And some assistants do
use that implementation. However, in practice, it's
a lot less conversational than you might think. So how about you
try asking me what's the weather like this week? And then I'll use your algorithm
to generate a response. YUFENG GUO: All
right, sounds good. We'll call this the
Justin Assistant. JUSTIN ZHAO: That's perfect. YUFENG GUO: All right. OK, Justin. What's the weather
like next week? JUSTIN ZHAO: Hi, Yufeng. Sunday, it'll be 66
degrees and partly cloudy. Monday, it will be 63
degrees and cloudy. Tuesday, it'll be 66
degrees and partly cloudy. Wednesday, it'll be
68 degrees and cloudy. Thursday-- YUFENG GUO: Oh, boy. OK, that's getting too
long and just too robotic. Yeah, let's call it. Let's call it at that. JUSTIN ZHAO: Yeah,
even saying it, for me, felt a little strange. YUFENG GUO: Yeah. So clearly, generating natural
language from structured data is non-trivial. How would you actually go
about using a computer system to answer the user's
question then? JUSTIN ZHAO: Well,
first, I would want to think about how I
would answer it as a human. So as a human, I
would hope that I'd be a little more
contextually aware. And I would realize
that there's actually a lot of repetitive
information in the data. So I'd probably try
to summarize it, something like, it'll
be cloudy until Thursday with showers the
rest of the week. Temperatures range from the
high 40s to the mid-60s.
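If you wanted to sketch that summarization in code, a toy heuristic over the same hypothetical records, definitely not our actual system, might look like this:

```python
def summarize_forecast(forecast):
    """Toy summarizer: group similar conditions and report the temperature range."""
    def bucket(condition):
        # Treat "cloudy" and "partly cloudy" as the same group.
        return "cloudy" if "cloudy" in condition else condition

    temps = [day["high_f"] for day in forecast]
    first = bucket(forecast[0]["condition"])
    # Find the first day whose condition group differs from the start of the week.
    change = next((d for d in forecast if bucket(d["condition"]) != first), None)
    if change:
        summary = (f"It'll be {first} until {change['day']} "
                   f"with {change['condition']} the rest of the week.")
    else:
        summary = f"It'll be {first} all week."
    return f"{summary} Temperatures range from {min(temps)} to {max(temps)} degrees."
```

YUFENG GUO: Hey, you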
might want to consider a career as a weather
forecaster if, you know, this whole research
thing doesn't work out. JUSTIN ZHAO: Yeah, maybe. [LAUGHS] YUFENG GUO: All
right, so we've done a little bit of an overview of
natural language generation, about what makes
conversation natural. And we even gave kind of
an admittedly silly example of leveraging the structured
data to select content for a natural language response. JUSTIN ZHAO: Yeah, and we've
also included some links with more info in the video. YUFENG GUO: That's right. That we have. All right, so then getting
back to the topic at hand. How does machine learning
then get involved? JUSTIN ZHAO: Well, that's the
ultimate question that our team is trying to answer. Without machine
learning, everything that we've talked about
so far, from parsing the data to figuring out what
to say to actually figuring out how to say it, you
have to do by writing lots of rules. And rules are great. They're very stable. They're very predictable. But they're usually
very specific. And they require a
lot of engineering. And because of that, it's not
really scalable to new inputs and outputs. For example, if we wanted to
talk about finance instead of weather, or if we wanted to
support an entirely new language
require writing a whole new set of rules. YUFENG GUO: Yeah, and
it sounds like that would be way harder
to maintain as well, keeping all those rules
lined up as things change. And it would also be
hard to replicate that creativity and spontaneity that
comes with human conversation. JUSTIN ZHAO: Right. So that's exactly one of the
motivations of our research. Our hope is that by
giving the model examples of data and the language
it needs to generate, we can let the model form its
own rules about what to do. And not only does this
save us from having to write these rules
ourselves by hand, but it also gives the
computer more free rein to be creative in its own way. YUFENG GUO: So showing many
examples of how to answer questions, you might say, so that
you can write fewer rules. I mean, that's the crux of
machine learning as a whole. That's wonderful. JUSTIN ZHAO: Yeah, exactly. YUFENG GUO: And so what kind of
machine learning architectures then are you guys exploring
to try to tackle this problem? JUSTIN ZHAO: Well, so far, we've
seen really promising results with recurrent neural networks. But that's just one kind
of neural architecture that we're exploring. YUFENG GUO: OK. Recurrent neural networks. So on a previous episode, we
looked at deep neural networks. And those had neurons
connected in layers, resulting in something
like a lattice structure. Right? And for our viewers,
can you explain what it means to have a
recurrent neural network? JUSTIN ZHAO: Yeah,
so you can think of a recurrent neural network
as a deep neural network, but just wrapped in a for loop. And the network is recurrent
because the outputs of the network feed
back into it. And instead of this
one-shot input-output, the model can make decisions
over several time steps.
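As a minimal sketch, assuming NumPy and a single vanilla RNN cell, that for loop could look like this:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """A vanilla RNN: the same layer applied in a for loop, with the
    hidden state fed back into the network at every time step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in inputs:  # one pass through the network per element of the sequence
        # The new state depends on the current input AND the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states
```

YUFENG GUO: OK. Awesome. That's a really great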
way to conceptualize it. I really love that. And we've also
included some links about recurrent neural
networks down below. And if you have more questions
about this network structure, feel free to leave them
below in the comments, and we'll try to get to them. For now, we'll talk about why
recurrent neural networks are useful for doing natural
language generation. JUSTIN ZHAO: Right,
so it's important to keep in mind that language,
just in general, is extremely sequential. YUFENG GUO: Sure, yeah. JUSTIN ZHAO: For example,
"the cat sat on the mat" is a very different sentence
from "cat sat the mat on." YUFENG GUO: Yeah, order matters. Definitely. JUSTIN ZHAO: So
RNNs are especially good at remembering
what they saw earlier, because they enforce a
sequential policy over the data. The inputs are processed in a
very ordered manner instead of in these large conglomerates. YUFENG GUO: OK. So I guess it's both amazing
and not entirely surprising that recurrent neural nets would
be useful for natural language problems, it sounds like,
where, as humans, we rely a lot on what we
previously said to figure out what we will say next. JUSTIN ZHAO: Mm-hmm, exactly. YUFENG GUO: So let's talk
a bit more then on how you're using these
recurrent neural nets to generate this language. JUSTIN ZHAO: So
one fun variation when it comes to
recurrent neural nets is that since the output is
generated one step at a time, you can choose the
granularity of your output. So some models can
choose coarser outputs, like entire phrases,
or just individual words. And then this goes all
the way down to models that output bytes,
single bytes at a time. YUFENG GUO: One
byte at a time, OK. JUSTIN ZHAO: And for us,
we've been using outputs at the character level. YUFENG GUO: OK. So you're, like,
spelling out the words. JUSTIN ZHAO: Right. YUFENG GUO: OK. JUSTIN ZHAO: And this kind of
model is a character-based RNN. And you can find out more
information in the links below.
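Just to give a feel for it, a minimal character-by-character decoding loop, built around a hypothetical trained step function that returns a probability distribution over a character vocabulary, might look like this:

```python
import numpy as np

def generate_text(step, initial_state, vocab, max_len=100):
    """Decode one character at a time from a hypothetical trained RNN, where
    step(char, state) returns (probabilities over vocab, new state)."""
    state, char, out = initial_state, "<s>", []  # "<s>" is a made-up start marker
    for _ in range(max_len):
        probs, state = step(char, state)
        char = np.random.choice(vocab, p=probs)  # sample the next character
        if char == "</s>":  # made-up end-of-sequence marker
            break
        out.append(char)
    return "".join(out)
```

YUFENG GUO: So when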
we first talked about having you
on the show, you showed me this
interesting graph here. JUSTIN ZHAO: Right. YUFENG GUO: I would love to
understand it a little better. What is it showing us exactly? JUSTIN ZHAO: So this is
a small visualization of our recent research. Each row here represents
a different piece of our structured data. YUFENG GUO: Gotcha. JUSTIN ZHAO: The
shading of the squares indicates how much
the model actually cares about that piece
of structured data. And lastly, each column
represents a single step in our model. So as we travel
across the columns, you can see how the
model has learned to pay attention to
the structured data at different time steps.
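If you had those attention weights as a matrix, with one row per piece of data and one column per output step, you could render this kind of heatmap in a few lines of matplotlib. This is just a sketch with placeholder values, not our actual plotting code:

```python
import matplotlib.pyplot as plt
import numpy as np

# attention[i, t] = how strongly the model attends to data field i
# when producing output character t. Placeholder values for illustration.
attention = np.random.rand(8, 40)
fields = [f"field_{i}" for i in range(8)]  # hypothetical data field names

plt.imshow(attention, cmap="gray", aspect="auto")
plt.yticks(range(len(fields)), fields)
plt.xlabel("Output step (one character per column)")
plt.ylabel("Structured data fields")
plt.show()
```

YUFENG GUO: OK. So we're traveling left to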
right, character by character, for each column. And so the lit portions,
the lighter parts, are the parts that the model
is paying attention to. JUSTIN ZHAO: Right, exactly. YUFENG GUO: OK, and then on this
column over here, for example, it means the model
is paying attention to this bit to decide
what character to output. It's not saying that that's
the character it'll say. That's just the data
it's looking at? JUSTIN ZHAO: Right, exactly. So it's going to look at
that particular piece of data to try to figure out
what character to output. Exactly. YUFENG GUO: All right. JUSTIN ZHAO: And
then one really cool result is this diagonal
line in the middle. YUFENG GUO: Yeah,
how about that? It's kind of formulaic. It almost looks like you
guys added that in afterwards to make for something
interesting. JUSTIN ZHAO: It's
like hardcoded. YUFENG GUO: Yeah. JUSTIN ZHAO: So those
particular pieces of data are basically the characters
for a specific location. And what that diagonal
line is showing us is that when the
model has reached the part of the sentence
where it wants to spell out the specific
location, it's learned to read that from the data,
character by character. YUFENG GUO: Wow. That is awesome. And no one taught
the model to do that. It was just able to
learn how to do that by looking at examples. JUSTIN ZHAO: Exactly. That's the magic of it. YUFENG GUO: Incredible. That's super outstanding, yeah. JUSTIN ZHAO: So the diagonal
line is pretty cool. But if you dive into
our data, there's actually a lot of
other intriguing ways that the model
learns by itself how to reference the data to decide
what character to output. So that said, there's
still a ton to explore. But I am super
excited to see what we come up with in the
future and how far we can push our research. YUFENG GUO: This looks
super cool, Justin. And I'm really excited to hear
about what your team comes up with next. Maybe you'll write
a research paper using one of these
networks in the future. JUSTIN ZHAO: Yeah,
that sounds pretty fun. YUFENG GUO: Justin,
I want to thank you so much for
coming into the studio today and teaching our
viewers about natural language generation. Looking forward to catching
up again in a minute. I'm going to wrap up here. JUSTIN ZHAO: Yeah, OK. Sounds good. It was my pleasure. YUFENG GUO: All right. Sweet. Well, I hope you enjoyed this
episode of "AI adventures." I certainly did. In our conversation, we talked
about using machine learning for natural language
generation and its role in conversational
user interfaces. I had a blast
chatting with Justin. And if you like this
format, please, let us know in the comments below. And for more information
and details about everything that we talked about,
we've included tons of links in the description. And be sure to
subscribe to the channel to catch future episodes and
maybe some more interviews as they come out. [MUSIC PLAYING]