YUFENG GUO: Welcome. Today on "AI Adventures," we're joined in the studio by Justin Zhao, a Google research engineer. Hi, Justin.

JUSTIN ZHAO: Hi.

YUFENG GUO: Thanks for joining me in the studio today.

JUSTIN ZHAO: Yeah, it's great to be here.

YUFENG GUO: We're going to be talking today about natural language interfaces and how computers and humans can talk to each other in ways that are natural and not awkward.

JUSTIN ZHAO: Yep, sounds good.

YUFENG GUO: Awesome. So I want to start by talking a little bit about your team's area of research and the general field of natural language processing. Then we'll delve into your area of research and see where our conversation takes us.

JUSTIN ZHAO: Yeah, that sounds great. So broadly speaking, my area of research is natural language processing, or NLP.

YUFENG GUO: OK.

JUSTIN ZHAO: NLP is all about trying to understand how humans communicate with each other and how to get a computer to replicate that behavior, so that we can interact with computers in a more natural manner.

YUFENG GUO: Wow. You guys really picked a small field to target there.

JUSTIN ZHAO: [LAUGHS]

YUFENG GUO: Yeah, NLP sounds super broad.

JUSTIN ZHAO: Yeah.

YUFENG GUO: It's like everything.

JUSTIN ZHAO: Yeah, it's pretty broad. In fact, I have some slides that we can pull up to try to focus it a little bit.

YUFENG GUO: Yeah, that'd be great.

JUSTIN ZHAO: So first, I think it's important to talk about the conversational user interface. For something like the Google Assistant, there are two big domains of NLP problems that come into play. On one side, you have the problem of understanding: what did the user say? What was the user's intent? On the other side, you have the problem of generation: what should we say to the user, and how do we respond in a way that's intelligent and conversational?

YUFENG GUO: Right, that makes sense.

JUSTIN ZHAO: So I work on the generation side.
And the ultimate goal of natural language generation is to teach computers to turn some kind of structured data into natural language, which we can use to respond to the user in a conversation.

YUFENG GUO: Wow. I feel like NLP has conventionally been thought of as a field that's all about processing words and understanding what text means. But you're working on the generation side, which, in a lot of ways, often gets overlooked. So it's really great that you're able to tell us more about this side of things.

JUSTIN ZHAO: Yeah, that's what I'm here for. [LAUGHTER]

YUFENG GUO: So how do you teach a computer to generate natural language, rather than just understand it?

JUSTIN ZHAO: Right. For now, let's set aside the structured-data part of natural language generation and focus on the "natural" part. What makes a conversation like the one we're having feel human?

YUFENG GUO: Speaking of the one we're having, it's a little meta that we're having a conversation about what makes something conversational.

JUSTIN ZHAO: That's a common remark on our team.

YUFENG GUO: Yeah, we have to not be too robotic in our conversation.

JUSTIN ZHAO: Yeah. [LAUGHS] So I think this breaks down into two kinds of requirements. First, the content of what we say has to make sense in the context of the conversation. Is what I'm saying an appropriate response to what you're saying, or is it out of the blue?

YUFENG GUO: Hey, what are you having for dinner?

JUSTIN ZHAO: [LAUGHS] So--

YUFENG GUO: That's kind of out of the blue.

JUSTIN ZHAO: That's kind of out of the blue, yeah, definitely. Exactly. And then I have to think about whether what I'm going to say actually answers your question. If you asked me where we want to go for dinner, it would be weird to suggest a coffee shop or a clothing store.

YUFENG GUO: Right.
JUSTIN ZHAO: Yeah.

YUFENG GUO: Yeah, unless you really wanted to get some coffee stains on your clothes for dinner.

JUSTIN ZHAO: Yeah, I guess so. The second requirement is that you actually have to use the language correctly. This is like: how's my grammar? Do my verbs agree? If I'm using a pronoun, is it ambiguous?

YUFENG GUO: That makes sense. So it's basically what you say, and then how you say it.

JUSTIN ZHAO: Exactly, yeah.

YUFENG GUO: OK. You also mentioned earlier that there's this structured data we put aside. Where does that come into play?

JUSTIN ZHAO: That's a great question. Structured data primarily helps us with the first requirement, which is what we want to say. For example, let's say a user asked us about the weather next week in Santa Clara. In Google Search results, we see a box filled with all this information about the weather for the next week. Somewhere within this data, hopefully, is the answer to the user's question, and we just have to figure out how to turn all this data into a response to the user. That's the problem we're focusing on in natural language generation.

YUFENG GUO: And that's because we're talking about a situation where we're going to say our answer, not just show the user a box to look at.

JUSTIN ZHAO: That's correct.

YUFENG GUO: OK, so it's like an audio interface. Gotcha.

JUSTIN ZHAO: Right.

YUFENG GUO: In that case, I can imagine a naive solution for this sort of problem. We already have the data, right?

JUSTIN ZHAO: Yeah.

YUFENG GUO: But I don't know if it would be sufficient.

JUSTIN ZHAO: Well, that depends. By all means, go for it.

YUFENG GUO: All right, so let's say we make some kind of template, right? We can say, "On [day], it will be [temperature] degrees and [weather condition]." Like, "On Tuesday, it will be 72 degrees and partly cloudy." Then you could build a full forecast by just iterating through all the days of the week like that.
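Yufeng's naive template can be sketched in a few lines of Python. This is an illustrative sketch, not code from the show: the record layout (`day`, `temp_f`, `condition` keys) and the names `FORECAST`, `TEMPLATE`, and `template_forecast` are hypothetical, and the values are the ones the "Justin Assistant" reads out in the next exchange.

```python
# Hypothetical structured weather data: one record per day. The values mirror
# the forecast read out by the "Justin Assistant" in this conversation.
FORECAST = [
    {"day": "Sunday", "temp_f": 66, "condition": "partly cloudy"},
    {"day": "Monday", "temp_f": 63, "condition": "cloudy"},
    {"day": "Tuesday", "temp_f": 66, "condition": "partly cloudy"},
    {"day": "Wednesday", "temp_f": 68, "condition": "cloudy"},
]

# One fixed sentence pattern with a blank for each field.
TEMPLATE = "On {day}, it will be {temp_f} degrees and {condition}."


def template_forecast(days):
    """Fill the template once per day and join the resulting sentences."""
    return " ".join(TEMPLATE.format(**record) for record in days)


if __name__ == "__main__":
    print(template_forecast(FORECAST))
```

Every record produces one sentence of identical shape, so the response grows with the data and repeats itself, which is the problem that shows up as soon as the template is read aloud.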
JUSTIN ZHAO: I will say that that is a very straightforward approach, and some assistants do use that implementation. However, in practice, it's a lot less conversational than you might think. How about you ask me what the weather's like next week, and I'll use your algorithm to generate a response.

YUFENG GUO: All right, sounds good. We'll call this the Justin Assistant.

JUSTIN ZHAO: That's perfect.

YUFENG GUO: All right. OK, Justin, what's the weather like next week?

JUSTIN ZHAO: Hi, Yufeng. Sunday, it'll be 66 degrees and partly cloudy. Monday, it will be 63 degrees and cloudy. Tuesday, it'll be 66 degrees and partly cloudy. Wednesday, it'll be 68 degrees and cloudy. Thursday--

YUFENG GUO: Oh, boy. OK, that's getting too long and just too robotic. Let's call it at that.

JUSTIN ZHAO: Yeah, even saying it felt a little strange to me.

YUFENG GUO: Yeah. So clearly, generating natural language from structured data is non-trivial. How would you actually go about using a computer system to answer the user's question, then?

JUSTIN ZHAO: Well, first, I'd want to think about how I would answer it as a human. As a human, I'd hope to be a little more contextually aware, and I'd realize that there's actually a lot of repetitive information in the data. So I'd probably try to summarize it, something like, "It'll be cloudy until Thursday, with showers the rest of the week. Temperatures range from the high 40s to the mid-60s."

YUFENG GUO: Hey, you might want to consider a career as a weather forecaster if this whole research thing doesn't work out.

JUSTIN ZHAO: Yeah, maybe. [LAUGHS]

YUFENG GUO: All right, so we've done a bit of an overview of natural language generation and what makes conversation natural. We even gave an admittedly silly example of leveraging the structured data to select content for a natural language response.
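A hand-written version of the summary Justin gives might group consecutive days that share a condition and report only the temperature range. This is a hypothetical sketch: the seven-day data below is invented to match the shape of his summary, and the helper names and the low/mid/high rounding thresholds are assumptions, not anything described on the show.

```python
from itertools import groupby

# Hypothetical week of data, invented to match the shape of the spoken summary.
WEEK = [
    {"day": "Sunday", "temp_f": 64, "condition": "cloudy"},
    {"day": "Monday", "temp_f": 62, "condition": "cloudy"},
    {"day": "Tuesday", "temp_f": 58, "condition": "cloudy"},
    {"day": "Wednesday", "temp_f": 55, "condition": "cloudy"},
    {"day": "Thursday", "temp_f": 49, "condition": "showers"},
    {"day": "Friday", "temp_f": 48, "condition": "showers"},
    {"day": "Saturday", "temp_f": 52, "condition": "showers"},
]


def describe_temp(temp_f):
    """Round a temperature to a spoken phrase such as 'high 40s' or 'mid-60s'.

    The 0-3 / 4-6 / 7-9 cut-offs are an arbitrary choice for illustration.
    """
    decade, ones = (temp_f // 10) * 10, temp_f % 10
    if ones <= 3:
        return f"low {decade}s"
    if ones <= 6:
        return f"mid-{decade}s"
    return f"high {decade}s"


def summarize_forecast(days):
    """Collapse runs of identical conditions and summarize the temperature range."""
    phrases = []
    # groupby merges only *consecutive* records, which is what we want for runs of days.
    for condition, run in groupby(days, key=lambda d: d["condition"]):
        names = [d["day"] for d in run]
        if len(names) == 1:
            phrases.append(f"{condition} on {names[0]}")
        else:
            phrases.append(f"{condition} from {names[0]} through {names[-1]}")
    conditions = ", then ".join(phrases)
    conditions = conditions[0].upper() + conditions[1:] + "."
    temps = [d["temp_f"] for d in days]
    temperature = (
        f"Temperatures range from the {describe_temp(min(temps))} "
        f"to the {describe_temp(max(temps))}."
    )
    return f"{conditions} {temperature}"


if __name__ == "__main__":
    print(summarize_forecast(WEEK))
```

With the sample week, this returns "Cloudy from Sunday through Wednesday, then showers from Thursday through Saturday. Temperatures range from the high 40s to the mid-60s." Every phrasing decision here (what counts as a run, how to round temperatures) is a hand-written rule, which is the scaling problem discussed next.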
JUSTIN ZHAO: Yeah, and we've also included some links with more info in the video.

YUFENG GUO: That's right. That we have. All right, getting back to the topic at hand: how does machine learning get involved?

JUSTIN ZHAO: Well, that's the ultimate question our team is trying to answer. Without machine learning, everything we've talked about so far, from parsing the data to figuring out what to say to figuring out how to say it, has to be done by writing lots of rules. And rules are great. They're very stable and very predictable. But they're usually very specific, and they require a lot of engineering. Because of that, they don't scale well to new inputs and outputs. For example, if we wanted to talk about finance instead of weather, or if we wanted to support an entirely new language, it would require writing a whole new set of rules.

YUFENG GUO: Yeah, and it sounds like that would be way harder to maintain as well, keeping all those rules lined up as things change. It would also be hard to replicate the creativity and spontaneity that come with human conversation.

JUSTIN ZHAO: Right. That's exactly one of the motivations for our research. Our hope is that by giving the model examples of data and the language it needs to generate, we can let the model form its own rules about what to do. Not only does this save us from having to write these rules by hand, but it also gives the computer more free rein to be creative in its own way.

YUFENG GUO: So it's about showing many examples of how to answer questions, you might say, so that you can write fewer rules. I mean, that's the crux of machine learning as a whole. That's wonderful.

JUSTIN ZHAO: Yeah, exactly.

YUFENG GUO: So what kinds of machine learning architectures are you exploring to tackle this problem?

JUSTIN ZHAO: Well, so far, we've seen really promising results with recurrent neural networks.
But that's just one kind of neural architecture that we're exploring.

YUFENG GUO: OK. Recurrent neural networks. On a previous episode, we looked at deep neural networks, which had neurons connected in layers, resulting in a lattice-like structure, right? For our viewers, can you explain what it means to have a recurrent neural network?

JUSTIN ZHAO: Yeah, you can think of a recurrent neural network as a deep neural network wrapped in a for loop. The network is recurrent because its outputs feed back into itself. So instead of a one-shot input and output, the model can make decisions over several time steps.

YUFENG GUO: OK. Awesome. That's a really great way to conceptualize it. I really love that. We've also included some links about recurrent neural networks down below, and if you have more questions about this network structure, feel free to leave them in the comments, and we'll try to get to them. For now, let's talk about why recurrent neural networks are useful for natural language generation.

JUSTIN ZHAO: Right. It's important to keep in mind that language, in general, is extremely sequential.

YUFENG GUO: Sure, yeah.

JUSTIN ZHAO: For example, "the cat sat on the mat" is a very different sentence from "cat sat the mat on."

YUFENG GUO: Yeah, order matters. Definitely.

JUSTIN ZHAO: So RNNs are especially good at remembering what they saw earlier, because they enforce a sequential policy over the data. The inputs are processed in a very ordered manner instead of as one large conglomerate.

YUFENG GUO: OK. So I guess it's both amazing and not entirely surprising that recurrent neural nets are useful for natural language problems, where, as humans, we rely a lot on what we previously said to figure out what we'll say next.

JUSTIN ZHAO: Mm-hmm, exactly.
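Justin's description of an RNN as "a deep neural network wrapped in a for loop" can be made concrete with a tiny character-level recurrent cell in plain Python. This is a minimal, untrained sketch (an Elman-style cell with random weights), not the team's model: the function names, the hidden size of 8, and the weight initialization are all hypothetical. It shows the same small layer being applied once per character, with its output state fed back in at the next step, so that word order changes the result.

```python
import math
import random


def make_params(vocab, hidden_size, seed=0):
    """Randomly initialize one recurrent layer (untrained, for illustration only)."""
    rng = random.Random(seed)
    # Input weights: one column of W_x per character, i.e. W_x @ one_hot(ch).
    w_x = {ch: [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for ch in vocab}
    # Recurrent weights W_h (hidden_size x hidden_size) and bias b.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    return w_x, w_h, b


def rnn_step(ch, h_prev, params):
    """One application of the layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    w_x, w_h, b = params
    n = len(h_prev)
    return [
        math.tanh(w_x[ch][i] + sum(w_h[i][j] * h_prev[j] for j in range(n)) + b[i])
        for i in range(n)
    ]


def run_rnn(text, params):
    """The 'for loop': apply the same layer to each character, in order."""
    h = [0.0] * len(params[2])  # initial hidden state
    for ch in text:
        h = rnn_step(ch, h, params)  # the previous state feeds back in
    return h


if __name__ == "__main__":
    sentence_a = "the cat sat on the mat"
    sentence_b = "cat sat the mat on"
    params = make_params(sorted(set(sentence_a + sentence_b)), hidden_size=8)
    # The two word orders from the conversation end in different hidden states.
    print(run_rnn(sentence_a, params) != run_rnn(sentence_b, params))
    # Even two orderings of exactly the same letters differ.
    print(run_rnn("cat", params) != run_rnn("act", params))
```

A trained character-level generator would add an output layer that turns each hidden state into a probability distribution over the next character; during generation, the chosen character becomes the next input, which is the sense in which the network's outputs feed back into itself.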
YUFENG GUO: So let's talk a bit more about how you're using these recurrent neural nets to generate language.

JUSTIN ZHAO: One fun variation with recurrent neural nets is that, since the output is generated one step at a time, you can choose the granularity of your output. Some models choose coarser outputs, like entire phrases or individual words, and this goes all the way down to models that output a single byte at a time.

YUFENG GUO: One byte at a time, OK.

JUSTIN ZHAO: For us, we've been using outputs at the character level.

YUFENG GUO: OK. So you're, like, spelling out the words.

JUSTIN ZHAO: Right.

YUFENG GUO: OK.

JUSTIN ZHAO: This kind of model is called a character-based RNN, and you can find more information in the links below.

YUFENG GUO: When we first talked about having you on the show, you showed me this interesting graph here.

JUSTIN ZHAO: Right.

YUFENG GUO: I'd love to understand it a little better. What is it showing us exactly?

JUSTIN ZHAO: This is a small visualization of our recent research. Each row here represents a different piece of our structured data.

YUFENG GUO: Gotcha.

JUSTIN ZHAO: The shading of the squares indicates how much the model actually cares about that piece of structured data. And lastly, each column represents a single step in our model. So as we travel across the columns, you can see how the model has learned to pay attention to the structured data at different time steps.

YUFENG GUO: OK. So we're traveling left to right, one character per column, and the lighter parts are the parts the model is paying attention to.

JUSTIN ZHAO: Right, exactly.

YUFENG GUO: OK. And then over here, for example, it means the model is paying attention to this bit to decide what character to output. It's not saying that that's the character it'll say; that's just the data it's looking at?

JUSTIN ZHAO: Right, exactly.
So it's going to look at that particular piece of data to try to figure out what character to output. Exactly.

YUFENG GUO: All right.

JUSTIN ZHAO: And one really cool result is this diagonal line in the middle.

YUFENG GUO: Yeah, how about that? It's kind of formulaic. It almost looks like you guys added it in afterwards to make it more interesting.

JUSTIN ZHAO: Like it's hardcoded.

YUFENG GUO: Yeah.

JUSTIN ZHAO: Those particular pieces of data are basically the characters of a specific location. What the diagonal line shows us is that when the model reaches the part of the sentence where it wants to spell out that location, it has learned to read it from the data, character by character.

YUFENG GUO: Wow. That is awesome. And no one taught the model to do that. It learned how to do that just by looking at examples.

JUSTIN ZHAO: Exactly. That's the magic of it.

YUFENG GUO: Incredible. That's super outstanding, yeah.

JUSTIN ZHAO: So the diagonal line is pretty cool. But if you dive into our data, there are actually a lot of other intriguing ways in which the model learns by itself how to reference the data to decide what character to output. That said, there's still a ton to explore. But I'm super excited to see what we come up with in the future and how far we can push our research.

YUFENG GUO: This looks super cool, Justin, and I'm really excited to hear what your team comes up with next. Maybe you'll write a research paper using one of these networks in the future.

JUSTIN ZHAO: Yeah, that sounds pretty fun.

YUFENG GUO: Justin, thank you so much for coming into the studio today and teaching our viewers about natural language generation. I'm looking forward to catching up again in a minute. I'm going to wrap up here.

JUSTIN ZHAO: Yeah, OK. Sounds good. It was my pleasure.

YUFENG GUO: All right. Sweet. Well, I hope you enjoyed this episode of "AI Adventures." I certainly did.
In our conversation, we talked about using machine learning for natural language generation and its role in conversational user interfaces. I had a blast chatting with Justin, and if you like this format, please let us know in the comments below. For more information and details about everything we talked about, we've included tons of links in the description. And be sure to subscribe to the channel to catch future episodes, and maybe some more interviews as they come out. [MUSIC PLAYING]
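The attention graph Justin describes (one row per piece of structured data, one column per output step, shading for how much the model "cares") can be reproduced in miniature. The transcript doesn't say which attention mechanism the team used; this sketch assumes plain dot-product attention with a softmax over the data fields at each step, and the key and query vectors are hand-picked, hypothetical stand-ins for what a trained encoder and decoder would produce. With queries that ask for the location's characters one after another, the strongest weight in each column moves down one row per step, the same kind of diagonal that appears when the model copies a location name character by character.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(field_keys, step_queries):
    """Return weights[i][t]: how much output step t attends to data field i.

    Rows are structured-data fields and columns are output steps, matching the
    layout of the visualization. Each column sums to 1.
    """
    columns = []
    for query in step_queries:
        # Dot-product score between this step's query and every field's key.
        scores = [sum(k * q for k, q in zip(key, query)) for key in field_keys]
        columns.append(softmax(scores))
    # Transpose from per-step columns to per-field rows.
    return [[col[i] for col in columns] for i in range(len(field_keys))]


def render(weights, labels, shades=" .:#"):
    """Print a text 'heat map': denser characters mean more attention."""
    for label, row in zip(labels, weights):
        cells = "".join(shades[min(int(w * len(shades)), len(shades) - 1)] for w in row)
        print(f"{label:>5} |{cells}|")


if __name__ == "__main__":
    location = "Santa Clara"  # hypothetical location string in the structured data
    n = len(location)
    # Hypothetical keys: one scaled one-hot vector per character of the location.
    keys = [[5.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    # Hypothetical queries: at output step t, the decoder "asks for" character t.
    queries = [[1.0 if j == t else 0.0 for j in range(n)] for t in range(n)]
    render(attention_matrix(keys, queries), [repr(c) for c in location])
```

In a real model nobody hand-picks these vectors; the point of the result discussed above is that training alone produced a similar diagonal.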
Published: Thu Oct 19 2017
