[MUSIC PLAYING] ELISE ROY: You see this? This is my old hearing aid. And this? This is one that I wear today. What's different? This one's flesh-colored
and this one's red. It may seem like a small design
tweak, but it changed my life. It made me feel as
if I belong again. You see, right before fifth
grade, my mom sat me down and she said, you need to tell
Michelle about your hearing loss. Michelle was my best
friend, but I hadn't seen her all summer long. And two weeks before, I was told
that I was losing my hearing and it was going to just get
worse and worse and worse. But I was 10 years old. I didn't know how to deal
with heavy stuff like this. And so when I called her
up before school started, we talked about our summers,
we talked about sports, we talked about everything
but my hearing loss. On the second day of school,
she was standing behind me in line and she tapped
me on the shoulder, pointed to my hearing aid
and said, what's that? It was an innocent question, but
I didn't know how to respond, and so I said,
it's a hearing aid, as if she was
stupid not to know. And then there was just silence. And this silence followed us for
the rest of our relationship. She never asked me again
about my hearing loss, and I spent that year watching
her slowly drift away. I felt different, as
if I didn't belong, and she being just 10
years old didn't know how to deal with this difference. Many kids avoid difference
because they're just not sure what to do with it. And so I found out
quickly that I didn't want to be seen as different. And so began this long struggle to try to prove that, although I had a hearing loss, it didn't change me. I was still normal. And I did this by overachieving. When I went to
college, playing just one sport wasn't enough. I had to play two. I had to go to an Ivy League. I became one of the first
few deaf lawyers in the US. I did some work at
the United Nations, and then I became a designer. But somewhere along the
way, I realized something. I am the new normal. You know that TV show
"Orange is the New Black"? Well, I am the new normal. Difference is the new normal. Difference, even if it
seems like limitation, is what makes us thrive,
what makes us valuable. Now, I would like you to think about this: disability encompasses all of us. If we all live long enough, we will all acquire a disability at some point in our lives. And who here has broken
their legs or their arms? Really, that's it? Come on. That's an example of a
temporary disability. But what comes next is key. We also all experience something
called momentary disabilities. Now, I'd like to
call up a volunteer to come up and help me
demonstrate what they are. [APPLAUSE] What I'd like you
to do is to pick up that box and those
books, and then come over while carrying the
box, take a sip of the water. You're going to have
to open it, though. You have to open it. That was pretty impressive. Most pe-- AUDIENCE: [INAUDIBLE] [APPLAUSE] ELISE ROY: Was it easy? Hard? AUDIENCE: Totally. Kind of fun. ELISE ROY: Yes. So thank you very
much for your help. As we go about our lives,
we encounter situations where we will be
momentarily disabled, whether we're carrying a box
and trying to open up a door. And so disability really
encompasses all of us. There are just some of
us that experience it a lot more than others. Now, as a lawyer, I fought
for equality in race, gender, and disability, and you
would think that I would have been outfitted with
the skills necessary to feel accepted and valued by society. But to my surprise, I
found the strongest tools when I transitioned
from law to design. Design has this powerful
ability to shift perceptions, but it's up to you to use it. Up to you. So finally, it happened. After law school, I went
back to the audiologist to get a new hearing
aid, and I was thrilled because
they weren't just these awful flesh-colored
things anymore, but they invented red ones
and blue ones and green ones. So I opted for the
bright red one, and then something
magical happened-- my hearing aid became cool. People started saying things
like, [GASP] love the red! This little thing created
this huge shift in my life. It allowed me to
celebrate my difference and it allowed others to
join in on celebrating this difference with me. This is because it
opened up the door to conversing about
difference without being focused on limitations. MICHAEL BRENNER: OK. Thank you, Elise. That was a beautiful talk, and
it was a very good introduction to our story, which we
call Project Euphonia. So we're going to
start by telling you a story about
one of our colleagues at Google. So this is Dimitri Kanevsky,
and Dimitri, it turns out, is a mathematician. He's worked at some of the great
institutions for mathematics in the world. But for the last
two decades, he's really been thinking
primarily about designing for accessibility--
that is, trying to invent technology that was
helpful in some way or other. So Dimitri himself has a
disability-- he's deaf-- and he also has a very
strong Russian accent. So the first time that
at least I met Dimitri, I found it very
hard to understand what he was talking about. But, you know, hanging
out with Dimitri, eventually you get the idea. So it turns out
that our computers have the same problem-- that is, when Dimitri
speaks to his phone as I might speak to
my phone, his phone doesn't understand
him very well. And this is a clip in which
he explains that himself. So what you see from this
is that the phone that was being shown was a phone
that was running the Google Cloud Speech Recognition Model. And what I would claim
is that if you only looked at the phone,
that you would not be able to really understand
the thread of what Dimitri was trying to communicate. And so we asked ourselves the
question, why is that the case? Why is it that the phone was
not able to understand Dimitri but, for example, it is
able to understand me? And in order to
explain this, I need to tell you a little bit about
how speech recognition works and why it is that speech
recognition has gotten so much better over the past
number of years. So when we speak, what we're
doing is creating a wave form. So a wave form is
just a sound wave and it looks rather
unintelligible. The job that we're
asking a computer to do is to take the
picture on the left and to somehow turn it into
the words that are being said. So as you all know, humans
have gotten very good at interpreting
pictures, and so the way that speech recognizes work
is we first take the wave form and turn it into a picture. The picture is
called a spectrogram and it's just a
picture of colors, but it's still unintelligible
as to what was being said. And then what we do
is take the picture and stick it into a
neural network, which is a big computer program that
has lots of parameters in it. And the idea is to make the
computer program so that it outputs what was being said.
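To make that first step concrete, here is a minimal TensorFlow.js sketch of turning a waveform into the spectrogram picture described above. It is only an illustration of the idea, not Google's production pipeline, and the frame sizes in it are illustrative guesses.

```typescript
// Minimal sketch: waveform -> spectrogram "picture", as described above.
// Not the production pipeline; frame sizes are illustrative.
import * as tf from '@tensorflow/tfjs';

function waveformToSpectrogram(samples: Float32Array): tf.Tensor {
  return tf.tidy(() => {
    const signal = tf.tensor1d(samples);      // raw audio, e.g. 16 kHz mono
    // Slice the waveform into short overlapping frames and take an FFT of each.
    const frameLength = 1024;                 // roughly 64 ms of audio at 16 kHz
    const frameStep = 256;                    // hop between successive frames
    const stft = tf.signal.stft(signal, frameLength, frameStep);
    // The magnitude of each frequency bin in each frame gives the picture:
    // one column of "colors" per moment in time, one row per frequency.
    const magnitude = tf.sqrt(
        tf.add(tf.square(tf.real(stft)), tf.square(tf.imag(stft))));
    // Log-compress so both quiet and loud sounds remain visible.
    return tf.log(tf.add(magnitude, 1e-6));
  });
}
```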
Now of course, just like us, if you don't train the
computer program, it has no idea what
was being said. And so what we do is we take all
of the numbers in this computer program-- there are millions of
numbers that you have to tune-- and we give it one sentence
at a time, somebody saying something. And the computer
predicts what it thinks is being said and then it gets it wrong,
and we bang the computer over the head, twiddle the
parameters around a little bit until eventually by giving it
lots and lots of sentences, it gets better at
speech recognition. And we have phones
that work for people whom the computer has heard.
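That "guess, get it wrong, twiddle the parameters" loop is ordinary supervised training. The real recognizer is a much larger sequence model trained on whole sentences; the toy sketch below, with made-up sizes and a pretend vocabulary of whole phrases, only shows the shape of that loop in TensorFlow.js.

```typescript
// Toy illustration of the training loop described above -- not the production
// recognizer, which is a far larger sequence model. Sizes are made up.
import * as tf from '@tensorflow/tfjs';

const NUM_PHRASES = 10;     // pretend we only recognize 10 whole phrases
const SPEC_FRAMES = 98;     // illustrative spectrogram dimensions
const SPEC_BINS = 40;

const model = tf.sequential({
  layers: [
    tf.layers.flatten({inputShape: [SPEC_FRAMES, SPEC_BINS]}),
    tf.layers.dense({units: 128, activation: 'relu'}),
    tf.layers.dense({units: NUM_PHRASES, activation: 'softmax'}),
  ],
});

// The loss measures how wrong each guess was; the optimizer "twiddles"
// the parameters a little after every batch of examples.
model.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy'],
});

// spectrograms: [numExamples, SPEC_FRAMES, SPEC_BINS]
// labels: one-hot [numExamples, NUM_PHRASES]
async function train(spectrograms: tf.Tensor3D, labels: tf.Tensor2D) {
  await model.fit(spectrograms, labels, {epochs: 20, batchSize: 32});
}
```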
Now in order to do that, it takes huge numbers of sentences. So tens of millions,
say, of sentences need to be given to
the computer for it to develop a general
type of understanding. But the problem is that
for people like Dimitri, or indeed anyone who
speaks in a way that is different than
the pool of examples that the computer was given,
the phone can't understand them just because it's never
heard the example before. And so the question that we
asked, and this was a question that we started asking in
collaboration with an ALS foundation that we've
been working with-- ALS TDI, who gave
me this T-shirt-- so we asked whether or not
it's possible to basically fix the speech recognizers
to work for people who are hard to understand. And Dimitri is amazing and
he decided to take this on. So remember what I said--
it takes tens of millions of sentences to train
a speech recognizer. It's completely crazy to ask
someone to sit and record tens of millions of sentences. But Dimitri has a
great spirit, and so he sat in front of his
computer and he just started recording sentences. And so, for example,
here is a sentence-- what is the temperature today? And so the computer
would show "What is the temperature today?" And Dimitri would read "What
is the temperature today?" And he sat there for days
recording these sentences, until he had recorded
upwards of 15,000 sentences, and we then decided to
train the speech recognizer to see if it was able
to understand him. And I should tell
you that none of us knew whether or not it was even
conceivable that this could work because, as I said, it took many more sentences-- from many people who speak in a way that is more typical-- to train the recognizer in the first place.
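The talk does not spell out exactly how those recordings were used, but one common way to adapt an existing recognizer with far less data than training from scratch is fine-tuning: start from the already-trained parameters and continue training only part of the model on the new speaker's recordings. Here is a hedged sketch of that idea; the URL, data names, and layer choices are hypothetical.

```typescript
// Hedged sketch of fine-tuning on one speaker's recordings. The URL, data
// names, and layer choices are hypothetical, not the Euphonia recipe.
import * as tf from '@tensorflow/tfjs';

async function personalize(
    pretrainedModelUrl: string,         // model already trained on typical speech
    personalSpectrograms: tf.Tensor,    // one speaker's recorded sentences
    personalLabels: tf.Tensor) {
  const model = await tf.loadLayersModel(pretrainedModelUrl);

  // Keep the general speech knowledge: freeze all but the last two layers.
  model.layers.forEach((layer, i) => {
    layer.trainable = i >= model.layers.length - 2;
  });

  // Continue training with a small learning rate on the personal data only.
  model.compile({
    optimizer: tf.train.adam(1e-4),
    loss: 'categoricalCrossentropy',
  });
  await model.fit(personalSpectrograms, personalLabels,
                  {epochs: 5, batchSize: 16});
  return model;
}
```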
So here's Dimitri at the end. He was still happy after doing this. And then here is the-- I'm now going to show you a
quick clip of what happened. And so what you see is that
the device on the right was able to understand Dimitri,
whereas the device on the left, which is the Google
Cloud device, was not. And this really
gave us confidence that it was possible to
make progress on this task. And so we started working in
earnest with our collaborators ALS TDI, who recruited a
large number of people with ALS to start recording
sentences to see if this works. Now, of course, getting someone
to record 15,000 sentences is completely crazy. That's never going
to work at scale. And so instead we were
investigating technically whether or not it's possible
to make progress with smaller numbers of sentences. And what I can report to you
is that we're making progress. We're not there yet. We do not feel that we've
solved this problem in any way. But we're working
hard, and there are groups of engineers at
Google who are working hard. And this is just
a little example. So the leftmost column is the ground truth phrases, the rightmost column is
what Google Cloud recognizes on this particular person
who happens to have ALS, and the middle column is what
our recognizer is right now doing, and we're hard at
work trying to figure out if it is possible to
make this work for people without requiring so
much training data. So this is Dimitri
as of this week. So Dimitri now carries
around with him about five different phones
in his pocket, each of which has a different speech
recognizer on it, and he's testing and trying
to figure out the best way. And it is our hope
that if we can get this to work with Dimitri's
help and with all of your help, and hopefully
people will make recordings for us-- the reason for the call for data that Sundar made is that we need more data from people, just recordings, to be able
to make this work. Hopefully we will get there. That is our goal. And this, broadly, is Euphonia's mission: to improve communication technology by including as many people as possible, whatever features those people have and whatever means they use to communicate. Of course, speaking is an
important way of communicating, but it is not the only
way that we communicate. We communicate with each
other by looking, by feeling, by doing so many
different things. And there are people who don't
have the ability to speak, and so now I'm going
to turn it over to Irene, who will start to
talk about other speaking modalities. IRENE ALVARADO: All right. Thanks, Michael. [APPLAUSE] All right, so so far
we've talked about Dimitri and about speech, but what about
other forms of communication? What about folks who can't
communicate verbally? We want to show you
how we're approaching the research for those
types of cases as well. So for that, I'd
like to introduce our second protagonist for the
day, the amazing Steve Saling. He's an incredible person. He had a brilliant career
as a landscape architect, and when he learned that he had ALS, he set about rethinking how people with his condition receive care. He also started
thinking about how he could leverage technology
to create more independence for himself, so
that he didn't have to rely as much on other
people to take care of him. And one thing he
helped do is create a smart
home-like system that lets him request an elevator
and close the blinds, turn on the music, all
by using his computer. It's really amazing. So Steve happened to be one of
the perfect people to partner with for this research because
he is a technologist himself. And speaking of
computers, we want to show you how many folks who
have ALS communicate today. They use something called
an eye gaze pointer to type out letters one by one. So these are two
different systems they can use: either
a keyboard or something on the right called Dasher. And it works-- it does the job-- but if you can imagine,
it's just a little bit slow. And what he's missing is a layer
of communication that all of us are familiar with-- interruptions,
mannerisms, jokes, laughs. Synchronous communication
that comes by quickly. That's something that's really
hard for Steve and people with his condition to do. So something we
wanted to try with him was to see if he could train his
own personal machine learning models to classify
different facial expressions, and the thought was,
is this even useful for him to be able to trigger
things more quickly so that he might be able to open
his mouth and trigger something on the computer
or raise his eyebrows and trigger something else? It was a question. It's a research question. And we didn't know the answer. So with Steve's feedback, his
ideas, and a lot of testing, we developed a
machine learning tool that anybody actually can
use to train classification models in the browser. And by classification,
I mean a model that tries to predict what
category a certain type of input belongs to. Let me show you an example
so you see how it works. This is my colleague Barron and
he's training two classes, one to detect his face and one to
detect this really cute cat pillow that he has. So he's giving the
computer a bunch of data. He's training it,
waiting for it to finish, and then he's testing
the model on the right. And then he publishes the model. All of this is happening in
the browser in real time, and the images-- the processing is
happening in his computer, so the images aren't
being sent to a server. It's all happening in his
computer in the browser. So we're calling this
Teachable Machine. It's a tool for anybody
to train machine learning models in the browser without
having to know how to code. And it's actually built
on top of TensorFlow.js, so all of the underlying
technology is free and it's open source
for you to use.
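For anyone curious what that looks like in code, here is one well-known TensorFlow.js recipe for Teachable Machine-style training in the browser -- a sketch of the general approach, not necessarily the tool's exact internals. A pretrained image model turns each webcam frame into an embedding, and a simple nearest-neighbor classifier learns the user's own classes from a handful of examples, entirely on the local machine.

```typescript
// Sketch of Teachable Machine-style in-browser training: pretrained
// embeddings plus a nearest-neighbor classifier. Labels are examples only.
import * as mobilenet from '@tensorflow-models/mobilenet';
import * as knnClassifier from '@tensorflow-models/knn-classifier';

async function setup(webcam: HTMLVideoElement) {
  const net = await mobilenet.load();           // pretrained feature extractor
  const classifier = knnClassifier.create();    // learns the personal classes

  // Called while the user holds a pose, e.g. "mouth open" or "eyebrows raised".
  function addExample(label: string) {
    const embedding = net.infer(webcam, true);  // frame -> feature vector
    classifier.addExample(embedding, label);
    embedding.dispose();
  }

  // Called on every new frame once training examples exist.
  async function classify(): Promise<{label: string, confidence: number}> {
    const embedding = net.infer(webcam, true);
    const result = await classifier.predictClass(embedding);
    embedding.dispose();
    return {label: result.label,
            confidence: result.confidences[result.label]};
  }

  return {addExample, classify};
}
```

Retraining a classifier like this takes only seconds, which is part of what makes redoing it as often as needed practical.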
So, OK, how is Steve using this? Well, as I mentioned, he's training facial expression classification models for cases where he
might want a faster response time than what he can achieve
with his eye gaze pointer, and Teachable Machine
is the prototyping tool that's allowing him
to do this and explore what types of use cases are
actually helpful for him. So why is this useful? Well, Teachable Machine is
situational in two ways, right? ALS actually changes over time,
so people with the condition deteriorate over time. So Steve might be able
to do an expression today that he can't do in a year. He has to be able to retrain
those models on his own, perhaps week by week, month
by month, as he needs it. And the second thing
is that you might imagine that he might want
to use different models for different use cases. One thing that he
actually tried was training a model that
would trigger the sound of an air horn when he opens his mouth, and trigger a boo when
he raises his eyebrows.
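As a rough idea of how a trigger like that can be wired up (the labels, sound files, and threshold below are made up, not Steve's actual setup), the page can simply poll the classifier a few times a second and play a sound whenever a class is detected with high confidence:

```typescript
// Illustrative only: poll a trained classifier and fire quick reactions.
// Labels, file names, and the 0.9 threshold are hypothetical.
const airHorn = new Audio('airhorn.mp3');
const boo = new Audio('boo.mp3');

// `predictExpression` stands in for whatever model was trained, e.g. the
// classify() helper sketched earlier, returning a label plus a confidence.
function startReacting(
    predictExpression: () => Promise<{label: string, confidence: number}>) {
  setInterval(async () => {
    const {label, confidence} = await predictExpression();
    if (confidence < 0.9) return;               // ignore uncertain frames
    if (label === 'mouth open') airHorn.play();
    if (label === 'eyebrows raised') boo.play();
  }, 250);                                      // a few checks per second
}
```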
And he used it one night to watch a basketball game with one of his favorite teams
to react quickly to the game as it progressed. Unfortunately that night,
his team didn't win, but it was actually
really fun to set up. So we've got a long way
to go with this research. This is really
only the beginning, and we hope to expand
the tool to support many more modes of input. The tool itself will be
available later this year for anyone to train their
classification models, but as I said before,
all of the technology is already available
on TensorFlow.js. We're committed to working with
people like Steve and Dimitri to make their
communication tools better, and the idea really is to start
with the hardest problems that might unlock innovations
for everyone. But it's our sincere hope
that this kind of research might help people with other
types of speech impairments-- people with cerebral
palsy or Parkinson's or multiple sclerosis. And maybe, perhaps
one day, it could be helpful to even more people--
people who freely communicate today, maybe like folks who have
an accent in a second language. And in fact, we started
calling this approach to building "Start with
One, Invent for Many." We think anybody
can work this way, and you can apply it to many
more types of problems. The idea is actually
quite simple-- so start by working
together with one person to solve one problem, and that
way you can be sure that what you make for them will
be impactful to them and the people and their lives. And sometimes-- it
doesn't always happen, but sometimes-- what
you make together can go on to be useful
to many more people. Start with One, Invent for Many. If you'd like to hear more
about this project and Start with One, if you'd like to
hear more about Dimitri, Steve, and actually play with
Teachable Machine, we have all of these projects
in the Experiment Sandbox tent, which is actually
really close to the stage. And lastly,
we'd like to invite you to help this research effort. As Michael was saying,
we don't expect people to train 15,000
phrases in order to get a model like
this, so we actually need volunteers to share
their voice samples with us so that we may one day
generalize these models. So if you or anyone you know
has hard-to-understand speech, we'd like to invite
you to go to this link and submit some samples,
and hopefully one day we can make these models more
widely accessible to everyone. Thank you. [APPLAUSE] [MUSIC PLAYING]