Real-time communication with WebRTC: Google I/O 2013

JUSTIN UBERTI: Hi everyone. Thanks for coming to the session on WebRTC for plugin-free realtime communication. I'm Justin Uberti, tech lead for WebRTC at Google. And with me today is-- hey, has anyone seen Sam? SAM DUTTON: Hey. JUSTIN UBERTI: Sam Dutton, coming to you live from WebRTC on Chrome for Android. [APPLAUSE] SAM DUTTON: On a beautiful Nexus 7. We got this low-res to cope with the Wi-Fi here. That seems to be working pretty well. JUSTIN UBERTI: That was quite an entrance. Why don't you come up here and introduce yourself? SAM DUTTON: Yeah. Hey. I'm Sam Dutton. I'm a developer advocate for Chrome. JUSTIN UBERTI: So we're here to talk to you today about the great things that WebRTC's been working on and how you can use them. So what is WebRTC? In a nutshell, it's what we call realtime communication-- RTC-- the ability to communicate live with somebody or something as if you were right there next to them. And this can mean audio, video, or even just peer-to-peer data. And we think WebRTC is really cool. But there's a lot of other people who are really excited about WebRTC as well. And one of the reasons is that WebRTC fills a critical gap in the web platform, where previously, a native proprietary app like Skype could do something the web just couldn't. But now we've turned that around and changed that so we have a web of connected WebRTC devices that can communicate in realtime just by loading a web page. So here's what we're trying to do with WebRTC, to build the key APIs for realtime communication into the web, to make an amazing media stack in Chrome so that developers can build great experiences, and to use this network of connected WebRTC devices to create a new communications ecosystem. And these kind of seem like lofty goals. But take this quote from the current CTO of the FCC who said he sees traditional telephony fading away as voice just becomes another web app. So we're trying to live up to that promise. 
And right now, you can build a single app with WebRTC that connects Chrome, Chrome for Android, Firefox, and very soon, Opera. I'm especially excited to announce that as of this week, Firefox 22 is going to beta, which is the very first WebRTC-enabled version of Firefox. So within a matter of weeks, we will have over one billion users using a WebRTC-enabled browser. [APPLAUSE] JUSTIN UBERTI: And I think that just gives a good idea of the size of the opportunity here. And we expect that number to grow very significantly as both Chrome and Firefox get increased adoption. For places where we don't have WebRTC-enabled browsers, we're providing native, supported, official tool kits on both Android, and very soon, iOS, that can interoperate with WebRTC in the browser. [APPLAUSE] JUSTIN UBERTI: So here are just a handful of the companies that see the opportunity in WebRTC and are building their business around it. So that's the vision for WebRTC. Now let's dig into the APIs. There are three main categories of API in WebRTC. First, getting access to input devices-- accessing the microphone, accessing the webcam, getting a stream of media from either of them. Secondly, being able to connect to another WebRTC endpoint across the internet, and to send this audio and video in realtime. And third, the ability to do this not just for audio and video, but for arbitrary application data. And we think this one is especially interesting. So because there are three categories, we have three objects. Three primary objects in WebRTC to access this stuff. The first one, MediaStream, for getting access to media, then RTCPeerConnection and RTCDataChannel. And we'll get into each one of these individually. Sam, why don't you tell us about MediaStream? SAM DUTTON: Yeah, sure. So MediaStream represents a single source of synchronized audio or video or both. Each MediaStream contains one or more MediaStreamTracks.
For example, on your laptop, you've got a webcam and a microphone providing video and audio streams, and they're synchronized. We get access to these local devices using the getUserMedia method of Navigator. So let's just look at the code for that, just highlight that. And you can see that getUserMedia there, it takes three arguments. And the first one, if we look at the constraints argument I've got, you can see I'm just specifying I want video. That's all I'm saying. Just give me video and nothing else. And then in the success callback, we're setting the source of a video element using the stream that's returned by getUserMedia. Let's see that in action, really simple example here. And you can see when we fire the getUserMedia method, we get the allow permissions bar at the top there. Now, this means that users have to explicitly opt in to allowing access to their microphone and camera. And yeah, there we have it. Using that code, we've got video displayed in a video element. Great. What really excites me about these APIs is when they come up against each other, like in this example. What's happening is that we've got getUserMedia being piped into a canvas element, and then the canvas element being analyzed, and then producing ASCII, just like that, which could make a good codec, I think. JUSTIN UBERTI: It would be a good codec. You could compress it using just gzip. SAM DUTTON: Yeah, smaller font sizes, high resolution. Another example of this is Facekat. Now what's happening here is that it's using the headtrackr JavaScript library to track the position of my head. And when I move around, you can see I'm moving through the game and trying to stay alive, which is quite difficult. God, this is painful. Anyway-- whoa. OK, I think I've flipped into hyperspace there.
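The getUserMedia call Sam walked through can be sketched like this. The element id (`#preview`) is an assumption, and the promise-based `navigator.mediaDevices` form shown here is the modern API; at the time of the talk, the call was `navigator.getUserMedia(constraints, onSuccess, onError)` with vendor prefixes.

```javascript
// A minimal sketch of the getUserMedia flow described above, assuming a
// <video id="preview"> element exists on the page. The promise-based form
// is the modern API; in 2013 this was a prefixed callback-style call.
const constraints = { video: true }; // "just give me video and nothing else"

function onSuccess(stream) {
  // Attach the camera stream to the video element.
  document.querySelector('#preview').srcObject = stream;
}

function onError(error) {
  console.log('getUserMedia error:', error.name);
}

function startPreview() {
  return navigator.mediaDevices.getUserMedia(constraints)
    .then(onSuccess)
    .catch(onError);
}
```

Calling `startPreview()` triggers the permissions bar Sam mentioned; the stream only arrives after the user explicitly allows access.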
And an old favorite-- you may well have seen Webcam Toy, which gives us access to the camera, a kind of photobooth app, uses WebGL to create a bunch of slightly psychedelic effects there. I quite like this old movie one, so I'll take that and get a snapshot. And I can share that with my friends, so beautiful work from Paul Neave there. Now you might remember I said that we can use the constraints object. The simple example there was just saying, use the video, nothing else. Well, we can do more interesting things with constraints than that. We can do stuff like specify the resolution or the frame rate, a whole stack of things that we want from our local devices. A little example of that, if we go over here. Now, let's look at the code, actually. If we go to the dev tools there, you can see that I've got three different constraints objects, one for each resolution. So when I press the buttons, I use the QVGA constraints with getUserMedia, and then with the VGA one, I'm getting higher resolution. And for HD, I'm getting the full 1280 by 720. We can also use getUserMedia now for input from our microphone. In other words, we can use getUserMedia to provide a source node for Web Audio. And there's a huge amount of interesting stuff we can do with that, processing audio using Web Audio, from the mic or wherever. A little example of that here-- I'll just allow access to the mic, and you can see, I'm getting a nice little visualization there in the canvas element. And I can start to record this, blah blah blah blah blah-- [AUDIO PLAYBACK] -To record this, blah blah blah blah blah-- [END AUDIO PLAYBACK] SAM DUTTON: And yeah, you can see that's using recorder.js to save that locally to disk. getUserMedia also now-- this is kind of experimental, but we can use getUserMedia to get a screen capture, in other words data coming directly from what we see on screen, not from the mic and the camera.
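The three constraints objects from that resolution demo can be sketched roughly as follows. The `width`/`height` syntax is the modern constraints form; the 2013 demo actually spelled these as `{ mandatory: { minWidth: ..., minHeight: ... } }`, and the helper function is a hypothetical name for illustration:

```javascript
// Sketches of the three constraints objects from the resolution demo.
// Modern width/height constraint syntax; the 2013 API used
// { mandatory: { minWidth: ..., minHeight: ... } } instead.
const qvgaConstraints = { video: { width: { exact: 320 },  height: { exact: 240 } } };
const vgaConstraints  = { video: { width: { exact: 640 },  height: { exact: 480 } } };
const hdConstraints   = { video: { width: { exact: 1280 }, height: { exact: 720 } } };

// Hypothetical helper: request the camera at a given resolution.
function getVideoAt(constraints) {
  return navigator.mediaDevices.getUserMedia(constraints);
}
```

Each button in the demo just calls getUserMedia again with a different one of these objects.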
It's probably simplest if I show you an example of this, so yeah, a little application here. And when I click to make the call, allow, and you can see there that I get this kind of crazy hall of mirrors effect, because I'm capturing the screen that I'm capturing, and so on and so on. Now that's quite nice. But it would be really useful if we could take that screen capture and then transmit it to another computer. And for that, we have RTCPeerConnection. JUSTIN UBERTI: Thanks, Sam. So as the name implies, RTCPeerConnection is all about making a connection to another peer, and over this peer connection, we can actually then go and send audio and video. And the way we do this is we take the media streams that we've got from getUserMedia, and we plug them into the peer connection, and send them off to the other side. When the other side receives them, they'll pop out as a new media stream on their peer connection. And they can then plug that into a video element to display on the page. And so both sides of a peer connection, they both get streams from getUserMedia, they plug them in, and then those media streams pop out magically encoded and decoded on the other side. Now under the hood, peer connection is doing a ton of stuff-- signal processing to remove noise from audio and video; codec selection and compression and decompression of the actual audio and video; finding the actual peer-to-peer route through firewalls, through NATs, through relays; encrypting the data so that a user's data is fully protected at all times; and then actually managing the bandwidth so that if you have two megabits, we use it. If you have 200 kilobits, that's all we use. But we do everything we can to hide this complexity from the web developer. And so the main thing is that you get your media streams, you plug them in via addStream to the peer connection, and off you go. And here's a little example of this. SAM DUTTON: Yeah, so you can see here that we've created a new RTCPeerConnection.
And when the stream is received, the callback for that in gotRemoteStream there attaches the stream we're getting to a video element. Now, at the same time, we're also creating what's called an offer, giving information about media, and we're setting that as the local description, and then sending that to the callee, so that they can set the remote description. You can see that in the gotAnswer function there. Let's have a little look at RTCPeerConnection on one page, a very simple example here. So what we've got here is getUserMedia-- just start that up. So it's getting video from the local camera here, displaying it on the left there. Now when I press call, it's using RTCPeerConnection to communicate that video to the other-- yeah, the other video element on the page there. This is a great place to start to get your head around RTCPeerConnection. And if we look in the code there, you can see that it's really simple. There's not a lot of code there to do that, to transmit video from one peer to another. JUSTIN UBERTI: So that's really cool stuff. A full video chat client in a single web page, in just about 15 lines of JavaScript. And we talked a bit quickly through the whole thing around how we set up the parameters of the call, the offers and answers, but I'll come back to that later. The next thing I want to talk about is RTCDataChannel. And this asks: if we have a peer connection, which already creates our peer-to-peer link for us, can we send arbitrary application data over it? And this is the mechanism that we use to do so. Now one example where we would do this would be in a game. Like, take this game. I think it's called Jank Wars or something. And we have all these ships floating around onscreen. Now, when a ship moves, we want to make sure that's communicated to the other player as quickly as possible. And so we have this little JSON object that contains the parameters and the position and the velocity of the ships.
And we can just take that object and stuff it into the send method, and it will shoot across to the other side, where it pops out in onmessage. And the other side can do the same thing. It can call send on its data channel, and it works pretty much just like a WebSocket. That's not an accident. We tried to design it that way, so that people familiar with using WebSockets could also use a similar API for RTCDataChannel. And the benefit is that here, we have a peer-to-peer connection with the lowest possible latency for doing this communication. In addition, RTCDataChannel can be either unreliable or reliable. And we can think about this kind of like UDP versus TCP. If you're doing a game, it's more important that your packets get there quickly than that they're guaranteed to get there. Whereas if you're doing a file transfer, the files are only any good if the entire file is delivered. So you can choose as the app developer which mode you want to use, either unreliable or reliable. And lastly, everything is fully secure. We use standard DTLS encryption to make sure that the packets you send across the data channel are fully encrypted on their way to the destination. And you can do this either with audio and video, or if you want to make a peer connection for just data, you can do that as well. So Sam's going to show us how this actually works. SAM DUTTON: Yeah, so again, another really simple example. We're creating a peer connection here, and once the data channel is received, in the callback to that, we're setting the receive channel using the event.channel object. Now, when the receive channel gets a message, kind of like WebSocket really, we're just putting some text in a local div there, using event.data. Now, the send channel was created with createDataChannel. And then we've got a send button. When that's clicked, we get the data from a text area, and we use the send channel to send that to the other peer. Again, let's see this in action.
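The single-page pattern used in both of these demos can be sketched together-- two peer connections in one page, handing ICE candidates and offer/answer straight across instead of through a signaling server, plus a data channel. The element ids and the promise-based `ontrack`/`addTrack` API are assumptions (the 2013 demos used the callback style with `addStream`):

```javascript
// A sketch of the single-page demo pattern: two RTCPeerConnections in the
// same page, trading ICE candidates and offer/answer directly, plus a data
// channel. Element ids (#remoteVideo, #received) are assumptions, and the
// promise-based API shown here postdates the 2013 callback style.
let localPeer, remotePeer, sendChannel;

function call(localStream) {
  localPeer = new RTCPeerConnection(null);
  remotePeer = new RTCPeerConnection(null);

  // Both ends are local, so candidates can be handed straight across.
  localPeer.onicecandidate = e => e.candidate && remotePeer.addIceCandidate(e.candidate);
  remotePeer.onicecandidate = e => e.candidate && localPeer.addIceCandidate(e.candidate);

  // Remote media pops out here and gets plugged into a video element.
  remotePeer.ontrack = e => {
    document.querySelector('#remoteVideo').srcObject = e.streams[0];
  };

  // Data channel: the caller creates it; the callee receives it in ondatachannel.
  sendChannel = localPeer.createDataChannel('chat');
  remotePeer.ondatachannel = event => {
    event.channel.onmessage = e => {
      document.querySelector('#received').textContent = e.data;
    };
  };

  localStream.getTracks().forEach(t => localPeer.addTrack(t, localStream));

  // Offer/answer: each description is set locally and handed to the other side.
  return localPeer.createOffer()
    .then(offer => Promise.all([
      localPeer.setLocalDescription(offer),
      remotePeer.setRemoteDescription(offer)
    ]))
    .then(() => remotePeer.createAnswer())
    .then(answer => Promise.all([
      remotePeer.setLocalDescription(answer),
      localPeer.setRemoteDescription(answer)
    ]));
}

function sendText(text) {
  sendChannel.send(text); // pops out as an onmessage event on the other side
}
```

In a real two-machine call, the only change is that the candidates and session descriptions cross a signaling channel instead of being handed over in the same page.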
This is, again, a good place to start-- one page demo, with all the code for RTCDataChannel, so type in some text, and we hit send, and it's transmitting it to the other text area. A great place to start if you're looking at RTCDataChannel. Something a little more useful here, a great app from Sharefest. Now, Sharefest is using RTCDataChannel to enable us to do file sharing. I think I'm going to select a nice photo here I've got of some cherries. And popeye is the URL. And now Justin is going to try and get that up on screen on his side, just to check that that's gone through. So like I say, this is doing file sharing using RTCDataChannel, and there's a huge amount of potential there. There we go. Those are the cherries. JUSTIN UBERTI: I love cherries. SAM DUTTON: These are beautiful Mountain View cherries, actually. They were really, really nice. JUSTIN UBERTI: All this data is being sent peer-to-peer, and anybody else who connects to the same URL will download that data peer-to-peer from Sam's machine. And so none of this has to touch Sharefest's servers. And I think that's pretty interesting if you think about things like file transfer and bulk video distribution. OK, so we talked a lot about how we can do really clever peer-to-peer stuff with RTCPeerConnection. But it turns out we need servers to kind of get the process kicked off. And the first part of it is actually making sure that both sides can agree to actually conduct the session. And this is the process that we call signaling. The signaling in WebRTC is abstract, which means that there's no fully-defined protocol on exactly how you do it. The key part is that you just have to exchange session description objects. And if you think about this kind of like a telephone call, when you make a call to someone, the telephone network sends a message to the person you're calling, telling them there's an incoming call and the phone should ring.
Then, when they answer the call, they send a message back that says, the call is now active. Now, these messages also contain parameters around what media format to use, where the person is on the network, and the same is true for WebRTC. And these things, these session description objects, contain parameters like what codecs to use, what security keys to use, the network information for setting up the peer-to-peer route. And the only important thing is that you just send it from your side to the other side, and vice versa. You can use any mechanism you want-- WebSockets, Google Cloud Messaging, XHR. You can use any protocol, even just send it as JSON, or you can use standard protocols like SIP or XMPP. Here's a picture of how this all works. The app gets a session description from the browser and sends it across through the cloud to the other side. Once it gets the message back from the other side with the other side's session description, and both session descriptions are passed down to WebRTC in the browser, WebRTC can then set up and conduct the media link peer-to-peer. So we do a lot to try to hide the details of what's inside the RTCSessionDescription, because this includes a whole bunch of parameters-- as I said, codecs, network information, all sorts of stuff-- this is just a snippet of what's contained inside a session description right now. Really advanced apps can do complex behaviors by modifying this, but we designed the API so that regular apps just don't have to think about it. The other thing that we need servers for is to actually get the peer-to-peer session fully routed. And in the old days, this wouldn't be a problem. A long time ago, each side had a public IP address. They'd send their IP addresses to each other through the cloud, and we'd make the link directly between the peers. Well, in the age of NAT, things are more complicated. NATs hand out what's called a private IP address, and these IP addresses are not useful for communication.
There's no way we can make the link actually peer-to-peer unless we have a public address. So this is where we bring in a technology called STUN. We can contact a STUN server from WebRTC and say, what's my public IP address? And basically, the request comes into the STUN server, it sees the address that that request came from, puts the address into the packet, and sends it back. So now WebRTC knows its public IP address, and the STUN server doesn't have to be a party to the call anymore, doesn't have to have media flowing through it. So here, if you look at this example, each side has contacted that STUN server to find out what its public IP address is. And then it's sent the traffic to the other IP address through its NAT, and the data still flows peer-to-peer. So this is kind of magic stuff, and it usually works. Usually we can make sure that the data all flows properly peer-to-peer, but not in every case. And for that, we have a technology called TURN built into WebRTC. TURN turns things around and provides a cloud fallback when a peer-to-peer link is impossible. It basically asks for a relay in the cloud, saying, give me a public address. And because this public address is in the cloud, anybody can contact it, which means the call always sets up, even if you're behind a restrictive NAT, or even behind a proxy. The downside is that since the data actually is being relayed through the server, there is an operational cost to it. But it does mean the call works in almost all environments. Now, on one hand, we have STUN, which is super cheap, but doesn't always work. And we have TURN, which always works, but has some cost to it. How do we make sure we get the best of both worlds? Here's TURN in action, where we tried to use STUN and STUN didn't work. And we couldn't get the things to actually penetrate the NATs. So instead, we fell back. Only then did we use TURN, and sent the media from our one peer, through the NAT, through the TURN server, and to the other side.
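In the API, STUN and TURN servers are supplied as an `iceServers` list when the peer connection is created. The STUN URL below is Google's public test server; the TURN hostname, username, and credential are hypothetical placeholders for a deployment of your own:

```javascript
// Configuring STUN and TURN for a peer connection. The STUN URL is Google's
// public test server; the TURN server, username, and credential are
// hypothetical placeholders for your own deployment.
const config = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'user',
      credential: 'secret'
    }
  ]
};

function createPeer() {
  // Candidate gathering will try the STUN-derived public address first
  // and only fall back to relaying through TURN when it has to.
  return new RTCPeerConnection(config);
}
```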
And this is all done by a technology called ICE. ICE knows about STUN and TURN, and tries all the things in parallel to figure out the best path for the call. If it can do STUN, it does STUN. If it has to do TURN, it falls back to TURN, but does so quickly. And we have stats from a deployed WebRTC application that say 86% of the time, we can make things work with just STUN. So only one out of seven calls actually has to run through a TURN server. So how do you deploy TURN for your application? Well, we have some testing servers, a testing STUN server that you can use, plus we make source code available for our own STUN and TURN server as part of the WebRTC code package. But the thing I would really recommend is the long name, but really good product-- rfc5766-turn-server-- which has Amazon VM images that you can just take, download, and deploy into the cloud, and you've got your TURN server provisioned for all your users right there. I also recommend restund, another TURN server that we've used with excellent results. One question that comes up around WebRTC is, how is security handled? And the great thing is that security has been built into WebRTC from the very beginning, and so this means several different things. It means we have mandatory encryption for both media and data. So all the data that's being sent by WebRTC is being encrypted using standard AES encryption. We also have secure UI, meaning the user's camera and microphone can only be accessed if they've explicitly opted in to making that functionality available. And last, WebRTC runs inside the Chrome sandbox. So even if somebody tries to attack WebRTC inside of Chrome, the browser and the user will be fully protected. So here's what you need to do to take advantage of the security in WebRTC-- it's really simple. Your app just needs to use HTTPS for actually doing the signaling.
As long as the signaling goes over a secure conduit, the data will be fully secured as well, using the standard protocols of SRTP for media or Datagram TLS for the data channel. One more question that comes up is around making a multi-party call, a conference call. How should I architect my application? In the simple two-party case, it's easy. We just have a peer-to-peer link. But as you start adding more peers into the mix, things get a bit more complicated. And one approach that people use is a mesh, where basically every peer connects to every other peer. And this is really simple, because there's no servers or anything involved, other than the signaling stuff. But every peer has to send a copy of its data to every other peer. So this has a corresponding CPU and bandwidth cost. So depending on the media you're trying to send, the number of peers you can support in this topology is fairly limited-- for audio, it can be kind of higher; for video, it's going to be less-- especially if one of the peers is on a mobile device. To deal with that, another architecture that can be used is the star architecture. And here, you can pick the most capable device to be what we call the focus for the call. And the focus is the part that's actually responsible for taking the data and sending a copy to each of the other endpoints. But as we get to handling multiple HD video streams, the job of the focus becomes pretty difficult. And so for the most robust conferencing architecture, we recommend an MCU, or multipoint control unit. And this is a server that's custom made for relaying large amounts of audio and video. And it can do various things. It can do selective stream forwarding. It can actually mix the audio or video data. It can also do things like recording. And so if one peer drops out, it doesn't interrupt the whole conference, because the MCU is taking care of everything. So WebRTC is made with standards in mind. And so you can connect to things that aren't even WebRTC devices.
And one thing that people want to talk to from WebRTC is phones. And there's a bunch of easy things that can be dropped into your web page to make this happen. There's sipML5, which is a way to talk to various standard SIP devices; Phono; and what we're going to show you now, a widget from Zingaya to make a phone call. SAM DUTTON: OK, so we've got a special guest joining us a little bit later in the presentation. I just wanted to give him a call to see if he's available. So let's use the Zingaya WebRTC phone app now. And you can see, it's accessing my microphone. [PHONE DIALING AND RINGING] SAM DUTTON: Calling someone. I hope it's the person I want. [PHONE RINGING] SAM DUTTON: See if he's there. CHRIS WILSON: Hello? SAM DUTTON: Hey. Is that you, Chris? CHRIS WILSON: Hey, Sam. How's it going? It is. SAM DUTTON: Hey. Fantastic. I just want to check you're ready for your gig later on. CHRIS WILSON: I'm ready whenever you are. SAM DUTTON: That's fantastic. OK, speak to you soon, Chris. Thanks. Bye bye. CHRIS WILSON: Talk to you soon. Bye. SAM DUTTON: Cheers. JUSTIN UBERTI: It's great-- no plugins, realtime communication. SAM DUTTON: Yeah, in that situation, we had a guy with a telephone. Something we were thinking about is situations where there is no telephone network. Now, Voxio demonstrated this with something called Tethr, which is kind of disaster communications in a box. It uses the OpenBTS cell framework-- you can see, it's that little box there-- to enable calls between feature phones, via the OpenBTS cell, through WebRTC, to computers. You can imagine it's kind of fun to get a license for this in downtown San Francisco, but this is incredibly useful in situations where there is no infrastructure. Yeah, this is like telephony without a carrier, which is amazing. JUSTIN UBERTI: So we have a code lab this afternoon that I hope you can come to, where I'll really go into the details of exactly how to build a WebRTC application.
But now we're going to talk about some resources that I think are really useful. The first one is something called WebRTC Internals. And this is a page you can open up just by going to this URL while you're in a WebRTC call. And it'll show all sorts of great statistics about what's actually happening inside your call. This would be things like packet loss, bandwidth, video resolution and sizes. And there's also a full log of all the calls made to the WebRTC API that you can download and export. So if a customer's reporting problems with their call, you can easily get this debugging information from them. Another thing is, the WebRTC spec has been updating fairly rapidly. And so in a given browser, the API might not always match the latest spec. Well, adapter.js is something that's there to insulate the web developer from the differences between browsers and the differences between versions. And so we make sure that adapter.js always implements the latest spec, and then thunks down to whatever the browser version supports. So as new APIs are added, we polyfill them to make sure that you don't have to write custom version code or custom browser code for each browser. And we use this in our own applications. SAM DUTTON: OK, if all this is too much for you, the good news is, we've got some fantastic JavaScript frameworks that have come up in the last few months, really great abstraction libraries that make it really, really simple to build WebRTC apps with just a few lines of code. An example here from SimpleWebRTC: a little bit of JavaScript there to specify a video element that represents local video, and one that represents the remote video stream coming in. And then join a room just by calling the joinRoom method with a room name-- really, really simple. PeerJS does something similar for RTCDataChannel-- create a peer, and then on connection, you can send messages, receive messages, so really, really easy to use.
JUSTIN UBERTI: So JavaScript frameworks go a long way, but they don't cover the production aspects of the service-- the signaling, the STUN and TURN service we talked about. But fortunately, we have things from both OpenTok and vLine that are basically turnkey WebRTC services that handle all this stuff for you. You basically sign up for the service, get an API key, and then you can make calls using their production infrastructure, which is spread throughout the entire globe. They also make UI widgets that can be easily dropped into your WebRTC app. So you can get up and running with WebRTC super fast. Now, we've got a special treat for you today. Chris Wilson, a colleague of ours, a developer on the original Mosaic browser, and an occasional musician as well, is going to be joining us courtesy of WebRTC to show off the HD video quality and full-band audio quality that we're now able to offer in the latest version of Chrome. Take it away, Chris. CHRIS WILSON: Hey, guys. SAM DUTTON: Hey, Chris. How's it going? CHRIS WILSON: I'm good. How are you? SAM DUTTON: Yeah, good. Have you got some kind of musical instrument with you? CHRIS WILSON: I do. You know, originally you asked me for a face-melting guitar solo. But I'm a little more relaxed now. I/O is starting to wind down. You can tell I've already got my Hawaiian shirt on. I'm ready for some vacation. So I figured I'd bring my ukulele and hook it up through a nice microphone here, so we can listen to how that sounds. SAM DUTTON: Take it away. Melt my face, Chris. [PLAYING UKULELE] SAM DUTTON: That's pretty good. JUSTIN UBERTI: He's pretty good. All right. SAM DUTTON: That was beautiful. Thank you, Chris. [APPLAUSE] CHRIS WILSON: All right, guys. JUSTIN UBERTI: Chris Wilson, everybody. SAM DUTTON: The audience has gone crazy, Chris. Thank you very much. JUSTIN UBERTI: You want to finish up? SAM DUTTON: Yeah. So, we've had-- well, a fraction over 30 minutes to cover a really big topic.
There's a whole lot more information out there online, some good stuff on HTML5 Rocks, and a really good e-book too, if you want to take a look at that. There are several ways to contact us. There's a great Google group-- discuss-webrtc-- where you can post your technical questions. All the new news for WebRTC comes through on the Google+ and Twitter streams. And we're really grateful to all the people, all of you who've submitted feature requests and bugs. Please keep them coming, and the URL for that is crbug.com/new. So thank you for that. [APPLAUSE] JUSTIN UBERTI: And so we've built this stuff into the web platform to make realtime communication accessible to everyone. And we're super excited because we can't wait to see what you all are going to build. So thank you for coming. Once again, the link. And now, if you have any questions, we'll be happy to try to answer them. Thank you very much. SAM DUTTON: Yeah. Thank you. [APPLAUSE] AUDIENCE: Hi. My name is Mark. I'd like to know, because I'm using Linux and Ubuntu, when I can finally get rid of the talk plugin for using Hangouts in Google+? JUSTIN UBERTI: The question is, when can we get rid of that Hangouts plug-in? And so unfortunately, we can only talk about WebRTC matters today. That's handled by another team. But let's say that there are many of us who have the same feeling. AUDIENCE: OK. Great. [LAUGHTER] AUDIENCE: Can you make any comments on Microsoft's competing standard, considering they kind of hold the cards with Skype, and how maybe we can go forward supporting both or maybe converge the two, or just your thoughts on that? JUSTIN UBERTI: So Microsoft has actually been a great participant in standards. They have several people they've sent from their team. And although they don't see things exactly the same way that we do, I think the API differences are that theirs is a lot more low-level, geared for expert developers. Ours is a little more high-level, geared for web developers.
And I think that really what you can do is you can implement the high-level one on top of the low-level one, maybe even vice versa. Now, Microsoft is a little more secretive about what they do. So we don't know exactly what their timeframe is relative to IE. But they're fully participating. And obviously, they're very interested in Skype. So I'm very optimistic that we'll see a version of IE that supports this technology in the not-too-distant future. AUDIENCE: Very good to hear. Thank you. AUDIENCE: My question would be, I think you mentioned it quickly in the beginning. So if I wanted to communicate with WebRTC, but one end is using a different environment than the browser-- let's say I want a web application to speak to a native Android app-- what would be the approach to integrate that with WebRTC? JUSTIN UBERTI: As I mentioned earlier, we have a fully supported official native version of PeerConnection, PeerConnection.java, which is open source, and you can download it and build it into your native application. And it interoperates. We have a demo app that interoperates with our AppRTC demo app. So I think that using Chrome for Android in a web view is one thing you can think about. But if that doesn't work for you, we have a native version that works great. AUDIENCE: OK. Thank you. AUDIENCE: Hi. My question would be, are there any things that need to be taken care of for cross-browser compatibility between Firefox and Chrome? Anything specific that needs to be taken care of, or does it just work? JUSTIN UBERTI: There are some minor differences. I mentioned adapter.js covers some of the things where the API isn't quite in sync in both places. One specific thing is that Firefox only supports the Opus codec, and they only support DTLS encryption. They don't support something called SDES, which we also support.
So for right now, you have to set one parameter in the API-- and you can see that in our AppRTC source code-- to make sure that communication actually uses those compatible protocols. We actually have a document on our web page, though, that documents exactly what you have to do, which is really setting a single constraint parameter when you're creating your peer connection object. SAM DUTTON: Yeah. If you go to webrtc.org/interop. JUSTIN UBERTI: Yeah. That's webrtc.org/interop. AUDIENCE: OK. Thank you. AUDIENCE: When a peer connection is made and it falls back to TURN, is the TURN server capable of decrypting the messages that go between the two endpoints? JUSTIN UBERTI: No. The TURN server is just a packet relay. So this stuff is fully encrypted. It doesn't have the keying information to do anything to it. So the TURN server just takes a byte, sends a byte, takes a packet, sends a packet. AUDIENCE: So for keeping data in sync with low latency between, say, an Android application and the server, how would both the native and the Android Chrome implementations of WebRTC fare in terms of battery life? JUSTIN UBERTI: I don't really have a good answer for that. I wouldn't think there would be much difference. I mean, the key things that are going to be driving battery consumption in this case-- are you talking about data, or are you talking about audio and video? AUDIENCE: Data. JUSTIN UBERTI: For data, the key drivers of your power consumption are going to be the screen and the network. And so I think those should be comparable between Chrome for Android and the native application. AUDIENCE: OK, cool. Thanks. AUDIENCE: With two computers running Chrome, what have you seen for glass-to-glass latency? JUSTIN UBERTI: Repeat? AUDIENCE: Glass-to-glass, so from the camera to the LCD. JUSTIN UBERTI: Oh, yeah. So it depends on the platform, because the camera can have a large delay built into it itself. Also, some of the audio paths have higher latencies than others. 
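As a rough illustration of the single constraint parameter mentioned above, here is a minimal sketch in 2013-era browser JavaScript. It assumes the prefixed constructors of the time and the DtlsSrtpKeyAgreement flag from the webrtc.org interop notes; the STUN server URL is just an example:

```javascript
// Interop constraint discussed above: force DTLS-SRTP key agreement,
// since Firefox does not support S-DES.
var interopConstraints = { optional: [{ DtlsSrtpKeyAgreement: true }] };

var servers = { iceServers: [{ url: 'stun:stun.l.google.com:19302' }] };

// Pick whichever prefixed constructor the browser provides; this is
// roughly the kind of thing adapter.js shims for you.
function createPeerConnection() {
  if (typeof webkitRTCPeerConnection !== 'undefined') {
    return new webkitRTCPeerConnection(servers, interopConstraints); // Chrome
  }
  if (typeof mozRTCPeerConnection !== 'undefined') {
    return new mozRTCPeerConnection(servers, interopConstraints);    // Firefox
  }
  return null; // not running in a WebRTC-capable browser
}
```

With the constraint set on both sides, Chrome negotiates DTLS-SRTP and the call interoperates with Firefox.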
But the overall target is 150 milliseconds end-to-end. And we've seen lower than 100 milliseconds in best-case scenarios for glass-to-glass type latency. AUDIENCE: OK. And how are you ensuring priority of your data across the network? JUSTIN UBERTI: That's a complex question with a long answer. But the basic thing-- are you saying, how do we compete with cat videos? AUDIENCE: No, just within WebRTC-- how are you tagging your packets? JUSTIN UBERTI: Right, so there is something called DSCP where we can mark QoS bits-- and this isn't yet implemented in WebRTC, but it's on the roadmap-- to be able to tag things like audio as higher priority than, say, video, and that as a higher priority than cat videos. AUDIENCE: So it's not today, but will be done? JUSTIN UBERTI: It will be done. We also have things for doing FEC-type mechanisms to protect things at the application layer. But the expectation is that as WebRTC becomes more pervasive, carriers will support DSCP, at least on the bit coming off the computer and going onto their network. And we've seen that DSCP does help going through Wi-Fi access points, because Wi-Fi access points do give priority to DSCP-marked traffic. AUDIENCE: Thank you. AUDIENCE: So with Chrome for iOS being limited to UIWebView and with other restrictions, how much of WebRTC will you be able to implement? JUSTIN UBERTI: So that's a really interesting question. They haven't made it easy for us, but the Chrome for iOS team has already done some amazing things to deliver the Chrome experience that exists there now. And so we're pretty optimistic that one way or another, we can find some way to make that work. No commitment to the time frame, though. AUDIENCE: What are the mechanisms for saving video and audio that's broadcast with WebRTC, like making video recordings from it? 
JUSTIN UBERTI: So if you have the media stream, you can then take the media stream and plug it into things like the Web Audio API, where you can actually get the raw samples, and then make a WAV file and save that out. On the video side, you can go into a canvas, and then extract the frames from the canvas, and you can save that. There isn't really any way to save it as a .MP4 or .WEBM file yet. But if you want to make a thing that just captures audio from the computer and then stores it on a server, you could basically make a custom server that could do that recording. That's one option. AUDIENCE: So the TURN server is open-- but you said the TURN server doesn't capture. JUSTIN UBERTI: No. AUDIENCE: It can't act as an endpoint. Do you have server technology that acts as an endpoint? JUSTIN UBERTI: There are people building this sort of stuff. vLine might be one particular vendor who does this, but there's something where you can basically have an MCU, and the MCU that receives the media could then do things like compositing or recording of that media. AUDIENCE: So presumably, the libraries for Java or Objective-C could be used to create a server implementation? JUSTIN UBERTI: Exactly. That's what they're doing. AUDIENCE: Hi, kind of a two-part question that has to do with codecs, specifically on the video side-- currently VP8, WebM. Are there plans for H.264, and also what's the timeline for VP9? JUSTIN UBERTI: Our plans are around the VP family of codecs, so we support VP8. And VP9, you may have heard that it's sort of trying to finalize the bit stream right now. So we are very much looking forward to taking advantage of VP9 with all its new coding techniques, once it's both finished and also optimized for realtime. AUDIENCE: And H.264, not really on the plan? JUSTIN UBERTI: We think that VP9 provides much better compression and overall performance than H.264, so we have no plans as far as H.264 at this time. AUDIENCE: OK. 
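To make the canvas approach from the recording question above concrete, here is a hedged sketch in browser JavaScript. The element ids, the 5-frames-per-second rate, and the idea of POSTing each snapshot to your own server are illustrative assumptions, not something stated in the talk:

```javascript
// Snapshot the current frame of a <video> element (for example, one
// showing a remote WebRTC stream) into a <canvas>, returning a base64 PNG.
function captureFrame(video, canvas) {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  return canvas.toDataURL('image/png');
}

// How many snapshots a clip of a given length yields at a given rate.
function frameCount(durationMs, fps) {
  return Math.floor((durationMs / 1000) * fps);
}

// Usage in a page (assumes these ids exist in your markup):
//   var video = document.getElementById('remoteVideo');
//   var canvas = document.getElementById('snapshot');
//   setInterval(function () {
//     var png = captureFrame(video, canvas);
//     // POST png to your own recording server here.
//   }, 1000 / 5); // 5 frames per second
```

Capturing frames this way trades fidelity for simplicity: you get a series of still images rather than a real .MP4 or .WEBM recording, matching the limitation Justin describes.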
AUDIENCE: Running WebRTC on Chrome for Android on mobile and tablets, how does it compare with native performance, like Hangouts on Android? JUSTIN UBERTI: We think that we provide comparable performance to any native application right now. We're always trying to make things better. On Chrome for Android, WebRTC is still behind a flag because we still have work to do around improving audio and improving some of the performance. But we think we can deliver equivalent performance in the web browser. And we're also working on taking advantage of hardware acceleration, in cases where there are hardware decoders like there are on the Nexus 10, and making it so we can get the same sort of down-to-the-metal performance that you could get from a native app. AUDIENCE: So the Google Talk plugin is using not just H.264, but H.264 SVC optimized for the needs of videoconferencing. Are VP8 and VP9 going to be similarly optimized, specifically in an SVC-like fashion, for video conferencing versus just the versions for file encoding? JUSTIN UBERTI: So VP8 already supports temporal scalability-- the S part of SVC. VP9 supports additional scalability modes as well. So we're very excited about the new coding techniques that are coming in VP9. AUDIENCE: So we want to use WebRTC to do live streaming from, let's say, hardware cameras. What are the things that we should take care of for that kind of application? And when you mentioned VP8 and VP9 support, H.264 is not supported-- assuming your hardware supports only H.264, can WebRTC be used with Chrome in that case? JUSTIN UBERTI: We are building up support for hardware VP8, and later, VP9 encoders. So you can make a media streaming application like you described, but we're expecting that all the major SoC vendors are now shipping hardware with built-in VP8 encoders and decoders. So as this stuff gets into market, you're going to see this become the most efficient way to record and compress data. 
AUDIENCE: So the only way is to support VP8 in hardware right now, right? JUSTIN UBERTI: If you want hardware compression, the only thing that we support right now will be VP8 encoders. AUDIENCE: That's on the device side, you know, the camera which is on-- JUSTIN UBERTI: Right. If you're encoding from a device and you want it to be decoded within the browser, I advise you to do it in VP8. AUDIENCE: Thank you. JUSTIN UBERTI: Thank you all for coming. SAM DUTTON: Yeah, thank you. [APPLAUSE]
Info
Channel: Google Developers
Views: 578,419
Keywords: chrome, gdl, i-o
Id: p2HzZkd2A40
Length: 44min 17sec (2657 seconds)
Published: Sun May 19 2013