WebRTC How it Works and How it Breaks

Captions
I'm Ben Klang, and I'm going to talk about WebRTC. A bit of introduction: I'm from Atlanta, Georgia. Just out of curiosity (I've started asking this because I find it interesting), how many people have seen me present before? Awesome, okay, so thanks for coming back. I tend to do things a little differently each time, and I think this will be no exception.

A little bit about me and my background: I run a company called Mojo Lingo in Atlanta. We do all kinds of custom application development, which is pretty cool because we get to see a lot of different projects, and we get to see voice used in ways that are outside the norms, outside the standard. The other thing I do is lead an open-source project called Adhearsion, which is an open-source Ruby framework for building voice applications. How many of you have heard of it? All right, I've got work to do. I haven't actually presented on it at AstriCon in a couple of years; I think I'll be doing that next year. With Asterisk moving toward being more of a media server, and applications living in external frameworks like Adhearsion, it's a real opportunity to put your business logic and the functionality you want to build in a framework outside of Asterisk. I think it's pretty cool, but I'll leave that out of this talk, because today I'm going to talk about WebRTC.

At Mojo Lingo we've been building WebRTC apps since 2013, so a couple of years now. We've deployed it a bunch of times; we've seen it work really well in a bunch of cases, and we've seen it break in some interesting ways. When I put this talk together, I was originally going to focus just on the parts that broke, but after talking with David Duffett, he also wanted me to go a little underneath WebRTC: a bit more about how it works at an architectural level and how the pieces fit together. So I'm going to do that. Again, how many of you have actually tried using WebRTC? Cool. And I'll ask the question a different way:
how many of you have tried building something with WebRTC? Okay, so more or less the same people. Very cool. For those of you who haven't, what I want to do is talk first a little bit about how it works, again, how the pieces fit together. We're not going to get to the level of code; this is more architectural. Then we'll talk about how it breaks, or how I've seen it break, and how we fixed it in the cases where we did.

So at a very high level, what is WebRTC? For a lot of people, WebRTC is synonymous with the JavaScript API: when you build a WebRTC application, you start by looking at how to interact with it from your code in the browser, in JavaScript. The big thing WebRTC gains us in the browser is access to the camera and the microphone. Prior to WebRTC we didn't have that access; we had to use Flash or Java, and it kind of sucked. So WebRTC gives us that.

But it's not just the API. It's also patent-free access to very high-quality audio and video codecs, and that's a game changer. If you have worked with video in the past especially, you might have come across H.264 licensing, and even H.265 has some pretty crazy licensing terms. With things like Opus and VP8, we have access to very high-quality, scalable audio and video codecs: music-quality audio and high-definition video, which is pretty cool.

WebRTC is also a set of techniques for getting around NAT. How many times have we had to go reconfigure Asterisk because we had one-way audio, or had to set the external IP so that the SDP was set up correctly? It's a giant pain. WebRTC learned from that pain and implements a lot of things to make it better; I'll talk more about those techniques in a little bit.

WebRTC is also peer-to-peer. Whereas the typical SIP deployment model puts a server in the center to facilitate the communication, WebRTC's primary approach is to create peer-to-peer connections. So if I'm sitting over here
and you're sitting over there, we'll try to get our browsers to talk to each other directly. Along with that peer-to-peer connection, we have an optional data channel. One of the things I complain about when I explain why I think the telephone is kind of a broken model is that you really only get that one audio connection, and any signaling you send over it has to come in band, in the form of DTMF digits. With WebRTC, not only do we have high-definition audio and video, we have this data channel that rides alongside to send information directly between two browsers. For example, there are companies that have built peer-to-peer file sharing on WebRTC, and again the connection is directly between the two browsers, which is really cool.

One thing I do want to harp on, because I think it's a bit of a misconception: you see a lot of questions like "how do I enable WebRTC?", and I think that's a misunderstanding. WebRTC is really meant for developers. Obviously developers will build things that end users will ultimately enjoy, but you can't go download a WebRTC client, install it, and run it; you really have to build it. There are lots of things that make it easy, and I don't mean to make this sound harder than it is, but it's important to remember that this is aimed at developers who are going to build something on top of it.

I also want to talk quickly about what WebRTC is not. As I mentioned, it's not a polished end-user product. It's also not required to interoperate. It might, but it's not required to. We're all familiar with the telephone use case, where I have a ten-digit phone number assigned to me and anyone in the world can reach me on it, no matter what carrier they happen to be on. That's interoperability. WebRTC can use that, since we can gateway to the telephone network, but it's not required to. There are lots of use cases. So say,
for example, I want to go to an insurance company and talk to an agent about an insurance plan. I'm not yet a customer, but I want to talk to them. All I really need to do is go to their site, and their site delivers the entire WebRTC experience to me, without my having to download anything other than the page itself, and without necessarily having to register with them. It's scoped to that site, and it doesn't need to interoperate. They probably don't care about my Facebook or Twitter identity, and they don't care about my phone number, at least until I want to sign up.

WebRTC is also not the same thing to every application. Sometimes it's going to be a conferencing service, sometimes a telephone gateway, sometimes a video chat in the context of learning or sales. It's going to be different each time it gets deployed, and that really is the value of it.

The last thing I'll say WebRTC is not is finished. The specification is still in draft. It's very close, and I'll talk a bit more about that in a minute, but even though it's not finished, don't sweat it too much. The standard has been fairly stable over the last year or so, interoperability between browsers is good, and mobile devices are coming up. It's ready; if you're not looking at it already, you definitely want to be.

I want to talk for a minute about communication topology, by which I mean the structures by which we set up calls. This is what I think most of us are familiar with: the so-called trapezoid. In the trapezoid you have Alice and Bob, who want to talk to each other. Alice happens to be an AT&T subscriber and Bob is a Verizon subscriber, but they can still talk to each other, because Verizon and AT&T federate: they know how to reach each other, they share the same address space, which is telephone numbers, and they know not to give out the same phone
number to two different people. That's a pretty standard deployment we're all familiar with, especially from SIP: you have your carrier upstream that routes calls out, and the call reaches whoever you're trying to reach.

But that's not the only way to do it. Skype is a good example of a triangle. Imagine Skype in the center, facilitating all the communication and managing all the addresses; then the two endpoints, Alice and Bob, only really have to coordinate with one server, and Skype doesn't have to federate with anybody. It's Skype to Skype. WebRTC kind of falls into that, but it's what I like to call a more perfect triangle. You still have the thing in the middle (I'm a Ruby on Rails developer, so I stuck Rails in the center), and you use Rails, or whatever your service may be, to coordinate the communication, but the media is passed directly between the peers. You can force it through a server if you need to, but by default WebRTC tries to send the media directly. There are advantages to this for security, and for latency, so there's less delay in the call. That's the WebRTC triangle.

One of the other big advantages is that in a lot of cases, even if your signaling has to cross a firewall, your media doesn't. For example, say we're here at AstriCon, and we all go to a website and start a WebRTC conversation. We will end up sending media only on the local LAN. That means we don't have to worry about the firewall blocking things, and we don't have to worry about saturating the uplink, because we're keeping the media on the local network. WebRTC does a lot to optimize for that.

The next thing we'll talk about is what a typical WebRTC app looks like in terms of infrastructure, how it's deployed. Let's imagine we have two clients; we've got to start with the people who actually want to talk to
each other. In this case we'll use Firefox as one client and a mobile app as the other. The first thing you have to have is some kind of signaling. I'm going to use a very simple HTTP polling interface here. The idea is that if Firefox wants to reach the Samsung phone, it sends a request, and the HTTP server's only job is to pass that request to the phone, and vice versa. You've got to have signaling; that's what introduces the two parties to each other. In a lot of cases, if you're on a LAN, you can stop there. That is the bare minimum requirement for a WebRTC call.

Of course, there are things that can make it more challenging, like getting stuck behind a firewall. In a lot of cases WebRTC will be able to pass media across the firewall, but not always; some firewalls are very strict. In that case we introduce another piece of infrastructure called a STUN or TURN server. These end up being the same service, since TURN is kind of a superset of STUN, and their main job is to help the two parties find the most optimal media path between them. STUN is about discovering which IP addresses you can be reached on, and TURN is about relaying media, which I'll talk about more in a second. In my opinion, this is the minimum deployment required to do anything on the Internet: if you're going to deploy WebRTC on the Internet, these are the two pieces you absolutely must have, some kind of signaling mechanism and some kind of STUN or TURN server. I'll also mention that I haven't talked about Asterisk yet; you'll know when I do.

Next, as you start to grow, and especially once you get above, say, four or five video streams, you start to eat up a lot of bandwidth. Now you need something else, called an SFU or an MCU. These are two different approaches to solving the same problem, which is: I only want to send my video once, but I want to
receive video from all the parties. The default WebRTC topology is a full mesh: everybody talks to everybody, and that's a lot of bandwidth once you get a lot of connections. An SFU is a selective forwarding unit; a video bridge such as Jitsi Videobridge is a good example. It optimizes how much video gets sent to each person, and it's very lightweight. An MCU is more heavyweight. MCU stands for multipoint control unit, and the idea is that you take multiple video streams and composite them into a single frame. As you can imagine, that's like transcoding on steroids, or something worse; it takes a lot of compute power, especially as you start to scale. There are cases where you want one versus the other, and I won't go too far into that because we could talk for a while, but if you get into larger multi-party video conferences you'll probably need one of these pieces as well.

The last piece, and by far not the least, is Asterisk. The big limitation we have without Asterisk is that we don't have a good way to record audio or video, to manage conferencing, or to connect the pieces together programmatically if we want to put some kind of application around the media. That's where Asterisk comes in. It's really great at recording: if you get all of the WebRTC streams into Asterisk, with Asterisk participating as a WebRTC peer, you can do all the recording, save it, and send it to users. You can inject audio and play files into conferences or channels. You can do conferencing, audio or video, although video conferencing has some limitations. And of course there's the PSTN gateway. One of the most common WebRTC use cases is the contact center, so if we do our queueing on Asterisk, we can use Asterisk as the gateway to the telephone network.

I want to talk a little more about WebRTC signaling. At the top of the last slide, you saw I had
just a very simple web server. Signaling isn't actually specified by WebRTC; there is no standard for signaling in WebRTC. All it really tries to do is get media flowing, so signaling can be anything you like. The most common approaches I've seen are very basic HTTP, where I literally POST something to a server and somebody comes by later and gets it, or some kind of WebSockets setup where you can distribute it out. Obviously SIP is very common; that's why Asterisk added WebSocket support, to handle SIP over WebSockets so that browsers can use SIP as their signaling layer to set up the WebRTC media. Another very common one is XMPP. You're probably familiar with XMPP from chat, Google Talk, or Jingle; XMPP can also be used to set up WebRTC. And I kid you not, you can actually do it with a carrier pigeon. I've heard of people taking the SDP that sets up a WebRTC call, saving it to a text file, sticking it on a flash drive, walking to the next computer, plugging it in, and copying and pasting it back into the browser, and that will establish a WebRTC session. My point is, if you need to be creative in your application to get different signaling, you can; if you need something standard like SIP, you can do that too. WebRTC doesn't care; it's very flexible that way.

So you should select your signaling based on your application's requirements. Some things to consider: Should I integrate with something existing, or is this greenfield, where I can do whatever I want? Think about the people working on it as well: are they telephony people or web developers, and what are they familiar with? Do I want to federate or not? Do I expect this to work across organizational boundaries, or is it really just private to me? And do I care about identity, and how much? Is this an anonymous service, or do I use real names?

The next thing I want to illustrate is what it looks like
to set up a WebRTC session. We have Alice and Bob, and our generic signaling server at the top, and they want to talk. Alice creates an SDP. This will look familiar if you've ever done a Wireshark trace and looked at a SIP INVITE: you'll see the INVITE headers and you'll see the SDP. It's the same standard, extended to support the extra WebRTC pieces, but it's the same thing. The web server's only job is to make sure that SDP gets to Bob. It does that, and Bob generates his own SDP and passes it back to Alice.

This is where ICE, STUN, and TURN kick in. I touched on ICE earlier, and I wanted to show you an example of what it looks like. There are a bunch of demo WebRTC apps, and one of them is a tool that will generate all of your ICE candidates for you. An ICE candidate is simply an IP address and port that you think you're reachable on. (I realize that's kind of small; I tried to blow it up a little, but it's probably not much bigger, sorry.) There are three main types. First, a host entry, which is an IP address physically on my computer. You might think you only have one, but you usually have several: you'll have your wireless or LAN IP, but VPN adapters often add more, and VMware or VirtualBox will also add more. That's why that list looked so long; I have a lot of network interfaces. The next type is server reflexive, or srflx. Those come from STUN servers on the Internet: I've sent out a request asking "where am I coming from?", and the response gives my effective public IP at the time of the request. So the STUN server is telling me, "I see you coming from this IP address." In Asterisk terms this is the equivalent of externip, except it's auto-detected. The last type is relay candidates. If I can't make a connection directly, if I can't get the
two browsers to speak directly even after trying the local and observed IP addresses, then this is a server that's willing to act on my behalf, and that's the IP address of the server where it will exchange media for me. That's called a TURN server. Once that's in place, the next thing that happens is that media starts to flow. One of the nice things about WebRTC is that all the media is encrypted by default using SRTP, which means our friends at the NSA, who like chewy, delicious, unencrypted media, have a bit of a harder time with it. It's kind of nice. So that's how it works; I hope you're all still with me, and maybe you learned something.

The other thing I want to talk about is how it breaks. This is not an exhaustive list; these are some of the more interesting failure modes we've seen in the real world. Certainly there are others.

I'll start with one I don't think enough people talk about, which is too bad, because it's probably the most common problem I see: environmental problems. By that I mean silly things like your speakers or your volume being turned down. I don't know how many times I've seen people fail to connect just because their microphone was muted, their speakers were muted, or the volume was turned down. So check for it. There are actually some APIs in the browsers that you can use to try to detect that. It's not 100 percent, but if users have a problem, that's the first thing we've got to teach them to look for. Another one that's kind of a pain: is it too dark, or too backlit? I wish I had a screenshot to show you, but a lot of video conferences end up with somebody who's essentially a silhouette, either because the room is dark or because there's a big bright ball of sunlight coming in behind their head. Another, less obvious, one is hardware or driver issues. My favorite example
of this was a client who was rolling out a WebRTC app. It was working fine all day long, and then all of a sudden it stopped working and they couldn't figure out why: no audio. Everything connected, everything looked good, but they weren't getting audio. What actually happened was that the Windows driver just stopped, so the sound card stopped, and the only indication to the user was a tiny red X on the speaker icon in the system tray. The solution I gave them was: don't use a USB headset. It's simple; all the computers have built-in sound cards, so just get a standard 1/8-inch headset and the problem goes away.

I also want to point out a really cool site, test.webrtc.org. It has a really nice and fairly thorough self-check tool. It'll check your cameras and your microphones; it will actually check that you're getting audio by looking for a change in dB; it'll look for black frames, so if your camera's not working it will detect that too; and it'll check network connectivity and bandwidth, a whole bunch of neat things. I believe it's all open source, so definitely check it out. By the way, I'm going to show some links, but I'll also have them all collected on the last slide, so don't worry if I skip past them.

The second category is, generically, usability problems, and the number one thing I see in this category is failing to deploy a TLS/SSL certificate. This is absolutely mandatory. If you take nothing else away from this session, please don't even consider rolling out WebRTC without an SSL certificate. I feel like I shouldn't have to say that, but I was working with a customer who (I don't really know why) felt SSL was too hard to deploy, so instead they went to all of their agents' workstations and changed the shortcut that launches the browser to disable the security check for
WebRTC. Okay, that works as long as you always start the browser using that icon. The problem is that some people put URLs in their system startup items, or they click a link in an email that launches the browser, and guess what, it doesn't have the flag. So every once in a while things would just stop working, because Chrome would block the WebRTC media when that flag wasn't there. That was kind of silly, and most people won't get to that level, but you will avoid a lot of problems just by having an SSL certificate on your server and your signaling channels protected as well. You've just got to do it.

Another one is not letting the user choose their device. This is kind of overlooked, but one of my co-workers has a dual-monitor setup: the monitor with the built-in camera is over here, asleep, but the camera he actually uses is over there. When we were building an app one day, we realized we were always looking at the side of his face when he started, and he had no way to change the camera. It's a subtle thing and it won't affect all users, but be considerate: some people really do need to change devices, whether audio or video.

Another one that was kind of funny: we had an app with a long form, and the agent needed to be able to scroll up and down the form while speaking to the person on the other end. There was video in the frame, and what was supposed to happen is that as the agent scrolled, the video stayed pinned to the top of the screen, so the form would scroll but the video would stay at the top. The problem was that, through some bug, if the agent scrolled down before beginning the video session, the video got pinned to the top of the page, out of view. So we would get bug reports that WebRTC wasn't working, when actually all they had to do was scroll up to the top and scroll back
down. Silly usability bugs like that just come out in testing. We solved that one very easily once we could see what it was, but the bug reports we were getting just said "it doesn't work." Also in this category: it's surprisingly easy to end up with the video or audio element set to paused. I'm not sure whether it's still the case, but I believe for a while the video element actually defaulted to paused. So if the video isn't rendering or looks frozen, check whether the element is paused.

Third, browser problems. The main thing here is that the specification is still evolving. The good news is that 1.0 is on the horizon; it's very close. Like I said, things have largely settled down, and it's much better than it was even a year ago; most of our apps don't really fall apart anymore. I'll point out there is still no native support in IE or Safari, but I've got more on that in a second. Talking about SSL again and the spec evolving, there's been a bunch of noise recently because Chrome is dropping support for WebRTC over unencrypted sessions. Previously it was a hassle, because the user would be asked to reauthorize audio and video every time; now it's just not going to work. So again: certificates. If you're interested in tracking the changes that are going on, there's a really great site run by Dan Burnett, who's one of the editors of the WebRTC spec, and Amir Zmora: webrtcstandards.info. They've got a blog and a mailing list, and it's really good and detailed: here's what's changing in the spec, and here's why. Highly recommended if this is your day job.

I'll finish this slide with a quote. As we were testing Chrome (and when you're adding that kind of complexity to Chrome, like echo cancellation and media encoding, there are bugs), occasionally Chrome, you
know, will stop working. So we invented this backronym for WebRTC: Well, Everybody Better Restart Their Chrome. It doesn't happen much anymore, but if in doubt, try it.

One other thing I want to mention for browser compatibility is using something called a polyfill. The WebRTC project at webrtc.org, one of the big organizations promoting WebRTC, has a really great library called adapter.js. Its job is to smooth over the inconsistencies between Firefox and Chrome: you write to one API, which basically follows the WebRTC standard, and if there are subtle differences in the Firefox or Chrome implementations, it covers them over. It's kept up to date, which means you don't have to do the work of figuring those things out and writing the code yourself. It's a very small library, and I cannot imagine writing anything WebRTC without this or something like it. It's on GitHub as well.

Now, I mentioned Internet Explorer and Safari, and the good news is that we have something for that. There's a company in Singapore called Temasys that has released a browser plugin. Yes, it's a plugin, and yes, you have to download it, but if you do, you get real WebRTC support in Internet Explorer and Safari. It is not open source. It is free for non-commercial use; if you want to use it in a commercial setting, you've got to pay them. But it's a really nice middle ground that lets you write to one standard, and I promise you it beats trying to build Flash or Java fallbacks, especially if you don't have the infrastructure to do it.

The last thing on browser incompatibility is that a lot of these problems are solved for you if you just pick a WebRTC platform. I was scratching my head trying to think of a good WebRTC vendor... oh yeah, Respoke. Respoke, and others like them, have done all this work: they have the adapter built in, they have the
infrastructure deployed. So if you're looking for something much simpler, to just kick it out the door and launch it with minimum effort, something like Respoke is a great option. They've done all that hard work for you, and they'll keep it up to date as the browsers evolve.

So, to sum up WebRTC problems and solutions. First, environmental problems: if you can change the environment, do; if you can't, teach the user so they can change it. For usability problems, please deploy SSL certificates. Oh, I should have mentioned Let's Encrypt. Who here has heard of Let's Encrypt? Oh, I've got something for you all; only two people, I'm surprised. So, last question either way: how many people find SSL certificates to be kind of a pain? Only a few of you? Really, not everyone? Okay, fair enough. I find SSL certificates to be a pain. They're expensive, and I don't really know why I have to go through this crazy process: wherever I buy my domain name, I have to go to a separate place, or at least through a separate process, to get my certificate. There's a public-interest organization called Let's Encrypt. They're not quite launched yet, but they will be any day now, and they will offer free SSL certificates to anyone who asks, with an automated approval process. Highly recommended; check out letsencrypt.org. I apologize for not putting it on the slide; I didn't think about it until just now, but Let's Encrypt is a big deal.

The other thing on usability problems: test, test again, and test some more. I can't tell you how many times I've been called in to look at a bug with a client, and the biggest problem was that the developer simply hadn't gone to stand in the cube with the user who was having the problem. I can solve so many problems that way, just by observation. So test it yourself, and then get with the users whenever you can. Sometimes you can't, but whenever you can,
get with someone who's not familiar with the tool, sit with them, see how they use it, and see what doesn't work. Finally, for browser compatibility and infrastructure problems, just lean on somebody who's already fixed them. That's the best thing I can say about that: either use open source like adapter.js, or go with something like Respoke that will handle all the infrastructure for you as well.

So that is it. I've got all of the links I mentioned here; if you want to learn more about Adhearsion or Mojo Lingo, they're at the bottom. I'm bklang on Twitter, GitHub, and most other places. With that, I'd love to take any questions.

Yes? That's a good question; let me repeat it. The question is about device selection: you thought it was part of the browser. And it is: when you get the WebRTC permissions prompt, you can choose the associated audio device. So is that not sufficient, and why should I do it in JavaScript? You don't have to. You certainly could lean on the browser to do it. I think a lot of web developers find that the more of the user experience they can control, the better. If a control for selecting devices makes sense within the application, for example if I go to the settings cog in my application to change my profile, I might also want to change my devices there. You could also store preferences per site, so site A might use this microphone and site B another, though I'm not sure why you would need that. It really comes down to control: I want to control my user experience as much as possible.

Yes? What codecs does Asterisk support for WebRTC? Is Matt in the room? Okay. I'll start with audio; it's a little simpler. There are two mandatory-to-implement audio codecs in WebRTC, which means everybody who implements WebRTC should implement them. One of them is G.711 μ-law: no problem, we all have μ-law. We don't like it, but everybody's got it. The other is Opus. The problem with Opus is that there are companies
who assert that they own intellectual property covering Opus. I don't blame Digium; it's a tough spot to be in, especially since they're a juicy target for lawsuits. They have decided that it's not in their interest to include Opus with Asterisk natively, so Opus is not supported out of the box. That being said, there are projects on GitHub that add the codec to Asterisk; you have to compile them, but they are available. On the video side, Asterisk doesn't do any video transcoding, so I won't say it's moot, but it's not nearly as significant as on the audio side when you're gatewaying. I know VP8 is supported; I don't recall about H.264... okay, that's not supported. Anyway, if you need to do more than just connect channels in Asterisk, you might be looking at some other technology to handle the transcoding, at least on the video side.

So the question is: can I separate the audio and video so that the audio goes through Asterisk while the video goes peer-to-peer? Yes, but I wouldn't recommend it, because you can no longer control the synchronization. You may end up with the audio in one place and the video lagging behind, which would just be weird. If you're just talking about pass-through, Asterisk should have no problem with that; it's only an issue when you need to transcode or do fancier operations. In the contact center application I talked about, I would use Asterisk and not worry about it.

Yes? Yes, I mean, you can use self-signed certificates, and there are ways to adjust trust... Dan says no, you can't use them. Okay. So, to repeat: Dan says you cannot use self-signed certs. It must be a trusted certificate, or a secure origin such as localhost. If you want to use a self-signed cert, you have to add it to the trust store, which is significantly more effort and not recommended. Again, letsencrypt.org hopefully
will solve that problem for us.

Yes? Okay, I'll repeat the question. The question is: using an RFC 5766 TURN server and JsSIP with Asterisk, you've got rtp.conf set up to use the standard ports, or whatever you have them set to (I think the default is 10,000 to 20,000), and it seems like WebRTC is ignoring those, or at least Asterisk's WebRTC channels are. I'm not familiar with that. It sounds like an interesting bug report, and I would suggest filing it; they'll either give you an answer why or fix it. Any other questions? No? All right, well, I'm one minute over time, so thank you all very much.
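The talk's minimum Internet deployment is signaling plus a STUN/TURN server. On the browser side, those servers are handed to `RTCPeerConnection` through the `iceServers` field of its configuration. The sketch below builds such a configuration. The hostnames and credentials are placeholders, `buildIceConfig` is a hypothetical helper, and the local `IceServer`/`PeerConfig` types mirror the DOM's `RTCIceServer`/`RTCConfiguration` so the snippet runs outside a browser.

```typescript
// Local stand-ins for the DOM's RTCIceServer / RTCConfiguration shapes.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: "all" | "relay";
}

// Hypothetical helper: one STUN server (public-address discovery) and one
// TURN server (media relay). Hostnames and credentials are placeholders.
function buildIceConfig(turnUser: string, turnPassword: string, forceRelay = false): PeerConfig {
  return {
    iceServers: [
      { urls: "stun:stun.example.org:3478" },
      {
        urls: ["turn:turn.example.org:3478?transport=udp", "turns:turn.example.org:443?transport=tcp"],
        username: turnUser,
        credential: turnPassword,
      },
    ],
    // "relay" forces all media through TURN (the talk notes you can force media
    // through a server); "all" lets ICE prefer direct paths.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

// In a browser: const pc = new RTCPeerConnection(buildIceConfig("user", "secret"));
```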
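The bandwidth argument for an SFU or MCU comes down to simple arithmetic. This sketch counts the video streams each participant uploads and downloads under the three topologies described in the talk. `meshLoad`, `sfuLoad`, `mcuLoad`, and `uploadKbps` are hypothetical helpers, and the model ignores refinements such as an SFU forwarding lower-resolution layers.

```typescript
interface StreamLoad {
  sentPerClient: number;     // video streams each participant uploads
  receivedPerClient: number; // video streams each participant downloads
}

// Full mesh (WebRTC's default): each participant sends a separate copy to every other participant.
function meshLoad(participants: number): StreamLoad {
  const others = Math.max(participants - 1, 0);
  return { sentPerClient: others, receivedPerClient: others };
}

// SFU: upload once; the server forwards every other participant's stream.
function sfuLoad(participants: number): StreamLoad {
  return { sentPerClient: 1, receivedPerClient: Math.max(participants - 1, 0) };
}

// MCU: upload once; the server composites everything into a single stream.
function mcuLoad(_participants: number): StreamLoad {
  return { sentPerClient: 1, receivedPerClient: 1 };
}

// Upstream bandwidth per client for an assumed per-stream bitrate.
function uploadKbps(load: StreamLoad, kbpsPerStream: number): number {
  return load.sentPerClient * kbpsPerStream;
}
```

With five participants and an illustrative 1,000 kbps per stream (an assumption, not a figure from the talk), `uploadKbps(meshLoad(5), 1000)` is 4,000 kbps per client, while `uploadKbps(sfuLoad(5), 1000)` is 1,000 kbps. In a mesh, each new participant adds an upload stream for everyone.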
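WebRTC leaves signaling unspecified, and the talk's simplest option is an HTTP server whose only job is to forward each side's SDP to the other. Below is a minimal in-memory sketch of that forwarding logic, assuming a polling design. `SignalingRelay`, `post`, `poll`, and the message shape are all hypothetical. A real deployment would expose them over HTTPS (or use WebSockets, SIP, or XMPP instead) and authenticate callers.

```typescript
// Hypothetical message shape; WebRTC itself does not standardize signaling.
type SignalKind = "offer" | "answer" | "candidate";

interface SignalMessage {
  from: string;
  to: string;
  kind: SignalKind;
  payload: string; // an SDP blob or ICE candidate line, passed through untouched
}

// In-memory mailbox relay: the server never inspects SDP, it only queues it.
class SignalingRelay {
  private mailboxes: { [peerId: string]: SignalMessage[] } = {};

  // Handles a peer's POST: queue the message for its recipient.
  post(message: SignalMessage): void {
    if (!this.mailboxes[message.to]) {
      this.mailboxes[message.to] = [];
    }
    this.mailboxes[message.to].push(message);
  }

  // Handles a peer's poll: return and clear everything queued for it.
  poll(peerId: string): SignalMessage[] {
    const pending = this.mailboxes[peerId] || [];
    this.mailboxes[peerId] = [];
    return pending;
  }
}
```

In the talk's flow, Alice posts her offer SDP addressed to Bob, Bob's next `poll` returns it, and Bob posts his answer back the same way. ICE candidates can travel through the same mailbox.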
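The ICE candidate list shown in the talk follows the candidate-attribute grammar from RFC 5245. In that grammar, the `typ` field distinguishes `host`, `srflx` (server reflexive, learned from STUN), and `relay` (allocated on a TURN server). This sketch parses only the fixed leading fields of a candidate line and tallies candidates by type. `parseCandidate` and `countByType` are hypothetical helpers, and any extension attributes after the type are ignored.

```typescript
// The three types from the talk, plus peer-reflexive ("prflx"), which ICE can
// also discover during connectivity checks.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  protocol: string; // "udp" or "tcp"
  address: string;  // the IP (or hostname) being offered
  port: number;
  type: CandidateType;
}

const KNOWN_TYPES = ["host", "srflx", "prflx", "relay"];

// Fixed fields: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line: string): ParsedCandidate | null {
  const text = line.indexOf("a=") === 0 ? line.slice(2) : line; // also accept SDP "a=" lines
  const fields = text.trim().split(/\s+/);
  if (fields.length < 8 || fields[0].indexOf("candidate:") !== 0 || fields[6] !== "typ") {
    return null;
  }
  if (KNOWN_TYPES.indexOf(fields[7]) === -1) {
    return null;
  }
  return {
    protocol: fields[2].toLowerCase(),
    address: fields[4],
    port: parseInt(fields[5], 10),
    type: fields[7] as CandidateType,
  };
}

// Tallies candidates by type; unparseable lines are skipped.
function countByType(lines: string[]): { [type: string]: number } {
  const counts: { [type: string]: number } = { host: 0, srflx: 0, prflx: 0, relay: 0 };
  for (const line of lines) {
    const parsed = parseCandidate(line);
    if (parsed) {
      counts[parsed.type] += 1;
    }
  }
  return counts;
}
```

In a browser, these strings come from `event.candidate.candidate` in an `icecandidate` handler. Seeing no `srflx` or `relay` candidates at all is a quick hint that the STUN or TURN server is unreachable or misconfigured.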
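The talk says to check first for muted microphones, and test.webrtc.org looks for a change in dB. One way to do that is to compute the RMS level of a buffer of samples and convert it to dBFS. For example, you could use the Float32Array that the Web Audio `AnalyserNode.getFloatTimeDomainData()` fills with values in [-1, 1]. `rmsDbfs` and `looksSilent` are hypothetical helpers, and the -50 dBFS silence threshold is an arbitrary assumption, not a standard value.

```typescript
// RMS level of samples in [-1, 1], in dBFS (0 dBFS = full scale).
// Digital silence returns -Infinity.
function rmsDbfs(samples: ArrayLike<number>): number {
  if (samples.length === 0) {
    return -Infinity;
  }
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms === 0 ? -Infinity : 20 * (Math.log(rms) / Math.LN10); // 20 * log10(rms)
}

// Assumed threshold: anything quieter is treated as "probably muted".
const SILENCE_THRESHOLD_DBFS = -50;

function looksSilent(samples: ArrayLike<number>): boolean {
  return rmsDbfs(samples) < SILENCE_THRESHOLD_DBFS;
}
```

Sampling the level over a few seconds and prompting the user to check their mute switch or volume is a softer approach than failing outright. As the talk says, detection like this is not 100 percent reliable.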
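Because Chrome is dropping WebRTC media capture on unencrypted origins, an app can warn early instead of failing silently. In a browser, the authoritative answer is `window.isSecureContext`. The hypothetical `isLikelySecureOrigin` helper below is only a simplified approximation of that rule (HTTPS, or plain HTTP on a loopback host), and it deliberately ignores other cases browsers may treat as secure. Its inputs are meant to come from `location.protocol` and `location.hostname`.

```typescript
// Simplified approximation of the browser's secure-context rule; prefer
// window.isSecureContext when running in a browser.
function isLikelySecureOrigin(protocol: string, hostname: string): boolean {
  if (protocol === "https:") {
    return true;
  }
  if (protocol !== "http:") {
    return false;
  }
  const host = hostname.toLowerCase();
  // Loopback hosts are treated as secure even over plain HTTP (useful in development).
  return host === "localhost" || host === "[::1]" || host === "::1" || /^127(\.\d{1,3}){3}$/.test(host);
}

// In a browser:
// if (!isLikelySecureOrigin(location.protocol, location.hostname)) { /* show an HTTPS warning */ }
```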
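To let users pick their own camera, as the talk recommends, an app can list devices from `navigator.mediaDevices.enumerateDevices()` and pass the chosen `deviceId` to `getUserMedia` as an `exact` constraint. The sketch below shows only the selection logic. `listCameras` and `cameraConstraints` are hypothetical helpers, and the local `DeviceInfo` type mirrors the relevant fields of `MediaDeviceInfo`. Browsers typically return empty device labels until the user has granted media permission.

```typescript
// The fields of MediaDeviceInfo this sketch uses.
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
}

interface CameraConstraints {
  audio: boolean;
  video: { deviceId: { exact: string } } | boolean;
}

// Options for a camera picker; falls back to a generic name when labels are hidden.
function listCameras(devices: DeviceInfo[]): { id: string; name: string }[] {
  return devices
    .filter((d) => d.kind === "videoinput")
    .map((d, i) => ({ id: d.deviceId, name: d.label || "Camera " + (i + 1) }));
}

// Constraints for the first camera whose label contains preferredLabel
// (case-insensitive), or the browser's default camera if none matches.
function cameraConstraints(devices: DeviceInfo[], preferredLabel: string): CameraConstraints {
  const wanted = preferredLabel.toLowerCase();
  const match = devices.filter(
    (d) => d.kind === "videoinput" && wanted.length > 0 && d.label.toLowerCase().indexOf(wanted) !== -1
  )[0];
  return {
    audio: true,
    video: match ? { deviceId: { exact: match.deviceId } } : true,
  };
}

// In a browser:
// navigator.mediaDevices.getUserMedia(cameraConstraints(devices, savedLabel));
```

Storing the user's choice (for example, in `localStorage`) gives the per-site device preference mentioned in the Q&A.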
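The talk notes that a video element can end up paused, which makes the stream look broken or frozen. A minimal check reads the element's `paused` flag and calls `play()`. In current browsers `play()` returns a promise, which can reject under autoplay policies. `resumeIfPaused` is a hypothetical helper, and the `PlayableElement` interface is a local stand-in for `HTMLVideoElement` so the snippet runs outside a browser.

```typescript
// The subset of HTMLVideoElement / HTMLAudioElement this check needs.
interface PlayableElement {
  paused: boolean;
  play(): PromiseLike<void> | undefined;
}

// Resumes a paused media element. Returns true if it was paused (and play()
// was called), false if it was already playing.
function resumeIfPaused(el: PlayableElement, onError?: (err: unknown) => void): boolean {
  if (!el.paused) {
    return false;
  }
  const result = el.play();
  if (result !== undefined) {
    // Autoplay policies can reject play() until the user interacts with the page.
    result.then(undefined, (err) => {
      if (onError) {
        onError(err);
      }
    });
  }
  return true;
}
```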
Info
Channel: Official Asterisk YouTube Channel
Views: 38,441
Keywords: webrtc, Real Time Messaging Protocol (Internet Protocol), asterisk, astricon, Open Source (Software License)
Id: 3TbVi9aB09k
Length: 34min 20sec (2060 seconds)
Published: Thu Dec 10 2015