AllThingsRTC 2019 - From WebRTC to RTMP - Bridging the Broadcast Gap

Captions
[Music] Hi. As the previous slide said, I'm Nick Chadwick, I'm a software engineer at Mux, and for about the past ten years now I've been working in some way, shape, or form with RTMP and live video over the internet. Today I'm going to give a talk targeted at two different audiences at the same time, so I apologize if I'm retreading any ground that you've already heard before. I'm trying to target this talk at WebRTC engineers, hopefully like yourselves, who want to learn a little bit more about how you can integrate WebRTC flows with traditional broadcasting workflows. I'm going to go over what traditional broadcasting looks like (by traditional I don't mean going over a satellite truck and onto a TV, I mean over the internet): the kind of workflows that sites like Twitch or Facebook or YouTube are using right now to deliver great live video experiences over the internet. I'm also targeting this talk at live video engineers working in that model who want to learn a little bit more about integrating with WebRTC, and why you should. I think there was a great note in the keynote this morning about how more and more WebRTC workflows and live broadcast workflows are coming together and becoming the same thing; the experiences that we're trying to develop are the same, and today I'm trying to give a talk that helps us bridge that gap.

OK, so I'm going to go over how RTMP and HLS live streaming works, just to make sure you have a frame of reference for the model. I'm going to go over, very briefly and very lightly, the WebRTC model; I'm not going to go into much depth here, I'm not going to talk about things like selective forwarding units, I'm just going to go over the key pieces of information that you need to know if you want to bridge the gap to a traditional live workflow. I'm then going to talk about why we really need to start bringing these models together (I got the slide slightly wrong), and finally I'm going to give you a practical approach, with hopefully a live demo that works, of using WebRTC in a live broadcast workflow.

So let's start with what the current state of the art is for going live over the internet, and there are two key protocols: RTMP and HLS. Here's a diagram that describes how we currently go live. You'll have some source of live video, be it a camera, a mobile phone, or perhaps even someone playing video games at home broadcasting to something like Twitch. That raw source of video is then going to get encoded into RTMP; that's the protocol going over the wire, and inside RTMP you're going to be using codecs like H.264 and AAC. You're then going to send that stream up to the cloud, where a cloud transcoder is going to create multiple different renditions of your content, targeting different resolutions and bitrates for different users. It then gets packaged; I'm talking about HLS primarily today, but you might also have heard of DASH, which is another segmented streaming format that's very popular. Finally, your cloud encoding solution is going to use some kind of origin serving before sending it out to a CDN. It's very rare that your technology solutions are going to get to the scale where you can run your own CDN, so it's best to leverage one that already exists. Your CDN is then going to cache your video and allow it to be served out to the various people who want to view it, on TVs, laptops, or phones.

Let's dive into how HLS actually works, and like I'm saying, a lot of these concepts apply to DASH too, albeit with a slightly different format. Now, HLS came about really because of the iPhone. The old workflow was: you would create a version of your video, you would put it on the internet, and people would download it. That had some limitations when iPhones were coming out, connecting to things like 2G or 3G connections or Wi-Fi, where the quality of the connection the device had could vary quite a lot depending on how and specifically where you were using it, and that drove the creation of this segmented model. The way it works is you advertise a master manifest of your content. What this master manifest describes is the various different ways you're going to make your content available: at what resolution, at what bitrate, with what codecs. Each of those descriptions then has its own manifest describing the specific segments of the video that you're going to make available for download. Now, the most important part about this technology is that each of those segments, across the different renditions, represents exactly the same slice in time; we'll get to why that's important in just a second.

So let's dive into what a master manifest looks like with HLS. You can see here that there are some version headers, there's a note that we're using independent segments, and then (this is a trivial, trivial example) I've got three different renditions. I've omitted some of the more esoteric details to give you the three key details that each of these renditions has: it has a resolution, it has a bandwidth (oh no, I messed this up, they're meant to have different bandwidths, my bad), and a frame rate. You can imagine that the 1080p rendition might be 60 fps and the 360p rendition might be 15 fps; that would still be entirely valid, if your presenter had done a better job of this slide.
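[The manifest slide itself isn't captured in the captions. A minimal sketch of the kind of master manifest being described, with three renditions and illustrative resolutions, bandwidths, frame rates, codecs, and file names, might look like this:]

    #EXTM3U
    #EXT-X-VERSION:6
    #EXT-X-INDEPENDENT-SEGMENTS
    #EXT-X-STREAM-INF:RESOLUTION=1920x1080,BANDWIDTH=6000000,FRAME-RATE=60,CODECS="avc1.640028,mp4a.40.2"
    rendition_1080p.m3u8
    #EXT-X-STREAM-INF:RESOLUTION=1280x720,BANDWIDTH=3000000,FRAME-RATE=30,CODECS="avc1.64001f,mp4a.40.2"
    rendition_720p.m3u8
    #EXT-X-STREAM-INF:RESOLUTION=640x360,BANDWIDTH=800000,FRAME-RATE=15,CODECS="avc1.42c01e,mp4a.40.2"
    rendition_360p.m3u8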
OK, moving on to the rendition manifests. You might download, say, rendition one and rendition two, and you might be surprised to learn that they look almost exactly the same. Now, this won't always be true if you're using non-relative URLs; you can see here that 6.ts is a URL relative to the rendition manifest, and these two files would be different if we were using the full path to each segment. But what's important here, as I mentioned, is that each of these segments represents exactly the same amount of time, as shown by the EXTINF here. Even though the exact duration of each segment might vary, what's important for the spec is that the timeframe matches across renditions.

Now, you might be thinking, hang on a second, you haven't talked about live yet. Well, the way that live video is expressed in this segmented world is through time: having our rendition manifests change. Each rendition manifest in a live HLS broadcast represents a slice of time and several segments from that slice of time, and it's a sliding window. You might advertise that you have a target duration of 5 seconds and several segments in your manifest; then 5 seconds later, if you download that same manifest, you can see that our 6.ts has slid out of our window but 10.ts has now been added. In that sense this file represents a sliding window of time, and each of the segments represents one slice of time that you can download and play. This way, as long as you keep refreshing your rendition manifest and downloading the most recent segments in it, you can play a live stream.
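[The rendition manifest slides aren't captured either. A hypothetical live rendition manifest with a 5-second target duration might look like this on one refresh, and then a few seconds later, with 6.ts sliding out of the window and 10.ts appearing:]

    #EXTM3U
    #EXT-X-VERSION:6
    #EXT-X-TARGETDURATION:5
    #EXT-X-MEDIA-SEQUENCE:6
    #EXTINF:5.005,
    6.ts
    #EXTINF:4.996,
    7.ts
    #EXTINF:5.005,
    8.ts
    #EXTINF:5.005,
    9.ts

    (a few seconds later, the same URL returns)

    #EXTM3U
    #EXT-X-VERSION:6
    #EXT-X-TARGETDURATION:5
    #EXT-X-MEDIA-SEQUENCE:7
    #EXTINF:4.996,
    7.ts
    #EXTINF:5.005,
    8.ts
    #EXTINF:5.005,
    9.ts
    #EXTINF:5.005,
    10.ts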
Now, there are some key features that it's important to understand about why this format came to be and why it is the way it is. The first thing is that by having different renditions of our content (very similar to simulcast in WebRTC), you can support viewers with differing network conditions. If I'm on a poor 3G connection I might only be able to handle 500 kilobits per second, whereas on my desktop I can handle a full 10 megabits, and what this format allows us to do is deliver the best quality that your device can currently handle. But then, within a single device, within a single user session, doing it this way enables clients to adapt to changing network conditions. You might imagine that you're wandering down the street and connecting to various different 4G access points, each with different levels of congestion, which greatly impacts your device's available bitrate. By advertising chunks that represent the same slices of time, we allow our client to dynamically decide whether it wants to download a good-looking chunk, because it has lots of bandwidth, or a small chunk, because its bandwidth is currently limited.

Now, this requires us to have smart clients: your client needs to be constantly testing how much bandwidth it has available to it so it can make good switching decisions between these renditions. But crucially, this technology worked with dumb CDN servers. The only thing you needed to get right in order to make HLS live streaming work was having a short time-to-live on your rendition manifests; as long as you could make sure that your CDN was refreshing your rendition manifests every few seconds, you could just use the technology they already have. It didn't require complex RTSP or RTP servers, or nowadays complex WebRTC servers, in order to do live streaming. You could use a straightforward CDN with distributed points of presence across the globe and well-known pricing.
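[As a rough illustration of the "smart client" switching logic described above, here is a minimal Python sketch of one possible heuristic. The rendition list, the safety factor, and the function name are illustrative, not taken from any particular player or from the talk.]

    # Minimal sketch of an adaptive-bitrate switching heuristic.
    # Real players (hls.js, AVPlayer, ExoPlayer, ...) are far more sophisticated.

    # Renditions as advertised in a master manifest: (name, bandwidth in bits/sec).
    RENDITIONS = [
        ("1080p", 6_000_000),
        ("720p", 3_000_000),
        ("360p", 800_000),
    ]

    def pick_rendition(measured_throughput_bps: float, safety_factor: float = 0.8) -> str:
        """Pick the highest-bandwidth rendition that fits comfortably within the
        throughput measured while downloading recent segments."""
        budget = measured_throughput_bps * safety_factor
        # Walk from best to worst and take the first rendition that fits the budget.
        for name, bandwidth in sorted(RENDITIONS, key=lambda r: r[1], reverse=True):
            if bandwidth <= budget:
                return name
        # Nothing fits: fall back to the lowest rendition and hope for the best.
        return min(RENDITIONS, key=lambda r: r[1])[0]

    print(pick_rendition(10_000_000))  # fast desktop connection -> "1080p"
    print(pick_rendition(500_000))     # congested mobile link   -> "360p"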
OK, so the input format was RTMP, and RTMP is pretty much a mistake. If you're wondering, OK, RTMP, wasn't that that Flash thing, kind of like a predecessor to WebRTC? You're not wrong, and it kind of lives on in a zombie format; it's used by places like Twitch or Facebook or YouTube to go live. I'm just going to go over some of the key features of RTMP. It is a TCP-based protocol. It supported bi-directional communications, it supported multi-stream communications; you could, in fact, in 2005, open your Flash players on two different sites and have a conversation with a friend in another country. That didn't work very well, because it was TCP, but it did sometimes work. And it also included an RPC mechanism. Then time happened, and now the actual features are: bi-directional communications? No, we in fact only ever use a single direction, to broadcast video over RTMP. Multi-stream communications? Nah, we only ever send a single stream over RTMP, with a single video stream and a single audio stream; there's no backup stream, no second language here, no. And the RPC mechanism? Nobody really uses it. So what actually happened is RTMP ossified into the protocol we know and love today. There's a list of audio codecs that it supports (you might note that Opus isn't on it), and there's a list of video codecs that it supports, and you might also note that it supports cutting-edge technology like VP6 or H.263, and that very experimental AVC, a.k.a. H.264. It doesn't support VP8, VP9, AV1, HEVC, or anything else. OK, so that's what we use.
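[Not shown in the talk: a typical way to publish an H.264/AAC stream to an RTMP ingest today is an FFmpeg invocation along these lines, with a placeholder ingest URL and stream key:]

    ffmpeg -re -i input.mp4 -c:v libx264 -preset veryfast -c:a aac -f flv rtmp://live.example.com/app/STREAM_KEY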
OK, so now let's do a quick overview of the WebRTC model, and I know I'm giving a talk about the WebRTC model to a roomful of WebRTC engineers; I'm really, really sorry if I get it a little bit wrong, I'm just trying to hit the high notes. So it's a free, open project, and its mission is to enable rich, high-quality RTC applications, and it's supported by some big names. It is video chat built right into your browser, yes, but it's also tools and an ecosystem for building experiences that leverage low-latency audio and video communications, and it's built using the best of what already existed. There are two separate tracks you need to be aware of when you're talking about WebRTC. The first is the W3C standard for the JavaScript API; when we're building web applications, this is the model we need to care about. But when we're building live video broadcast systems, what we really need to care about is rtcweb, the IETF working group for the low-level protocol. They do things like picking RTP profiles and specifying exactly how we're going to express audio and video to the other parties.

OK, so WebRTC solves a fundamental challenge of being peer-to-peer. What is this challenge, you might ask? Well, if you go to google.com at home, your network is most likely going to look something like this: you're going to have your computer behind your home router, which has a connection to the internet. When you make a request to google.com, after you've done the DNS lookup, I'm talking about the TCP packet that you're going to send, you're going to say: I know that Google is at this public IP address, and I would like to talk to it on port 80 and make a request. And when you send that TCP packet, you say, oh, and this is the sender, it's me. But note that this is a 192.168 address; that's a local address, the internet can't talk to it. What your router does is set up a mapping between your internal IP and port combination and the connection that you're trying to make out to Google, and it writes its actual public IP address into the source. Then when Google responds, it says, OK, well, I'm Google, I'm sending this packet, and I'm sending it to this public IP address. Your router then looks up in its mapping which local connection that packet should be sent to, rewrites that TCP header, and says, OK, here you go. Now, this works if you are connecting out, but it poses a fundamental challenge if we're trying to talk to each other, and that challenge is solved by what's called ICE.

Let me try to explain briefly with a diagram (I'm sorry, I forget where I saw this diagram; if you're the author, thank you). What happens is we're both behind a network address translator, and that means that if I knew your IP address I could send a packet to your router, but your router wouldn't know what to do with it, because that mapping hasn't been set up for your connection yet, certainly not to me, and likewise you can't send a packet to me. We could maybe go into our routers and set up port forwarding, like you had to do to get some old video games working back in the day, but no. If we have the ability to use some kind of reliable third-party signaling server, what we can do is use the STUN protocol to find our public IP addresses, then exchange those public IP addresses with each other, and then we can both send a packet to each other. One of them probably won't get through, but it will be enough to establish that mapping in our routers, and then, if we keep trying, eventually we're going to get a connection through. The way this is all done is through ICE, and that's why, when you're looking at WebRTC messages, you need a reliable signaling mechanism and you're going to see what are called ICE candidates flowing over the wire.

But you also need some way of saying, hey, I would like to talk video with you, and audio, and to do that SDP was selected; that's how we describe the media that we'd like to exchange with each other. Now, there are experts in the room on this, so if you have any questions please ask them, not me, but the very quick overview I'm going to give you is: it's defined by an RFC that existed long before WebRTC did, and WebRTC picked it because it's proven, it works, and it lets you integrate with legacy systems. It looks a little something like this, which, if you're a web engineer of any kind, you're going to be like, that's not... yeah, no, it's not JSON. What is it? It's not even a protobuf. No, it's a custom format, and there's a lot of stuff in here which I am not going to go over at all. I'm just going to pick one of the really interesting features, which is that we can describe the audio and video that we would like to send, but we can also describe the capabilities we have to send it, and express a preference for which of those capabilities we would like the other side to use. The mechanism we use is an offer/answer: we can say, hey, I'd like to send you audio and video, I can do Opus and some other codecs, I can do VP8 and VP9 and H.264, and I would like to use these codecs, and if you can't do that, maybe fall back to this, maybe fall back to that. Then the other side can answer, saying, OK, let's do this, here's what I support and here's what I'd like to use. Then, when you actually start sending RTP packets over your ICE connection, both sides have negotiated what the media IDs in those packets actually represent in terms of codecs. OK, so I think I'm over that.
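[The SDP slide isn't captured in the captions. An abbreviated, illustrative fragment of an offer of the kind being described, with an audio section advertising Opus and a video section advertising VP8, VP9, and H.264 in order of preference, might look like this; payload type numbers are typical browser defaults, not mandated:]

    v=0
    o=- 46117314 2 IN IP4 127.0.0.1
    s=-
    t=0 0
    m=audio 9 UDP/TLS/RTP/SAVPF 111
    a=rtpmap:111 opus/48000/2
    m=video 9 UDP/TLS/RTP/SAVPF 96 98 102
    a=rtpmap:96 VP8/90000
    a=rtpmap:98 VP9/90000
    a=rtpmap:102 H264/90000
    a=sendrecv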
So now I'm going to talk about why our two worlds are on a collision course, if it wasn't already obvious to you. More and more RTC use cases involve broadcasting, and that was talked about this morning: that kind of experience where it isn't just me talking to one other person, it's often me talking to a friend, talking to another friend, broadcasting our conversation live, watching something else, having an audience write text chat back to us. There are all kinds of exciting use cases coming out here. And at the same time, the traditional live broadcast HLS and DASH world is focusing right now on how we can improve our latency, and we're also trying to build these rich interactive experiences on top of the pre-existing codecs. Just last week at WWDC, Apple announced their new addition to the HLS spec, called Low-Latency HTTP Live Streaming. We're on a collision course, and as an industry we're starting to hit some really interesting challenges. Like I was saying, we have these cool new codecs coming out, but also we have these cool new protocols coming out. This is SRT; I've grabbed a quick screenshot of part of the SRT raison d'ĂȘtre. You can see here that SRT was created by Haivision and Wowza to power new, better live streaming experiences, to replace the RTMP part of our stack with something new and better, and you see some of the challenges they're thinking about: jitter, packet loss, bandwidth fluctuations; they're thinking about how they can make it low latency, how they can secure it. By now you should be thinking, hang on a second, I've seen this list of requirements before, right? SRT is solving almost exactly the same problem that WebRTC is solving. Can we just all be friends?

Well, we can, but there are some challenges. The biggest challenge, as I see it, is that fundamentally the RTMP model is a point-and-go model. You open a TCP connection, you send some pretty well-defined packets saying I would like to connect, I would like to publish a stream, and everybody's kind of standardized on: here's a stream key that uniquely identifies me and authorizes me to broadcast to this endpoint. Then you just start sending video over your TCP connection and you're done. It's pretty straightforward to work with this; you can set up load balancers; we kind of know what to do with it. With WebRTC, though, there is a negotiation, and if you recall that diagram with the little cloud at the top, that cloud is empty because it is undefined. There is no well-defined mechanism for how you actually start a WebRTC connection; once you have it, it's all well defined, but how to start it is undefined. This is a challenge. Effectively, WebSockets are a de facto standard for this, at least when you're using web browsers, but that still leaves: what is the message you're sending, how do you identify yourself, how do you authorize yourself, what is the packet structure you're going to be sending? You need to send an SDP offer and answer, you need to send ICE candidates, but how you do so is undefined. This is also a problem.
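[To make the "undefined signaling" point concrete, here is a minimal Python sketch of the kind of ad hoc WebSocket relay a project might improvise, using the websockets library. The JSON envelope with room, type ("offer", "answer", "candidate"), and sdp fields is entirely made up, which is exactly the problem being described.]

    # Minimal sketch of an ad hoc WebSocket signaling relay.
    # Every project invents its own message envelope; this one is hypothetical.
    import asyncio
    import json
    import websockets

    rooms = {}  # room id -> set of connected websockets

    async def handler(ws, path=None):
        joined = None
        try:
            async for raw in ws:
                msg = json.loads(raw)  # e.g. {"room": "abc", "type": "offer", "sdp": "..."}
                room = msg.get("room")
                if joined is None and room:
                    joined = room
                    rooms.setdefault(room, set()).add(ws)
                # Relay offers, answers, and ICE candidates to every other peer in the room.
                if msg.get("type") in ("offer", "answer", "candidate"):
                    for peer in rooms.get(room, set()):
                        if peer is not ws:
                            await peer.send(raw)
        finally:
            if joined:
                rooms[joined].discard(ws)

    async def main():
        async with websockets.serve(handler, "0.0.0.0", 8080):
            await asyncio.Future()  # run forever

    if __name__ == "__main__":
        asyncio.run(main())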
There's no way to put Opus in RTMP, so if you were thinking, OK, I'll just build some kind of adapter, well, the adapter you build needs to be able to transcode audio at the very least. And if you're using something like VP8, which is mandatory to support in the web browser, you can't just put it in RTMP either; you need to transcode it. There are also some other challenges around some of the complexities of WebRTC, things we've been hearing about in earlier talks today: the resolution that we're streaming at can change dynamically, the codecs that we're using can change dynamically, we can flip cameras around. We're not just dealing with an input of a fixed resolution, we're dealing with a very dynamic input. What if someone turns off their camera midway through your live stream? What do you do? How do you build applications around that? This is going to be a challenge, and the integrations we're going to build are going to be difficult.

Now, to give you some inspiration, I'm going to go over two demos today of actually doing this integration, although I have to admit up front that one of them doesn't work. So let me talk to you briefly about the first demo that I built. It totally works on my machine, but I couldn't get it working in the cloud. I built a demo called the WebRTC rebroadcaster. It's designed to be the minimum surface area you need to go live on a site like Twitch using RTMP, but with your input being WebRTC, and it exists as a transcoding rebroadcaster. I built it using libwebrtc to power the WebRTC workflow; libwebrtc is essentially what you get if you do the right magic on Google's official Chromium WebRTC project. My thinking for using this project was, hey, it's the one everyone's using: Chrome's using it, I think Mozilla is using it, pretty sure Edge is now using it. It seems like the default choice, so I should probably use it. Working with the official WebRTC source is awful; if you ever get to the point where you're like, I want to do some WebRTC, I'm sorry, it is challenging. It's like a 10 gigabyte git clone, and I was frantically trying to build a Docker image while trying to use conference Wi-Fi, and then my remote machine that I'm using for my demos today didn't have enough disk space to even unpack everything. I use Boost.Beast for WebSockets (as I said, it's kind of the default), I use FFmpeg for converting the frames, and a very simple Python server to serve up some static HTML, do the connection, and be the WebSocket server in the middle. Now, I promise I'm going to get this working; you can take a look at the code, it's under the muxinc WebRTC rebroadcaster repo. It uses CMake, and I've checked in a libwebrtc.a and a jsoncpp to ease your integration. If you want to check out the code, you can see things like transcoder.cpp (oops, picked the wrong one, transcoder.h), which implements a video sink interface and an audio track sink interface from libwebrtc, and I've tried to create a Dockerfile, which totally worked on my machine once but just needs a tiny little bit of love. So if you're interested in trying this project out, please check it out, and I'm going to get it working soon.
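[The rebroadcaster's code isn't reproduced in the captions. As a rough illustration of the transcoding-rebroadcaster idea (decoded WebRTC frames in, RTMP out), here is a hypothetical Python sketch that pipes raw video frames into an FFmpeg child process, which encodes to H.264 and publishes over RTMP. The ingest URL, stream key, frame source, and function name are placeholders; the real project is C++ built on libwebrtc, and audio would need a second input.]

    # Hypothetical sketch: pipe decoded (raw) video frames into FFmpeg, which encodes
    # them to H.264 and publishes over RTMP. The real rebroadcaster does this in C++
    # with frames delivered by libwebrtc sink callbacks.
    import subprocess

    WIDTH, HEIGHT, FPS = 1280, 720, 30
    RTMP_URL = "rtmp://live.example.com/app/STREAM_KEY"  # placeholder ingest + stream key

    ffmpeg = subprocess.Popen(
        [
            "ffmpeg",
            "-f", "rawvideo", "-pix_fmt", "yuv420p",      # raw frames arrive on stdin
            "-s", f"{WIDTH}x{HEIGHT}", "-r", str(FPS),
            "-i", "-",
            "-c:v", "libx264", "-preset", "veryfast",
            "-g", str(FPS * 2),                           # keyframe every 2 seconds
            "-f", "flv", RTMP_URL,                        # RTMP carries an FLV container
        ],
        stdin=subprocess.PIPE,
    )

    def on_frame(yuv420p_bytes: bytes) -> None:
        """Called for each decoded frame (stand-in for a WebRTC video sink callback)."""
        ffmpeg.stdin.write(yuv420p_bytes)

    # Example: push one second of blank frames.
    blank = bytes(WIDTH * HEIGHT * 3 // 2)  # one yuv420p frame of zeros
    for _ in range(FPS):
        on_frame(blank)
    ffmpeg.stdin.close()
    ffmpeg.wait()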
OK, so this was meant to be a demo of a very, very basic workflow, and then I thought, let's go crazy with something that probably won't work, and this is why I was kind of throwing libwebrtc under the bus, because the next demo I'm going to do is a more complicated workflow. Let's imagine you want to support two people on their phones having a conversation, and maybe there are some viewers who can chat with them, send them little messages or love hearts, Periscope style. You're like, great. Then you run into this question: well, whose screen should I broadcast? I don't want to broadcast either of their screens, I want to do something more complicated. I want to broadcast something like this: I want both of them to get equal weighting, and maybe there's some intro when they come on, maybe there's a ticker down the bottom. Let's go a little crazy. You're thinking, OK, well, I can start with libwebrtc and I can build some kind of compositing engine, and I'm going to have to do layered effects, maybe I can pull something off the shelf, tickers and remote data integration... yeah, that's going to be a challenge. Wouldn't it be great if I could just use Chrome?

All right, let's try another live demo, of using Chrome as your server-side rendering engine. Now, this one totally worked earlier, so please bear with me. I had the picture of da Vinci's ornithopter up there because that thing was never going to fly, but we're all really impressed he gave it a shot. OK, pulling up my workflow here, you can still see my Twitch stream key, please don't, please don't. OK, so I'm going to log in as Alice here; don't go to this URL, please, I beg you. I'm going to try this on my phone so I have a second camera. OK, can't even connect... I'm going to start a call to myself over conference Wi-Fi. OK, cool, so now I'm talking to myself, this is great, I've got my phone. All you beautiful people out there in the audience, give me a wave. Cool, you've probably all seen this WebRTC demo before, and here it is, live on Twitch. Wow, the demo works. Cool. And here we have Chrome being our server-side rendering engine. You can see that we've got some crazy effects going on, there's transparency there, there's the ticker down the bottom, which is all static. If you're interested in exploring this technique for building rich, engaging live streams a little bit more, firstly, don't use any of my code, it's awful, but if you want to check out the technique, it's open source, and you can see that there's a simple entry point here running some stuff: PulseAudio for audio, Xvfb for off-screen rendering. I'll go back to my presentation now; this is a screenshot in case that, damn it, didn't work. Yeah, this is a really fun technique where you can kind of use and abuse Chrome as the server-side component of your live streaming WebRTC infrastructure, because it has a lot of the things you're already going to need in order to go live, and like I said, this code is available, packaged up in a Dockerfile that does work, and ready for you to have some fun with.
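[A hypothetical Python sketch of the kind of entry point being described: an X virtual framebuffer and PulseAudio for Chrome to render into, plus an FFmpeg process that screen-grabs the display and publishes it over RTMP. The page URL, display number, audio source, and RTMP endpoint are placeholders and will vary by setup.]

    import os
    import subprocess
    import time

    DISPLAY = ":99"
    PAGE = "https://example.com/broadcast-layout"        # page Chrome should render (placeholder)
    RTMP_URL = "rtmp://live.example.com/app/STREAM_KEY"  # placeholder ingest + stream key

    env = dict(os.environ, DISPLAY=DISPLAY)

    # Off-screen display and audio server for Chrome to render into.
    xvfb = subprocess.Popen(["Xvfb", DISPLAY, "-screen", "0", "1280x720x24"])
    pulse = subprocess.Popen(["pulseaudio", "-D", "--exit-idle-time=-1"], env=env)
    time.sleep(2)  # crude: give Xvfb and PulseAudio a moment to come up

    # Chrome renders the compositing page (WebRTC video, layout, ticker) into the virtual display.
    chrome = subprocess.Popen(
        ["google-chrome", "--no-sandbox", "--kiosk", "--window-size=1280,720",
         "--autoplay-policy=no-user-gesture-required", PAGE],
        env=env,
    )

    # FFmpeg grabs the display plus a PulseAudio source, encodes, and publishes over RTMP.
    # In practice the audio input is usually a null sink's monitor device, not "default".
    ffmpeg = subprocess.Popen(
        ["ffmpeg",
         "-f", "x11grab", "-framerate", "30", "-video_size", "1280x720", "-i", DISPLAY,
         "-f", "pulse", "-i", "default",
         "-c:v", "libx264", "-preset", "veryfast", "-c:a", "aac",
         "-f", "flv", RTMP_URL],
        env=env,
    )

    ffmpeg.wait()
    for p in (chrome, pulse, xvfb):
        p.terminate()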
OK, so for the source, you can head on over to GitHub, under muxinc; my employer was gracious enough to let me open source these two demos. Thank you so much for your time. [Music] [Applause]

Info
Channel: Agora
Views: 4,908
Keywords: webrtc, rtmp, allthingsrtc, allthingsrtc19, bridging webrtc into rtmp
Id: ZlQfWs_XTvc
Length: 30min 4sec (1804 seconds)
Published: Fri Jul 05 2019