[MUSIC PLAYING] DANIEL PETERSSON: So
to get things right, this is not rocket science. It's kind of simple. So it's mainly an
application of KISS. So let's keep all of this
as simple as possible. So before we get started,
we've seen the diagrams. But what is signaling, really? So signaling is the offer
answer exchange process that is needed to set up
a peer-to-peer connection between two participants. I normally think of
this as the handshake. How do we agree on the
connectivity pieces? So another sequence diagram. So we have four pieces. We have Alice and Bob,
their application sides. And Alice's WebRTC
library instance and Bob's WebRTC
library instance. Alice starts all of this
off by calling create offer. That gets her an offer,
which is sent across to Bob by any means. In this case, we just
call it carrier pigeon. Bob uses that offer
to call create answer. Sends that back
to Alice, who can complete the handshake by
calling set remote description. Once that has happened, we have a peer-to-peer connection created between Alice and Bob, through the magic of [INAUDIBLE] that has been mentioned before.
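To make the handshake concrete, here is a minimal sketch using the browser WebRTC API in TypeScript. The sendToPeer callback is a hypothetical stand-in for whatever carrier pigeon you use as the signaling transport.

```typescript
// Alice's side: create the offer and hand it to the signaling channel.
async function startCall(pc: RTCPeerConnection,
                         sendToPeer: (sdp: RTCSessionDescriptionInit) => void) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);   // apply locally
  sendToPeer(offer);                     // ship it to Bob by any means
}

// Bob's side: apply the offer, create the answer, send it back.
async function onOfferReceived(pc: RTCPeerConnection,
                               offer: RTCSessionDescriptionInit,
                               sendToPeer: (sdp: RTCSessionDescriptionInit) => void) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendToPeer(answer);
}

// Back on Alice's side: completing the handshake.
async function onAnswerReceived(pc: RTCPeerConnection,
                                answer: RTCSessionDescriptionInit) {
  await pc.setRemoteDescription(answer);
  // From here, ICE takes over and the peer-to-peer connection comes up.
}
```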
So the title of the talk is fast, reliable, and scalable. What does that really mean? So fast means we want
to minimize the call setup latency. Reliable means we want to
have really reliable call setup, so we have as few
failures as possible, both for the users, but
also for the developers. You get that wrong, you're going
to have pissed off developers that try to use your system. And the final thing
is you want to have a system that actually
survives the stampede of your own success. So what's the impact
of doing this? As I said, I've been
on a bunch of products. And in our
experience, we managed to reduce the call setup time
from about 10 to 2 seconds. We managed to get significantly
more reliable signaling, or call setups. The system is a lot simpler
to build and run than what we used to have before. And it turns out, it's
pretty cheap to run as well. So let's replace the carrier
pigeon with some details. We have some more components
to look at in this system. We have HTTP servers
and a database. And we have the same
use case as before. We want to set up a peer-to-peer
connection between Alice and Bob. And just as before, Alice starts
off by calling create offer. She then sends it to Bob using
an HTTP POST operation. Bob is, sort of by magic,
already polling the system, trying to receive that offer. So that once it's inserted
into the database, Bob is going to receive it. As soon as Bob receives it,
he can use it, just as before, to call create answer on
WebRTC, send it back to Alice using post. Alice, as soon as
she did the post, she entered her waiting loop. So she keeps polling, waiting
for the answer to come back. As soon as insert
completes, then Alice is going to receive the answer. She can call set
remote description, and the peer-to-peer connection is going to be created.
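For illustration, here is a rough sketch of that naive HTTP-plus-polling signaling, assuming a hypothetical /messages endpoint on a made-up host. It is here to make the problems concrete, not as a recommendation.

```typescript
// POST a signaling message (offer or answer) for the other participant.
async function sendSignal(to: string, payload: unknown): Promise<void> {
  await fetch(`https://signaling.example.com/messages/${to}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
}

// Poll for the reply: every cycle pays for a fresh HTTPS request
// (and potentially a TLS handshake), plus up to intervalMs of latency.
async function waitForSignal(me: string, intervalMs = 1000): Promise<unknown> {
  for (;;) {
    const res = await fetch(`https://signaling.example.com/messages/${me}`);
    if (res.status === 200) {
      return res.json();               // the offer or answer finally arrived
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```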
It looks good, right? Not really. So if you build this, and you try to run it at scale, you're going to
detect that it's slow. It's unreliable. And it doesn't scale at all. So as always, the devil
is in the details. So what are the problems
with the previous solution? So polling is pretty crappy. It adds overhead, both on the
server side and the database side. It also adds a lot of latency. Using polling and HTTP
adds sort of double latency because you need to
pay for the poll cycle. And you need to pay
for the TLS handshake for every single poll,
which is going to be crazy. Used as a single database
also have a set of problems. It's going to add latency
because that database needs to live somewhere globally. It's going to have replication
problems because it's just one small database. And it for sure, won't scale. There are some other
details to this. There aren't any delivery ACKs. If you don't have delivery ACKs in the system, you actually don't know if the message made it across, and developers are a lot happier when they know. And beyond that, we don't have any real rendezvous system. So as I said, we
rely on magic for Bob to know when to start polling. So it just doesn't work. So the details that we sort of
need to go through and design. We need to build a
rendezvous system. We need to look at the
semantics of message delivery, so we get those right. We need to figure out how we do
system sharding or partitioning, so we get the properties
out of the system we want. We need to have a
better client-server protocol. HTTP GET and POST,
they are great. But if you actually
want to have performance and you want to have
performance on mobile devices, you need something else. And finally, but
probably the most important from a
developer perspective, let's make sure the system can
handle retries and idempotency. Otherwise, it's going
to be [INAUDIBLE]. So rendezvous. It's kind of a fancy name
for a really simple question. How does Bob know that
Alice is calling him? So we have two main options. We have GCM and APNs. Or we could build
our own system. GCM, or Firebase Cloud Messaging
or Google Cloud Messaging-- same thing, different names-- is really push
notifications for Android. It's reliable in quotes. And it has a
reasonably low latency. So the normal latency is
about 350 milliseconds, assuming you can actually
reach the other device. If it's in a drawer
somewhere without battery, it's going to take longer. And I say reliable within
quotes because there are corner cases where they
drop messages silently. Especially if you run out of
capacity on the server to store more messages
for that endpoint. APNs. So this is Apple Push
Notification service. It's built-in to
all iOS devices. It's best effort
by default. So it means that they're going to drop messages, both client side and server side. And I'm fine with that-- at least
they deliver what they promise, unlike the GCM approach. They have better latency
most of the time than GCM-- about 250 milliseconds. It sort of depends. But you can count on it.
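As a concrete illustration of the rendezvous tickle, here is a sketch of sending a data-only push through FCM's HTTP v1 API. The getAccessToken helper is a hypothetical stand-in for server-side OAuth; the payload carries no real data, its only job is to wake the callee's app so it can go fetch the pending offer.

```typescript
async function tickle(projectId: string, deviceToken: string,
                      getAccessToken: () => Promise<string>): Promise<void> {
  const res = await fetch(
    `https://fcm.googleapis.com/v1/projects/${projectId}/messages:send`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${await getAccessToken()}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message: {
          token: deviceToken,
          data: { tickle: '1' },   // wake up and bind; the payload itself is empty
        },
      }),
    },
  );
  if (!res.ok) {
    // Remember that delivery is best effort, so treat failures as expected.
    console.warn('tickle failed:', res.status);
  }
}
```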
We have the third option. You could always go off and say, hey, I want to build my own thing. You could use a
long-lived TCP connection from the device to the
cloud, handle that. You can make it as
reliable as you want to. You can make it really,
really low latency. Turns out it's scary hard to
implement, especially at scale. It will most likely
drain the battery rather than make
the user happier. And on iOS, it's
almost impossible to actually implement because
of the constraints put on background processes on iOS. So message delivery. Semantics, but really important. What does delivered really mean? Right. So we define a message as delivered when it has been received by the destination device and handled by the
application successfully. So what would that
mean in our example? That means that the offer from
Alice has been received by Bob. Bob has been able to apply it using create answer. And that operation succeeded. Only when we reach that point do we mark that message as delivered. Everything else is
just lost or random. We have some other
details in here. We need to decide how
strict we're going to be with message delivery. There are really
two big options. We can say in
order or any order. In order means that
we promise to deliver all messages in exactly the
same order as they were sent. The other option is to say,
let's be a little more relaxed and say, we deliver them most
of the time in the same order as they were sent. But sometimes, we get it wrong. The reason for the "sometimes"
is when you scale out, you're going to realize
that servers go up and down at the worst possible time. When that happens,
you need to retry. And that tends to rearrange
stuff on the wire. The second thing, which
is also sort of semantic but critical: you
can say that we're going to deliver every message
exactly once or at least once. So exactly once means
one message was sent. We delivered exactly that one. And the other one
is yeah, sometimes we deliver more than once. Intuitively, it feels kind of nice to say, ah, let's provide
in order and exactly once, right? We only have four options. That's the one that sounds
really simple to use. But when you apply latency
and complexity dimensions, you're going to
realize it's slow and it's scary hard to build. So the Google way is,
let's make it simple and let's make it fast. We need to handle the
problems client side. But still, it turns out that works really well.
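A minimal sketch of what handling "at least once, in any order" can look like on the client, assuming each message carries a unique messageId that is reused on retries. The field names here are assumptions for illustration, not the actual wire format.

```typescript
interface SignalingMessage {
  messageId: string;   // unique per message, reused when the sender retries
  payload: unknown;
}

class InboundDeduper {
  private seen = new Set<string>();

  // Returns true if the message should be handed to the application,
  // false if it is a duplicate caused by a retry or a reconnect.
  // Out-of-order arrival is simply tolerated by the application logic.
  accept(msg: SignalingMessage): boolean {
    if (this.seen.has(msg.messageId)) {
      return false;                    // duplicate: drop it, but still ACK it
    }
    this.seen.add(msg.messageId);
    return true;
  }
}
```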
System sharding. We need to partition the system we're building. We also know by experience
that most calls are made within a single region. You only call your friends. Most of your friends are in the region that you are in-- most of the time. If we don't do this, then every single request that leaves one region and has to go to another for no good reason at all is going to add another 250 milliseconds of round-trip time to every call. So if we apply this to the
initial sort of polling one, that would mean that the
lowest possible latency would be 250 milliseconds because
that's the cost of round trip. Even if you got extremely lucky. If you got unlucky,
it would be worse. So the thing we
realized from this is we need to store
user data in the region where the user lives. And we need to avoid
cross-region calls if at all possible. That last part is critical, right? If at all possible. So we know we need
to shard the system. We know we need to
shard it by users. So how do we do it? We can be very lucky and lazy as
we were when doing Duo and say, hey, this is
phone number-based. Phone numbers are, by
default, sort of regionalized. We can use that one. It works. OK. There are other options. A really simple one is
to rely on a registration system and DNS. So you rely on DNS to
find the server that is closest to the
user and that server to actually know which
region it belongs in. So it's the simple example. So we have Alice again. We have a global user database. We have a DNS system. It happens to be the
one Google provides, but you can actually
use any one. Then, you have servers
in each of the regions. So when Alice wants
to do a registration, the first thing she
does is she's going to do a DNS query for some DNS. If you configure DNS
correctly, that's going to end up resolving
to the server that is
closest to the user. That server knows
which region it's in. All the cloud
providers, you can just query the instance metadata and it's going to return the region. When we update the user
record in the global database, we can say Alice is homed in the EU. Pretty simple.
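Here is a minimal sketch of that registration step, assuming a Node.js server running on Google Cloud. The metadata endpoint is standard GCE behavior; the userDb helper and its updateHomeRegion method are hypothetical.

```typescript
// On GCE, the zone (and from it the region) of the running instance is
// available from the metadata server.
async function getMyRegion(): Promise<string> {
  const res = await fetch(
    'http://metadata.google.internal/computeMetadata/v1/instance/zone',
    { headers: { 'Metadata-Flavor': 'Google' } },
  );
  // The response looks like "projects/123456/zones/europe-west1-b".
  const zone = (await res.text()).split('/').pop()!;
  return zone.replace(/-[a-z]$/, '');   // "europe-west1-b" -> "europe-west1"
}

// The server the client reached via DNS records that region as the
// user's home in the global user database.
async function registerUser(
  userId: string,
  userDb: { updateHomeRegion(id: string, region: string): Promise<void> },
): Promise<void> {
  const region = await getMyRegion();
  await userDb.updateHomeRegion(userId, region);
}
```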
The drawback of this is if Alice is traveling to the US the first time she uses the application, she ends up homed in the US and is never moved. She is going to have a bad day. So you need to add some kind of user migration scheme if you do it like this. OK. So a few times here,
I've said local storage or global storage. So what does that really mean? So local storage is really
some kind of storage. It could be a MySQL database running within a single region, with a low replication factor. But we can use it as scratch space. That means we're going
to have super-fast writes and we're going to
have super-fast reads. Global storage is the
thing we want to avoid. Sadly, as you can
probably tell, you can't avoid it entirely. It's globally replicated. It's very cacheable, because
we don't change it very often. Reads can be fast because
we can cache everything. But writes are going
to be glacially slow. That's just a fact. So don't do it often,
but you can live with it. OK. Client server protocol. As I said, we know that HTTP
polling is expensive and slow. There's the TLS overhead, the overhead of [INAUDIBLE] doing an extra call or extra round trips, especially on mobile networks. And we risk hot-spotting databases because we keep
reading the same entry over and over and over again. It's not very good. So we sort of have this
wish list for mobile devices on how the client-server protocol should work. We want it to support server
to client push. So we don't need
to do the polling. We just need to do
one TLS handshake. And then when the
server knows something, it can tell us about it. We need it to be really
cheap on the wire. Sure, we're going to set
up a mobile call soon. That's going to
burn a lot of data. But still, the actual
negotiation should be cheap. We want it to be fast. Really, really fast. And the last one,
especially as a developer, it should be simple to use. It shouldn't be
complex and hard. So at Google, we use
protocol buffers. There is this standing
joke at Google that there are only
two kinds of jobs. You do proto-to-proto or
you go proto [INAUDIBLE]. And that's all you do at
Google as a software engineer. Essentially, it's true. So protocol buffers
is a very flexible way to define data structures
that are language-independent. They are pretty easy to use
once you get used to the quirks. They have some. They're strongly type and
they have a really nice compact binary representation. So when you send them across
the wire, they're cheap to send. About a year ago, we released
something called gRPC-- Google Remote Procedure Calls. So this is a public port
of an internal system that we use, more or less,
day-to-day for everything we do. It's defined using
proto buffers. So you define your service as a
set of operations on a service. It supports server-to-client pushes through long-lived bidirectional RPCs. So essentially, the
client connects, and then you can start
to send and receive across that connection
without retrying. It's based on HTTP/2 or QUIC. For Duo, we used QUIC. And that's been
sort of publicized. And the nice thing
is it has support for more than 10 languages. So when you want to
build something that's like iOS, Android, server-side,
and something else, you don't need to write
the same service API library multiple times. OK.
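As a sketch of what the long-lived bind channel could look like from a TypeScript client using @grpc/grpc-js, assuming a stub generated (proto-loader style, plain objects) from a hypothetical signaling.proto with a bidirectional Bind RPC. SignalingClient and the message shapes are assumptions, not the actual Duo API.

```typescript
import * as grpc from '@grpc/grpc-js';
import { SignalingClient } from './signaling_stubs';  // hypothetical generated stub

const client = new SignalingClient(
  'signaling.example.com:443',
  grpc.credentials.createSsl(),        // one TLS handshake for the whole session
);

// One stream, kept open: the server pushes to us whenever it has
// something, and everything we send reuses the same connection.
const bind = client.bind();

bind.on('data', (msg: { offer?: unknown; answer?: unknown; ack?: unknown }) => {
  // A server-to-client push: an incoming offer, answer, or delivery ACK.
  console.log('pushed from server:', msg);
});

bind.on('error', (err: Error) => {
  // Mobile networks are flaky; expect drops and re-bind with backoff.
  console.error('bind stream broke, will re-bind:', err.message);
});

// Client-to-server traffic goes over the same stream, no new handshake.
bind.write({ send: { to: 'bob', payload: '<offer sdp>' } });
```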
Probably, the most critical thing to get this working
for real, you need to realize that
mobile networks are flaky. I can't stress that enough. They are really flaky. Even though they seem
to work, they don't. You have frequent disconnects. You have a lot of
timeouts and packet loss. And you just need
to deal with it. So in practice, that means
that every single API that you publish has
to be retry friendly. The fancy word for
that is idempotent. So what it means
is that you need to be able to do the same
operation multiple times with the same arguments
and you should get the same outcome. In this example, we have a
really crappy retry loop that tries to send Bob an offer. If it succeeds, we
break out of the loop. And if it fails, we try again. If this isn't implemented
in a retry friendly way, we're going to send Bob
three copies of the offer if he is on a crappy network. Because the call may
succeed, even though the client-- the sender-- thinks it failed because of a timeout. So the request makes
it to the server. The server processes the
request and sends it on, but the client times out before it actually realizes that it worked.
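A minimal sketch of a retry-friendly send, assuming the server treats messageId as the idempotency key so that retrying with the same messageId can never deliver the offer twice. The sendMessage RPC wrapper is hypothetical.

```typescript
import { randomUUID } from 'crypto';

async function sendWithRetry(
  sendMessage: (req: { messageId: string; to: string; payload: string }) => Promise<void>,
  to: string,
  payload: string,
  maxAttempts = 3,
): Promise<void> {
  // Generate the idempotency key ONCE, outside the retry loop.
  const messageId = randomUUID();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await sendMessage({ messageId, to, payload });
      return;                              // accepted by the server, done
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // The previous attempt may have succeeded server-side even though
      // we saw a timeout; the stable messageId makes the retry harmless.
    }
  }
}
```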
So design summary here then. We want to have a rendezvous system based on GCM and APNs. We want to define
delivered as successfully handled by the application. We want to have a
delivery model that says, yeah, we're going to do it
at least once and in any order. So that means the
client needs to handle duplicate messages, out-of-order messages, all kinds of crazy stuff. The lucky thing is that's kind
of easy to do on the client. Doing that on the server, that's
tricky and really, really hard. Sharding, we say,
hey, let's do it by region on the
first registration. And protocol. We make sure to build
everything as idempotent GRPCs. [INAUDIBLE] simple. No rocket science, as I said. So we'll remember
this one, right? This is where we
started, more or less. HTTP post and gets and polling. And we already know
why this is bad. So a reliable GRPC solution
looks more or less the same. Somewhat. You could even claim that
it's a bit more complex. But if we work it
through, you're going to realize that it's
kind of nice and easy to reason about. We have a new server type,
CS, Connection Server. The first thing you connect to. You have GCM, which is
Google Cloud Messaging. Just to make it simple
and not have APNs and all of this in the same picture. You still have Alice's app and
Alice's library, Bob's library and Bob's app. OK. So how does this work? Alice call binds as soon
as the application starts. We want to do it as
early as possible to make sure that this is
ready when we want to send. Setting up the connection is
going to be the slowest thing. Bind, in this case, is
a bidirectional GRPC. So it's going to
stay up and it's the one we're going
to use for all of the subsequent operations. As soon as the
offer is created, we send it to Bob using the
bidirectional bind channel. That's going to hit
one connection server. Yes, any random one. We're going to
tickle Bob because we know that Bob isn't around. I'm going to go into
the details soon. GCM is going to forward a
tickle to Bob's application, causing it to wake up. And actually, do
something about it. Or actually, causing Bob
to actually answer, really. On the other side, Bob is going
to do the same bind to sort of set up the connection. As soon as he calls bind,
the offer sent by Alice is going to be delivered. But we say that
this is the offer and it's coming from Alice. As soon as Bob receives it,
he calls create answer. Once that succeeds, he
acknowledges the offer and sends the answer to Alice. We know that Alice is connected,
so the first connection server just forwards
the answer directly to Alice's connection server. We don't need to go
through any other loops. Alice does the same thing. She receives the answer,
calls set remote description, and then acknowledges
the answer. So once again, we have the
peer-to-peer connection up and running. That's pretty simple,
not very hard. It's kind of easy to
reason about, anyway.
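A minimal sketch of Bob's side of that flow: an offer arrives on the bind channel, is applied to WebRTC, and only then is it ACKed, matching the definition of delivered above. The ackMessage and sendAnswer helpers are hypothetical wrappers over the bind RPC.

```typescript
async function onOfferPushed(
  pc: RTCPeerConnection,
  msg: { messageId: string; from: string; offer: RTCSessionDescriptionInit },
  ackMessage: (messageId: string) => Promise<void>,
  sendAnswer: (to: string, answer: RTCSessionDescriptionInit) => Promise<void>,
): Promise<void> {
  // "Delivered" means the application handled it successfully, so the
  // ACK only goes out after create answer has succeeded.
  await pc.setRemoteDescription(msg.offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);

  await ackMessage(msg.messageId);        // mark the offer as delivered
  await sendAnswer(msg.from, answer);     // back to Alice over the bind channel
}
```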
So let's dig into the actual details on how to implement this. This is a bit gritty, but
this is the way it is. Again, connection server, GCM,
another connection server. We have a local
connection database. We have the global
user database. And we have the local
message database. Alice starts by
issuing her bind. As soon as she binds
to a connection server, we write her connection
information directly into the connection table. That's essentially IP, port, and some identifier for Alice's state on that
connection server. When we bind, we also
check the messages table. No one called Alice, so there
is really nothing to fetch. Alice, again, wants to send the first message, so she calls send, addresses it to Bob, and passes in the message. We need to verify that,
a, Bob is a valid user. And b, where he is. Because we know that
it's not unlikely that he's going to be
in the same region. As soon as we know
where Bob is, we can issue a select to actually
get the connectivity details for Bob. But as we know,
Bob isn't around, so that's going to return empty. We also insert that message
directly into the database. This is actually the
point where we tell Alice that this operation succeeded. And the reason we do
it as early as possible is to minimize the risk of an
error internally propagating out. As soon as we return to
Alice, we actually continue. We call GCM and say, hey, tickle
because we need to reach Bob. At some point, further
into the future, GCM is going to wake up the
application on Bob's side. And we sort of have
started the first thing. OK.
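A minimal sketch of what the connection server might do on send, following the steps just described. UserDb, ConnectionDb, MessageDb and Push are hypothetical interfaces standing in for the global user database, the regional connection and message tables, and GCM.

```typescript
interface UserDb { homeRegion(user: string): Promise<string | null>; }
interface ConnectionDb { lookup(user: string): Promise<{ host: string; streamId: string } | null>; }
interface MessageDb { insert(msg: { from: string; to: string; payload: string }): Promise<void>; }
interface Push { tickle(user: string): Promise<void>; }

async function handleSend(
  from: string, to: string, payload: string,
  userDb: UserDb, connectionDb: ConnectionDb, messageDb: MessageDb, gcm: Push,
  forward: (conn: { host: string; streamId: string }, payload: string) => Promise<void>,
): Promise<void> {
  // 1. Is the destination a valid user, and which region is he homed in?
  const region = await userDb.homeRegion(to);
  if (region === null) throw new Error('unknown user');

  // 2. Look up connectivity details and persist the message in the
  //    regional message table; after this insert we can tell the sender
  //    that the operation succeeded.
  const conn = await connectionDb.lookup(to);
  await messageDb.insert({ from, to, payload });

  // 3. Forward directly if the destination is connected, and always
  //    tickle GCM as well, accepting the race between the two paths.
  if (conn !== null) void forward(conn, payload);
  void gcm.tickle(to);
}
```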
So the application is now running on Bob's side. He calls bind. Once again, we insert
his connection info into the database. We check if any messages are pending. Yep, there is a
message from Alice, so we're going to
deliver that one. Bob handles it by
calling create answer. Then, he calls ACK. On ACK, we update the state of
the message in the database, but we don't actually delete it. It saves a tad bit of overhead,
especially on our database. But it also requires you have
some kind of cleanup system that goes around after
the fact and deletes all of these old messages. So Bob wants to send his
answer back to Alice. So he calls send and once again, we
check, does Alice even exist? That's pretty cacheable,
but we do it anyway. We figure out, where
is Alice connected? And we know that
Alice is connected, so we're actually going
to get the result back. And then, we insert the
message into the database. We know that Alice
was connected, so we got her IP, port, and ID. That means Bob's
connection server can create the direct
connection with Alice and do a direct call saying,
hey, here's the message. But the trick is, there is
always a tiny, tiny risk that Alice goes away and
loses her connection before we manage to deliver it. So we always tickle
GCM, even if we know that there is a message there. Even if we know that there
is a connection there. There's going to be a
race between the delivery of the message and
the tickle from GCM. We just need to live with it. Client-side, that's
kind of easy to do. Server-side, it would be a pain. Once again, Alice receives
the message, calls ACK. She updates the state in
the database and done. So how does this really work
when you have a bigger system with all the pieces in place? So once again,
connection servers. GCM, local state, local state,
and global user database. As I said, this is drawn
as sort of global only, but think of it as
highly cacheable state you can move about. So again, Bob wants to
send his response to Alice. He goes out. Oh, where is Alice's home? Then, goes to the
connection database in exactly that
region and checks, is Alice connected
at the moment? And also, inserts the
message into the database. Alice is connected, so
Bob's connection server connects directly to Alice's
and forwards the message. We do the tickle
in parallel to make sure we don't lose
the connection. And we have the normal race. Alice receives the answer,
calls set remote description. That succeeds, so she says ACK. As soon as she
says ACK, we update the entry in the database that
says this is now delivered. That message will never
be delivered again. There are, of
course, races where you sort of have
messages coming in, reconnects, and stuff like that. So that may actually
be delivered again. And that's why we say
it may or may not. The client needs to deal with it. OK. So again, back to the overview
of the reliable solution. It's kind of easy to
reason about because you don't have polling loops. You don't have strange state. It has clean and
simple operations. One thing that this
doesn't show is that a lot of the
operations in this slide and the previous slides are
very easy to parallelize, so you can do them
at the same time. And you can also do them a
bit sort of optimistically. You just try. And if it fails, it's
just the cost of a fail. There has been a lot
of details in here. But stuff we haven't covered,
which is equally hard or simple, depending on how you
look at it, is authentication, identity, load
balancing, idempotency-- how do you actually implement
that to make it work-- message identity,
cache eviction, especially global state. How do you make sure you evict
the stuff that's in the cache so we can update it? And how do you handle
poison messages? In the current system, if you
get a poison message in, that actually causes Alice's-- the receiver's-- client to crash. We can never get
out of that state. It's going to keep
crashing, keep crashing. Because as soon as
we deliver, it dies. So that needs to be handled.
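One hedged sketch of how a client could defuse poison messages: keep a per-message failure count in durable local storage so it survives the crash loop, and ACK the message away after a few failed attempts. The storage helpers and the message shape are assumptions for illustration.

```typescript
const MAX_ATTEMPTS = 3;

async function handleIncoming(
  msg: { messageId: string; payload: unknown },
  handle: (payload: unknown) => Promise<void>,        // may throw or crash
  ack: (messageId: string) => Promise<void>,
  storage: {
    getAttempts(id: string): Promise<number>;
    setAttempts(id: string, n: number): Promise<void>;
  },
): Promise<void> {
  const attempts = await storage.getAttempts(msg.messageId);
  if (attempts >= MAX_ATTEMPTS) {
    // Give up instead of crashing forever: ACK so the server marks the
    // message delivered and never pushes it to us again.
    await ack(msg.messageId);
    return;
  }
  await storage.setAttempts(msg.messageId, attempts + 1);  // record before handling
  await handle(msg.payload);
  await ack(msg.messageId);                                // only ACK on success
}
```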
So the summary of this tiny talk: you've got to
remember the details because that's where the
actual complexity of this is. Don't go home and build a
custom rendezvous system. It just won't work. It has been tried once too
many times, so let's not repeat that. Make sure you have very clear
message delivery semantics. It doesn't need to
be exactly the ones I picked here, but
make sure they are clear so the client teams
know what to expect. Shard the system based
on actual user behavior. For Duo and Hangouts, we
know how the users behave. But how your users are
going to behave, I don't know. You need to figure it
out, or assume something when you start. Use something really fast
and really lightweight as the client-server protocol. Preferably, something
that's easy to use as well, because it's going to make
your developers a lot happier. And finally, and I can't
stress this enough, make sure all APIs
are retryable. Because if they're
not, you're going to have very strange
states flying around in your application. That's it. Questions? [APPLAUSE] [MUSIC PLAYING]