Building A UDP Protocol For Cloud Gaming // Chris Dickson, Parsec [FirstMark's Code Driven]

[Music] So, Parsec. Well, I'm Chris Dickson, a co-founder and CTO of Parsec. What Parsec lets you do is basically play video games when you don't have gaming hardware. So if you used to play video games when you were a kid, and now all you have is a MacBook Pro that can't really play the best games in the world — or you're playing them and limping along at like 20 frames per second — Parsec lets you stream the game in real time, with low latency, from a cloud machine somewhere. It really has two use cases now: you can stream from a cloud GPU instance — Amazon has them, and actually all the cloud providers do now, Azure has them, Google too — or, if you have your own gaming PC, you can stream from it remotely. You can think of it almost like remote desktop on steroids: it's like remote desktop, but with stricter requirements, tuned especially for gaming. At the end of the day it's a high-performance video streamer.

There are a lot of different components to this — this is taken from our website — and I was thinking about what to talk about, because there are a lot of little pieces I thought you might find interesting. But the most general-purpose one, and one of the things I've done a lot of work on, is our networking. The networking is a really critical component to all of this, because you're streaming a zero-buffered, low-latency video stream at 10 megabits per second or higher, and any little hiccup, any loss, really throws the thing for a loop. So the networking has to be really good. Today I'm going to talk about how you would go about rolling your own reliable UDP protocol. Our implementation has a few more bells and whistles on top of what I'm going to describe, but I'll present the basic, somewhat naive version. It's a little low-level, but hopefully you find it interesting and can apply it if you ever find yourself in a situation like the one we were in.

Oh, and a little color, if you're wondering where the name came from — there's no sound here, but that's the "twelve parsecs" Kessel Run clip. That really embodies what we're trying to be as a company.

Okay, first implementation: we started out with TCP, as anyone would. It works right out of the box and you don't have to worry about loss, which is good. But I want to show you what happens in a scenario like the one Parsec streams in when you use TCP and apply a little bit of loss to the situation. This is a video we recorded; it's pretty self-explanatory. Here's me in the practice range of Overwatch over TCP with a good connection — I'm streaming from an Amazon server to my computer in my office, New York to Virginia — and it's very good. Now here's five percent packet loss; I'm using Clumsy, a Windows tool, to simulate it. It just kind of blows up: I can't move, I'm lagging, TCP is doing its congestion control, and it throws the whole thing for a loop. It's basically unplayable. That's what happens when you hit packet loss or some sort of congestion event and you try to use TCP for this type of streaming, and that's the situation we were trying to avoid.

So that caused us to look deeper. After evaluating some of the out-of-the-box solutions — some RTSP stuff, Google's QUIC, a handful of other reliable-UDP projects — I still felt we needed more control, so I went for it. That's what follows.

I don't want to do Networking 101 here, because this is stuff a lot of you probably already know, but TCP and UDP seem really similar because they often appear in the same drop-down menu — "would you like to allow TCP, UDP, or both?" — so it seems like they belong on the same level. Nothing could be further from the truth. TCP is intensely complicated, and you probably know this: it handles congestion control, and beneath the surface — the internet deals in packets, little messages that can disappear or arrive out of order — TCP handles all of the retransmission and reliability under the hood. It basically turns a stream of packets, which is an intensely unreliable thing, into something like reading from a file, which is what you want: you'd like to just read as if from a reliable source, and TCP handles all of that for you. UDP does none of that. It's essentially a really thin abstraction over an IP packet: it adds a little bit of a header and that's it. You send packets almost raw over the network; sometimes they get there, sometimes they don't. I added a little stat from the Linux kernel just to show how complex TCP is versus UDP: TCP is 32 files and roughly 724 kilobytes of code in the repo, while UDP is 6 files and 95 kilobytes — about seven times the code for the TCP stack.
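To make that contrast concrete, here is a rough sketch of about everything UDP gives an application out of the box: a minimal datagram sender using the standard BSD sockets API. The address and port are placeholder examples; the point is that there's no connection, no retransmission, and no ordering — just a packet handed to the network. Everything discussed below (sequence numbers, ACKs, windows, retransmits) has to be layered on top of calls like these.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* One unconnected datagram socket: this is essentially the whole UDP
       "stack" an application sees. No handshake, no stream, no guarantees. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(9000);                        /* example port    */
    inet_pton(AF_INET, "203.0.113.10", &peer.sin_addr);   /* example address */

    const char msg[] = "hello";
    /* sendto() hands one packet to the kernel; it may arrive, arrive out of
       order relative to other packets, or silently disappear. */
    if (sendto(fd, msg, sizeof(msg), 0,
               (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("sendto");

    close(fd);
    return 0;
}
```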
So, moving on: what did we want to keep from TCP in our implementation? The reliability and the reordering are good. The stream-oriented nature is good. The way you make connections — the handshaking — that's good, and we wanted to deal with data that way. Where it really breaks down in our case is the congestion control. We deal with lossy compression: we use H.264, we can encode super fast given the hardware support we have, and we can tune the encoder to send a certain bitrate. So if we can detect congestion, we can tell the encoder, "I don't want you to send ten megabits per second, I want you to send five," and you can tweak that on the fly. We can really handle the congestion ourselves — we don't need TCP to do it for us — so TCP was gumming up the works relative to what we could do on our own.

Also, TCP is the workhorse of the internet. It's used for everything, it's meant to work everywhere, and it prioritizes the data getting there over everything else. It works on your most resource-constrained device, it works in space, it works everywhere. We don't need that, because Parsec actually has a pretty narrow range of acceptable network specs. If you have over 100 milliseconds of network latency, you might as well not even use it, because it's going to be bad; if you have less than one megabit of bandwidth, you might as well not use it either. TCP needs to work under all of those conditions; we can narrow the use case and code the thing specifically for what we're doing.

Now, hole punching. I have to talk about this for a second, because it's one of the coolest things I think I've ever discovered in my life, not just in technology — this is the original bug that became a feature, and it's a UDP-only feature. Since UDP is stateless, when you send out a UDP packet from behind a firewall or NAT, the NAT assumes you might want to receive one back from wherever you're sending it, so it creates a little mapping for you to receive data back from that address. That can be misappropriated, if you will: if you have a third party that can gather address information from two peers that want to connect to one another, you can use that mechanism to create a peer-to-peer connection through those little open windows. That's what's called hole punching. The NAT holds more state with TCP, which makes it a hell of a lot harder to do there. And as a point of interest, you can actually hole punch through Amazon security groups, so we can connect to Amazon servers that have no ports open — that are literally closed to the world. Security groups: not as secure as you thought.
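To illustrate the mechanism, here is a minimal hole-punching sketch, assuming some rendezvous step — your own matchmaking server, say — has already told each peer the other's public IP and port; those values are placeholders below. Both peers run the same loop at roughly the same time: the outbound packets open a mapping in each side's NAT, and once packets from the other side start landing, you have a peer-to-peer path.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical parameters: in practice these come from a rendezvous
   server that has observed each peer's public address and port. */
#define LOCAL_PORT       7000
#define PEER_PUBLIC_IP   "198.51.100.23"
#define PEER_PUBLIC_PORT 7000

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    /* Bind to a fixed local port so the NAT mapping stays predictable. */
    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_port = htons(LOCAL_PORT);
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&local, sizeof(local));

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(PEER_PUBLIC_PORT);
    inet_pton(AF_INET, PEER_PUBLIC_IP, &peer.sin_addr);

    /* Wait at most a second for a reply on each attempt. */
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    char buf[64];
    for (int i = 0; i < 10; i++) {
        /* The first outbound packet "punches" a mapping in our own NAT;
           the peer is doing the same thing toward us. */
        sendto(fd, "punch", 5, 0, (struct sockaddr *)&peer, sizeof(peer));

        if (recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL) > 0) {
            printf("peer reached us: the hole is punched\n");
            break;
        }
    }
    close(fd);
    return 0;
}
```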
I talked about a bit of this already, but you also get detailed metrics: you know what your loss is, you know what your round-trip time is. Believe it or not, that can be difficult to get when you're using TCP — Linux has a non-standard API for it, but good luck getting that information on Windows or some of the other machines our thing needs to run on. You have tight control over everything; everything is in user space, nothing is left to the kernel. And the best part is that you can choose what to drop and what not to drop. A good example is absolute cursor data: we send cursor data over to control things on the remote machine, and every packet that goes over carries an absolute coordinate of where the cursor should go. If you miss one of those, it's okay, because the next one tells you where the cursor should be anyway, so in that case we can just say, "drop it." You can do that at a finer-grained level too. Some of the more advanced things we do with H.264 are along those lines: maybe you receive the header reliably, allow packet loss in the payload, and patch up that loss by telling the encoder to do something a little different. It's great to have that control when you want to make a really high-performance solution.

Okay, in the interest of time, let's move this along. I'm going to go through some of the concepts real quick — how you would think about implementing something like this if you're brave enough to try. Ours is all written in C. I guess you could probably write it in something else — maybe you could do it in Go or something like that — but I wouldn't advise it. Wireshark: if you don't already have it installed, you should, because it's the best program — my favorite program. You'd be shocked at what you'll see if you just leave it open. Wireshark is a deep packet analyzer: it captures everything that hits your interface, parses it, and lets you look at the raw binary; you can even decrypt SSL data. It's really, really good for something like this — when you have to debug it, it's essential. And then of course you need something that can simulate loss: Clumsy on Windows, the Network Link Conditioner (the Xcode thing) on macOS, and tc on Linux.

So, if you're making your own protocol, you need to put some kind of header on your packets. We're going to keep it super simple at first and include a sequence number — a concept from TCP; again, this is the reference implementation, not the bells-and-whistles one — which tells you where that packet belongs in your stream. And of course you need something to identify the type: what kind of packet is it? Is it an ACK, is it a piece of data, is it a keepalive? You just need a flag there to describe that. Then you have your payload — not really part of the header, but it follows it.
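As a sketch, that minimal header might look something like this in C. The field names, sizes, and the extra "unreliable" flag for droppable packets like the cursor updates mentioned above are my own illustrative choices, not necessarily what Parsec's protocol actually puts on the wire.

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <string.h>

/* Packet types the protocol needs at a minimum. */
enum pkt_type {
    PKT_DATA      = 0,   /* a chunk of the application stream        */
    PKT_ACK       = 1,   /* acknowledges a sequence number           */
    PKT_KEEPALIVE = 2    /* dummy packet that keeps the session open */
};

/* Optional per-packet flags, e.g. "fine to drop" for cursor coordinates. */
#define PKT_FLAG_UNRELIABLE 0x01

/* Minimal wire header: a type, flags, and the sequence number that says
   where this packet belongs in the stream. Multi-byte fields are sent in
   network byte order. */
#pragma pack(push, 1)
struct pkt_header {
    uint8_t  type;    /* one of enum pkt_type   */
    uint8_t  flags;   /* PKT_FLAG_* bits        */
    uint32_t seq;     /* position in the stream */
};
#pragma pack(pop)

/* Serialize a header plus an optional payload into an outgoing buffer;
   returns the total number of bytes to hand to sendto(). */
static size_t pkt_write(uint8_t *buf, uint8_t type, uint8_t flags,
                        uint32_t seq, const void *payload, size_t len)
{
    struct pkt_header h = { .type = type, .flags = flags, .seq = htonl(seq) };
    memcpy(buf, &h, sizeof(h));
    if (payload && len)
        memcpy(buf + sizeof(h), payload, len);
    return sizeof(h) + len;
}
```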
Now I'm going to talk about the concept of windows — read and write windows — which TCP also has. These are the shared data structures that data is written into, sent out from, and read into; they keep track of the state of the connection: where incoming things go, and how they're organized. Let me go straight to the picture to give you a sense of what this might look like — forgive me if this is a little deep, but I want you to learn something if you don't already know it.

The top diagram is the write window; this is what it might look like while you're sending packets, and it has three regions. Think of it as an array of slots in your sequence: at the base you've got your sent-but-unACKed packets, then your written-but-unsent packets (in green), and then the head — the area of the window that hasn't been touched yet. The base is waiting for an ACK; it's not going to be overwritten and it's not going to move any further, it just keeps being checked until it gets an ACK (we'll get to that when I talk about retransmission). The read window, by contrast, only has two regions. The base is what has yet to be read by the application: when your application reads, this is the next spot data gets popped off the window. And the head is where newly arrived data is being written — there's a concurrency aspect to this, which you'll see in a second, but this is where data gets attached to the end of the queue. Basically it's just a queue — a circular buffer, a ring buffer.
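Here is one way those windows could be represented in code: a fixed-size ring of slots indexed by sequence number, with a base and a head. This is a simplified sketch under my own assumptions (slot count, payload size, field names), not Parsec's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_SLOTS 1024   /* must be a power of two for the masking below */
#define MAX_PAYLOAD  1450   /* roughly what fits under the MTU, per the talk */

/* One slot in a window: the packet bytes plus bookkeeping. */
struct slot {
    uint8_t  data[MAX_PAYLOAD];
    uint16_t len;
    bool     acked;        /* write window: has the peer ACKed this yet?  */
    bool     filled;       /* read window: has this sequence arrived yet? */
    int64_t  sent_at_us;   /* write window: send time, for RTT/retransmit */
};

/* A window is just a ring buffer indexed by sequence number.
   base .. head-1 are the live entries:
     - write window: sent-but-unACKed at the base, written-but-unsent
       toward the head; the base only advances once its slot is ACKed.
     - read window: base is the next thing the application will read,
       head is where newly arrived packets get written. */
struct window {
    struct slot slots[WINDOW_SLOTS];
    uint32_t base;   /* oldest outstanding sequence number */
    uint32_t head;   /* next unused sequence number        */
};

static struct slot *window_slot(struct window *w, uint32_t seq)
{
    return &w->slots[seq & (WINDOW_SLOTS - 1)];
}

/* Advance the write-window base past every ACKed packet at the front. */
static void window_advance(struct window *w)
{
    while (w->base != w->head && window_slot(w, w->base)->acked)
        w->base++;
}
```

The same structure can back both windows; only the meaning of "done with the base slot" changes (ACKed by the peer versus read by the application).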
An ACK is just like an ACK in TCP: every time you receive something, you send back an acknowledgement that says "I've got this — you don't need to retransmit it, forget about it, keep going, advance the sequence number." ACKs are also how you calculate the round-trip time: you send, you get an ACK back, and now you know your latency, your round-trip time.

Keepalives: you have to have something like this, because otherwise you don't really have a concept of connection state. You need something that keeps the connection alive and stops it from timing out — these are basically dummy packets saying "don't close on me."

Retransmissions are where most of the headache is if you ever want to implement something like this. A retransmission means: I've sent something and I haven't gotten an acknowledgement within a certain amount of time, or something looks funny in the way the acknowledgements are coming back. Let me go to the picture so you can see how we think about handling it. This is the write window; in red are the sent-but-unACKed packets, and blue is empty. We do something called a fast retransmit, where you don't wait — you don't do a round trip on every send and wait for an ACK, you just keep sending and asynchronously wait for the ACKs. Now, out-of-order packets, which you've heard of, do happen on the internet — it usually happens over Wi-Fi — but they're usually just flipped; it's not like you'll get a packet that's ten elements out of order, it's usually just one. So if we got this ACK but we didn't get the one before it, that's kind of suspicious — we sent that packet first but got the later ACK back first — but we're not going to do anything yet; we'll wait a moment just in case those ACKs were simply flipped, so we won't send a retransmit there. But in this other case we would: we've got ACK, ACK, ACK, ACK, then a packet that was sent but not ACKed, then two ACKs after it. In that case we send a fast retransmit, because it probably means that packet was dropped and the receiving end never got it. We only do a handful of these — or even just one — before we stop and fall back to a slow retransmit, because you can really fill up the pipe this way if you're not careful. But in a high-performance solution this is critical: you can't wait for round trips to resend packets in a system like this, you have to take action immediately.
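Putting those retransmission rules into code, a simplified pass over the write window might look like the sketch below, building on the window structure sketched earlier. The ACK-gap threshold, the RTT multiplier, and the send_packet() helper are all illustrative assumptions, and a real implementation would avoid rescanning the whole window on every tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* struct window, struct slot, and window_slot() are the ones from the
   window sketch above. send_packet() is a hypothetical helper that
   re-serializes and resends one sequence number. */
extern void send_packet(uint32_t seq, const uint8_t *data, uint16_t len);

#define FAST_RETRANSMIT_ACK_GAP 2   /* later ACKs seen before we resend       */
#define RTO_MULTIPLIER          4   /* fail-safe timeout, in round-trip times */

static void check_retransmits(struct window *w, int64_t now_us, int64_t srtt_us)
{
    for (uint32_t seq = w->base; seq != w->head; seq++) {
        struct slot *s = window_slot(w, seq);

        if (s->sent_at_us == 0)   /* reached the written-but-unsent region */
            break;
        if (s->acked)
            continue;

        /* Fast retransmit: count ACKs for packets sent *after* this one.
           A single later ACK could just be two ACKs flipped in flight
           (ordinary out-of-order delivery); two or more strongly suggests
           this packet was dropped, so resend without waiting a round trip. */
        int later_acks = 0;
        for (uint32_t later = seq + 1; later != w->head; later++)
            if (window_slot(w, later)->acked)
                later_acks++;

        /* Slow-retransmit fail-safe: if nothing after this ever gets ACKed
           (say this was the last packet sent), fall back to a timer that is
           a multiple of the measured round-trip time. */
        bool timed_out = now_us - s->sent_at_us > RTO_MULTIPLIER * srtt_us;

        if (later_acks >= FAST_RETRANSMIT_ACK_GAP || timed_out) {
            send_packet(seq, s->data, s->len);
            s->sent_at_us = now_us;   /* restart the clock on the resend */
        }
    }
}
```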
This isn't a talk on concurrency, but I'll touch on it briefly: in a system like this there has to be concurrency if you're going to provide a decent interface to your application. You can't have your application be the thing sending keepalives — that's not something you'd ever want to deal with — so you need threads in the background handling it. Unfortunately, with concurrency comes a whole slew of other problems, which is what makes this really difficult. The thing I've learned, if you ever find yourself working on something like this, is to start with the dumb synchronization first: use a global mutex or lock that locks everything, and start there. Get your application logic and your implementation right, and then work on fine-tuning the locking and getting everything properly parallelized. I've made the mistake of going too deep too early on the synchronization, and then you're dealing with two things at once. That's my two cents on concurrency. Most of the time these threads are sleeping anyway, waiting for messages — this isn't the kind of thing where you need to squeeze every ounce out of the CPU — so it's actually okay to be a little dumb with the mutexes.

I'm going to mostly skip this slide, but it's a little more on how we do congestion control: we use a cubic-like function to feed a bitrate to the encoder. This isn't strictly part of the protocol, but that's roughly what our slow start looks like, and if we hit a congestion event, that's how it scales back down. There's not too much to go into, and I don't really want to get into it right now.
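The talk doesn't go into detail here, so this is only a flavor of the idea: a sketch that hands the encoder a target bitrate which drops on a congestion event and then climbs back along a CUBIC-style curve (flattening as it approaches the pre-congestion rate, then slowly probing beyond it). All of the constants and the exact curve shape are my own illustrative assumptions rather than Parsec's actual numbers, and the initial slow-start phase is omitted.

```c
#include <math.h>   /* cbrt; link with -lm */

/* Illustrative constants, not Parsec's real numbers. */
#define CUBIC_C        0.4       /* curve aggressiveness                     */
#define CUBIC_BETA     0.7       /* keep 70% of the rate on a congestion hit */
#define BITRATE_CEIL   10.0e6    /* 10 Mbps ceiling                          */
#define BITRATE_FLOOR  1.0e6     /* never tell the encoder less than 1 Mbps  */

struct congestion_state {
    double rate_max;   /* bitrate (bps) when the last congestion event hit */
    double t;          /* seconds since that event                         */
};

/* Called when loss or RTT spikes indicate congestion. */
static void on_congestion_event(struct congestion_state *cc, double current_bps)
{
    cc->rate_max = current_bps;
    cc->t = 0.0;
}

/* Called periodically with the elapsed time since the last call; returns
   the bitrate (bps) to hand to the encoder. At t = 0 this yields
   BETA * rate_max, approaches rate_max as t nears the inflection point k,
   then probes slowly beyond it — the classic cubic shape. */
static double next_encoder_bitrate(struct congestion_state *cc, double dt)
{
    cc->t += dt;
    double wmax = cc->rate_max / 1e6;                       /* work in Mbps */
    double k    = cbrt(wmax * (1.0 - CUBIC_BETA) / CUBIC_C);
    double d    = cc->t - k;
    double bps  = (CUBIC_C * d * d * d + wmax) * 1e6;
    if (bps < BITRATE_FLOOR) bps = BITRATE_FLOOR;
    if (bps > BITRATE_CEIL)  bps = BITRATE_CEIL;
    return bps;
}
```

The caller would invoke next_encoder_bitrate() on a timer and pass the result to whatever encoder bitrate API it uses.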
Oh, by the way, I named our protocol BUD — I had to give it a name. It stands for Better User Datagram. A little pompous, actually, but whatever. So, at the end of the day, here's the same scenario: our protocol with zero packet loss, same connection in the office, all headshots. And then the same scenario as before, turning on the five percent loss — and it just keeps running smoothly. This is what you can squeeze out if you take control of the situation and implement your own logic. It's hard to pick up on the video, but it is a little more jittery than before. Five percent packet loss is a lot — the thing holds up under much higher — but it's not common to get solid five percent packet loss; usually you back off a little and the packet loss goes away. This is just an example to show that the protocol holds up much better in that kind of congestion scenario. And my CEO put this slide in. So that's it — thank you.

[Q&A]

Well, they have it a lot easier, because they have buffers. We have no buffers at all — everything is shipped over the network and decoded and rendered as fast as humanly possible — so most of the work we've done has been around making it smooth under those conditions. A buffer is what makes things smooth and lets you time things well; congestion control is also much easier, because you can just look at your buffer, and if things are moving around you say "stop sending so fast" or "send faster." When you have no buffer, you have to make those decisions instantaneously.

Of all time? At the moment all I do is work, so I don't know. I like Overwatch — I'm not an avid player, I don't play as much anymore. Sometimes I still pop into WoW a little bit — judge me — but it's an old-school game I was addicted to as a kid, so sometimes I'll come back and spend a weekend on it.

What tools do we use to test the protocol — do we have scenarios to test how it works over the network? We have some unit tests, but personally my acid test is just cranking up the loss a ton and seeing if it holds together. Not very scientific, but it usually flushes out bugs pretty quickly if I make a change. We also have some scripts that run it through the wringer — sending data in strange ways under different packet-loss and dropping scenarios — to make sure everything looks right. Usually, if you've got a race condition in something like this, it comes out pretty quickly: if it's not rock solid and you start throwing a ton of loss and garbage data at it, you're going to crash something very fast.

How does this hold up globally, in areas where the connection isn't strong or the lines of communication are longer? It actually holds up pretty well. I went to LA recently and connected to my home machine — I use Parsec as a remote desktop tool as well — and I used it from the plane over this protocol, and it worked well. It's a bit wasteful in that scenario, and I have a feeling certain networking environments might throttle it a little, because it tends to retransmit pretty aggressively, but it held up fine over whatever satellite internet they were using. The latency was terrible, but it held together, which impressed me. I'd need to test it from space, but I don't know how much that costs — I'm pretty far from being able to afford it, whatever it is.

Right — I glossed over that in the presentation: we also have a fail-safe slow-retransmit system. It especially matters when you send just one packet of data and that's the last thing one end sends — if that's dropped, the fast retransmit won't work, because there's no later ACK to look back from and say, "oh, we missed one." So you need a fail-safe. It's based on the round-trip time — a multiple of the round-trip time — and if you haven't seen the thing get ACKed in, let's call it 500 milliseconds or a second, you just send the retransmit anyway.

And I glossed over that a little too: the ACKs that are sent are somewhat redundant, in that the reading end is always updating the writing end on where it is in the stream. So if there's some loss in the pipeline and some ACK doesn't make it, but the receiving end actually did get that packet and moved on in the stream, it will update the sender to say "I've moved past that" — so you're done, don't keep calculating and retransmitting. That's how we handle that.

I try to use as close to 1500 as I can. If you go over that you're screwed, because it just starts fragmenting — that's the standard packet size. It seems to me that processing more packets, especially when you're doing reliable mode, means more overhead, so I'd rather fit more into one packet. I haven't actually tested that a lot, but I make it as big as possible. I think our packet size is something like fourteen-fifty-ish; we also use DTLS — the UDP flavor of TLS — and that adds another 50 bytes or so, so it's about 1450 at the end of the day when the packets are full. The mouse cursor stuff is a lot smaller. [Applause]
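As a footnote to that last answer, the arithmetic behind those packet sizes looks roughly like this. The 50-byte DTLS figure is taken from the talk as a ballpark (actual DTLS per-record overhead depends on the cipher suite); the other numbers are the standard Ethernet MTU and the IPv4/UDP header sizes.

```c
#include <stdio.h>

#define MTU            1500   /* typical Ethernet MTU                       */
#define IPV4_HEADER      20
#define UDP_HEADER        8
#define DTLS_OVERHEAD    50   /* ballpark from the talk; varies with cipher */

int main(void) {
    int udp_payload = MTU - IPV4_HEADER - UDP_HEADER;   /* 1472 */
    int app_payload = udp_payload - DTLS_OVERHEAD;      /* ~1422 */

    printf("max UDP payload without fragmentation: %d bytes\n", udp_payload);
    printf("app payload after DTLS overhead:       ~%d bytes\n", app_payload);
    return 0;
}
```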