Surge 2016 - Hooman Beheshti - HTTP/2: what no one is telling you

Captions
Hi. Am I between you and beer? There's like one more after this? I am between you and beer, I'm so sorry. Okay, I have a lot to tell you, so I'm going to get started as you guys are coming in. My name is Hooman, I'm the VP of technology at Fastly. The title of this talk was quite accurate when we first came up with the abstract, but since then a lot of the information that I'm going to give you today has been surfacing from here and there, so I actually had to change the name of the talk slightly to make sure that I cover my ass. Despite this, if you're anything like me, probably most of what you've heard about HTTP/2 resembles this: a lot of promise, a lot of potential, a lot of great things about how it's going to solve all our problems. As a matter of fact, for me it felt like clickbait after a while. So the idea behind this talk was to take a step back and take a slightly more objective view of the protocol, and take some practical considerations into scope. I have a bunch of data I want to share with you that we're going to go through together, and the idea is ultimately to have a better understanding of the good and the bad. My assumption is that you're generally familiar with web performance topics, and familiar enough with HTTP to have been curious enough to come into this, so I'm not going to cover a lot of super basic stuff, but I am going to cover HTTP/2 and some core concepts before I get into details about its features.

HTTP/2 was ratified as RFC 7540, I think right around a year ago. If you don't already know, it's a binary protocol. A binary protocol means that everything is binary, obviously, but everything is in a well-defined, deterministic place, so as you're developing, things are really, really easy to find. This is a good thing, because you don't have issues like text parsing and dealing with line termination and the things you had to do with HTTP/1 before. The bad thing is, gone are the days of telnetting to port 80, typing something in, and seeing what comes back. So troubleshooting got a lot more difficult, not just because of this but also because of the security side, which we'll talk about in a second: you need tools now to do any sort of debugging. Things got a little bit more difficult on that side.

HTTP/2 runs over a single connection. It kind of looks like this: the client starts a TCP connection to the server, and all HTTP/2 communication between a client and server happens over that one single, long-lasting TCP connection. The idea here is that this gives you better congestion management, because you don't have multiple TCP connections vying for the same network resources. Everything will also happen over TLS. This is not mandated by the protocol, but all the browsers basically said they're only supporting a TLS version of the protocol. So using ALPN, which is a TLS extension, there's a protocol negotiation during the SSL handshake, the TLS handshake, and then the two sides start speaking h2. And you can reuse a connection: to maximize the number of requests that you shove over a single connection, you can actually reuse connections, and the rule is that if two different host names resolve to the same IP address and have the same certificate, you can share and coalesce those connections, sending both hosts' requests over the one connection.
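To make the ALPN piece concrete, here's a minimal sketch in Go, using the standard crypto/tls package, of a client offering h2 during the TLS handshake and checking what got negotiated; example.com is a placeholder host, and this is illustrative rather than anything from the talk:

    package main

    import (
        "crypto/tls"
        "fmt"
    )

    func main() {
        // Offer h2 first, falling back to HTTP/1.1, via the ALPN
        // extension carried in the TLS handshake.
        conf := &tls.Config{
            NextProtos: []string{"h2", "http/1.1"},
        }
        conn, err := tls.Dial("tcp", "example.com:443", conf)
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        // NegotiatedProtocol is "h2" if the server agreed to speak HTTP/2.
        fmt.Println("negotiated:", conn.ConnectionState().NegotiatedProtocol)
    }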
Over that connection are these things called streams, and you can have many streams simultaneously going back and forth between peers over a single connection. Streams are essentially virtual channels for communication that roughly translate to a single request/response transaction, a request/response exchange. Either side can initiate a stream, and every stream has a stream ID: if it's an odd-numbered stream ID, it was a client-initiated stream; if it's an even-numbered stream ID, it was a server-initiated stream; and stream zero is reserved for the connection itself, for management reasons. The only rules are that they have to increment and you can't reuse them. With the bit space, that's about a billion streams per peer, so hopefully that will last you for the lifetime of the connection.

The smallest unit of communication in h2 is this thing called a frame. Every exchange happens over frames; frames flow back and forth between peers, and they all carry a stream ID, and that associates them with the stream the communication belongs to. The best way to explain frames is like this: imagine this was your h1 communication between a client and a server; you sent a bunch of request headers, and the server sent you a bunch of response headers with a bunch of response body. In h2, you send a HEADERS frame with all your request headers, and in response the server sends a HEADERS frame with all the response headers and a bunch of DATA frames with the actual data, the payload. There's a whole bunch of different frames defined by the protocol. I'm not going to talk about all of them; I've kind of covered DATA and HEADERS frames, but I do want to point out the SETTINGS frame. It's this management frame that the two sides exchange to communicate different capabilities to each other: things like how many concurrent streams a peer is expecting to see on a single connection, what your maximum frame size can be, how big your HPACK table is, whether you support features like push. That's what the SETTINGS frame is for.

So here's what the protocol flow looks like. With h1 we had a single connection, and we would send a request, and the server would send a response; that's how h1 worked. With h2 we have a single connection, and instead of sending a request we send a HEADERS frame with all our request headers; it's got a stream ID, and the server responds with a HEADERS frame with the response headers and a bunch of DATA frames, the payload. Everything has a stream ID, and that's how the two sides know which exchange the communication belongs to. Now, with h1 we had this one connection, and nothing else could happen over that connection for the lifetime of that request/response exchange, right? This was the big problem: the connection was blocked while the outstanding request was being processed, a phenomenon called head-of-line blocking. Nothing else could happen over that HTTP/1 connection, so we had no concurrency, no multiplexing, no interleaving of responses. So to get concurrency, what we did is we opened a bunch of connections at once: browsers open up to six connections between a client and every single host, and we even sharded to get more than that, so we could have a lot of concurrency happening at the same time, because our web pages have a lot of assets and we have a lot of fetching to do. In h2 we don't need to do that anymore. What happens is frames flow freely between peers: you can have DATA frames, you can have HEADERS frames, you can send multiple HEADERS frames in a row, essentially pipelining requests, and DATA frames can come back interleaved; you can have DATA frames from one stream come after DATA frames from another stream, interleaving them.
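Since every frame carries that stream ID in a fixed binary header, parsing it is mechanical. Here's a minimal sketch in Go of decoding the 9-octet frame header defined in RFC 7540, section 4.1; the byte layout comes from the spec, and the sample bytes are made up:

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // FrameHeader is the fixed 9-octet header that precedes every HTTP/2
    // frame (RFC 7540, section 4.1).
    type FrameHeader struct {
        Length   uint32 // 24-bit payload length
        Type     uint8  // e.g. 0x0 DATA, 0x1 HEADERS, 0x4 SETTINGS
        Flags    uint8
        StreamID uint32 // 31 bits; 0 means the connection itself
    }

    func parseFrameHeader(b [9]byte) FrameHeader {
        return FrameHeader{
            Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
            Type:     b[3],
            Flags:    b[4],
            // Mask off the reserved high bit to get the 31-bit stream ID.
            StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
        }
    }

    func main() {
        // A HEADERS frame, 13 bytes of payload, on client-initiated (odd) stream 1.
        hdr := parseFrameHeader([9]byte{0x00, 0x00, 0x0d, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01})
        fmt.Printf("%+v\n", hdr)
    }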
Somebody thought this would be best represented as candy, so there's a delicious analogy that explains this concept. If you ever look at waterfalls, which I hope is true for all of you: an HTTP/1 waterfall looks like this, and when you go to h2, there are a lot of things happening at the same time. Concurrency is what h2 is giving us, and all those things are happening at the same time. If you're like me and love looking at Wireshark captures, this is what it looks like in Wireshark. Notice that frame, sorry, that TCP packet, all the way at the bottom: it's got a whole bunch of HEADERS frames, and those HEADERS frames are essentially pipelined response headers all coming at once from the server before the server sends me any of the payload for those particular streams.

Pretty much every browser now supports h2, and it's on by default. All the servers are getting support too: Apache and nginx both have modules. And if you've never heard of this server called h2o, I highly, highly recommend you check it out; we use it at Fastly, it's a fantastic server, it supports both h1 and h2, it supports all the h2 features, it's a great, great server. There's a whole list of all the other servers on Wikipedia. And if you're using a CDN, which again I hope is true for every single one of you here, because CDNs are good, talk to your CDN, because they're the ones that need to be doing that h2 termination for you.

Okay, that was a crash course in h2. Probably the most important, or most talked about, feature of h2 is performance: h2 is supposed to make everything faster for us, and initially we sort of believed it was going to make everything faster by default. All I've got to do is turn it on, I don't need to change anything, and everything's going to be faster. I'm plagued with the disease of cynicism, so I had to see this for myself. What I did is I started with what should be the perfect page for HTTP/2, and that's a page that has basically zero rendering on the browser side: no scripts, no CSS, just a hundred 10k images coming down the pipe, that's it, a very, very simple page. This is what the waterfall looked like before, and that's what the waterfall looks like after. And if you ever decide to do this as an experiment for yourself, I highly recommend that you use a different image than the one that I picked, because these look amazingly cute when you start the process, but I can't tell you the horrors of the nightmares these induce after you've looked at this analysis 20 or 30 times. So please, pick a different image, pick a transparent one, look at a white screen.

To do this test I used WebPageTest. How many people are familiar with WebPageTest? Thank you, great. If you're not, please familiarize yourself with it: it's basically a synthetic testing tool that emulates real-world conditions. I used sort of the default profile for WebPageTest, which is five megabits of download bandwidth and one megabit of upload bandwidth, and I used 40 milliseconds of latency, because roughly, in the US, 40 milliseconds is the median to a whole bunch of CDNs. I used Chrome, and the idea was to test h1 versus h2. Again, if you've used WebPageTest before, you can do like three runs, five runs, ten runs; if you have a private instance you can do 20 or 30 runs to see patterns in the data. You'll see variance, because tests give you variance.
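As an aside, if you have a lot of runs to do, WebPageTest can be driven programmatically. Here's a hedged sketch in Go against WebPageTest's public REST endpoint; runtest.php and the url/runs/f/k parameters are from the public API, but check your instance's documentation, and the API key here is a placeholder:

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
    )

    func main() {
        // Kick off a WebPageTest run and get back a JSON blob with a
        // test ID that you then poll for results.
        q := url.Values{}
        q.Set("url", "https://www.example.com/")
        q.Set("runs", "9")
        q.Set("f", "json")
        q.Set("k", "YOUR_API_KEY") // placeholder

        resp, err := http.Get("https://www.webpagetest.org/runtest.php?" + q.Encode())
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body))
    }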
I didn't want to do three, five, ten, or eleven runs; I wanted to do two hundred and seventy. So I did 270 runs each of h1 versus h2, and this is a scatter plot of document complete time for this page. Can you see this, is it clear enough to see blue versus orange? Okay. Blue is h2, orange is h1, and, okay, it seems like h2 is a little bit faster. I was kind of hoping for a bigger separation, because of all the talk, but okay, I think it's fair to say that in this graph h2 is slightly faster than h1, and the pattern shows this. And this is document complete time; metrics matter, and we'll talk about that in a second. Now, you've probably seen something to this effect, with the same conclusions, that h2 seems to be faster, sometimes a lot faster, sometimes not so much, from various sources. What I had never seen before is what would happen to this not only with inflicted bandwidth and latency, but also with packet loss. Packet loss is a thing that happens in the world; what's the effect of that on this?

So let's go back to our weapon. This is WebPageTest, and if you haven't seen it before, WebPageTest gives you the ability to inject packet loss. That's great; I was using WebPageTest, so let's inject some packet loss. This is our scatter plot to start with; let's put that in the corner and let's start adding packet loss to it. We don't change anything else, except packet loss. So, 270 runs, h1 versus h2, with packet loss: this is with half a percent of packet loss, 1% packet loss, and 2% packet loss. The story changes, right? And that's really interesting. Okay, what if this is a Chrome thing? Let's try Firefox: pretty much the same thing, right? Okay, great. A lot of people are saying that h2 is supposed to help mobile connections and slow connections and higher percentiles, so let's do that. Here's a slow 3G connection with poor performance, low bandwidth, high latency; let's see how this guy reacts. Pretty much the same. So again, h1 does better when there's packet loss involved. You see that little bump on the left side, where h2 sort of behaved the same as h1? Unbeknownst to me, somebody turned off our h2 capabilities during this test, so those are actually h1 sample points that look like h2, because of the way I was testing. I thought about taking them out of this, but I actually kind of like that they're in here; they show the pattern well.

I'm going to switch gears: instead of showing you scatter plots, I'm going to show you CDF graphs. In case you're not familiar with CDF graphs, they essentially graph percentile on the y-axis versus the value on the x-axis. What this means is that the 50th percentile of the blue line, on the upper-left graph, is roughly around 2000 milliseconds; the 50th percentile of the orange graph is roughly around 2250 milliseconds; and this graphs all the percentiles for each of those runs. Blue is h2, orange is h1; get used to these colors, I'm going to show you a lot of these. And look at the pattern as packet loss is added. As you look at these, if you're to the left you're doing better than if you're to the right; the ideal scenario is a straight line up and down, which means there's no variance, but there's always going to be some level of variance. This is Firefox; let's add Chrome to the picture. There is a separation between Chrome and Firefox; I'm not going to get into that, I just want to point it out, and you'll see this as I show you more of these.
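If you want to build these CDF charts from your own runs, the underlying computation is just sorted samples and ranks. A tiny sketch in Go, using the simple nearest-rank method; the sample values are made up:

    package main

    import (
        "fmt"
        "sort"
    )

    // percentile returns the value at percentile p (0-100) of samples,
    // using the nearest-rank method on a sorted copy.
    func percentile(samples []float64, p float64) float64 {
        s := append([]float64(nil), samples...)
        sort.Float64s(s)
        rank := int(p / 100 * float64(len(s)-1))
        return s[rank]
    }

    func main() {
        // Hypothetical doc-complete times in milliseconds from a batch of runs.
        docComplete := []float64{1850, 1990, 2010, 2100, 2230, 2400, 2900, 4100}
        for _, p := range []float64{50, 75, 95} {
            fmt.Printf("p%.0f = %.0f ms\n", p, percentile(docComplete, p))
        }
    }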
Get used to these colors, because I want to flash a whole bunch of these in front of you: blue and red are h2, orange and green are h1, for Firefox and Chrome. So again, blue and red versus orange and green. To make things a little bit easier to digest, I made a scorecard with the extremes: 0% packet loss versus 1%, sorry, 2% packet loss. Obviously we see that with document complete time, h2 is not doing so well with high packet loss. Now, I'm a good web performance citizen, and I know that doc complete isn't the only metric I should be looking at. There's a great talk from Tammy Everts and Pat Meenan that said DOM content loaded seems to be a very good indicator of e-commerce conversion, so I wanted to make sure I looked at that. I also like speed index: if you've ever used WebPageTest, speed index is a great metric, an aggregate metric that it calculates for you based on visual completeness of the page. So I added those two into the mix, and here's how they fare with zero percent versus two percent. Now, I'm not stoked about h1 winning with zero percent packet loss; that's a little disheartening to start with. But clearly we see that packet loss has an effect on the performance of h2, on the right-hand side of this chart. Certainly, with certain metrics h2 holds up, and one of the lessons we're going to learn here is that a lot of this depends on the metric that you choose.

Why is this happening? Well, it's happening because this is actually an incorrect and misleading analogy; the actual analogy should look more like this, right? What we're doing is replacing six concurrent connections with a single connection. Sure, we can do a lot of things over that one connection, but we had six concurrent channels before. Look at what happens to a single TCP connection over its lifetime. This is graphing, on the y-axis, the congestion window versus time. There are four-year courses on the congestion window; what I'm going to give you in ten seconds is that it's essentially the amount of data a server can send safely, without any loss, onto the network, or what it thinks is the right amount of data to send. What normally happens over the lifetime of a TCP connection is the server ramps up, ramps up, ramps up, until there's natural loss; then it slows down, with the ultimate goal of sending enough packets onto the network without incurring loss. That's what it's trying to do, so you can see this convergence: this particular server ramps up and figures out the right cwnd over this connection. That's with zero percent packet loss. I inject one percent packet loss into that same connection, and that's what the graph looks like. So imagine that an h2 connection is subjected to this sort of behavior. It doesn't matter if you're interleaving: if the server says I can't send a packet onto the network, all the frames sitting there are stuck behind that decision. What's essentially happened is that we've moved head-of-line blocking from HTTP to TCP. This is not a new concept; the authors of the protocol knew this was a byproduct of the design and decided that the net-net was going to be positive, and sometimes it is, but what I'm pointing out here are situations where it's not. Imagine you have six concurrent connections running at the same time, subjected to the same 1% packet loss: they're all incurring it and suffering from it, each at its own pace and in its own context, and if you sum up, at any point in this graph, the total cwnd of those connections across all six, the sum is always going to be bigger than any one of them.
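To see why the aggregate wins, here's a back-of-the-envelope sketch in Go using the classic Mathis et al. throughput model, rate ≈ (MSS/RTT) · C/√p with C ≈ 1.22. This is a rough steady-state approximation, not a claim about any particular TCP stack, and the numbers just echo the test profile from the talk:

    package main

    import (
        "fmt"
        "math"
    )

    // mathis estimates steady-state TCP throughput in bytes/sec from the
    // Mathis et al. model: rate ≈ (MSS/RTT) * C/sqrt(p), with C ≈ 1.22.
    func mathis(mss, rtt, loss float64) float64 {
        return (mss / rtt) * 1.22 / math.Sqrt(loss)
    }

    func main() {
        const mss = 1460.0 // bytes per segment
        const rtt = 0.040  // 40 ms, like the test profile
        const loss = 0.01  // 1% packet loss

        one := mathis(mss, rtt, loss)
        fmt.Printf("one h2 connection:  %.1f KB/s\n", one/1024)
        // Six h1 connections each see the same loss rate, but their windows
        // collapse and recover independently, so in aggregate they sum.
        fmt.Printf("six h1 connections: %.1f KB/s\n", 6*one/1024)
    }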
What's happening is that as multiple connections incur packet loss, they each suffer differently, at different times, and that's why I think we're seeing, in certain situations, certainly with packet loss, h1 fare better.

Now, okay, my test was simulated, so that's not great, but that's what I had, and it was a totally fake page. So I wanted to do real pages. I took eight real pages, eight sort of home pages from eight real sites, and I tested them with sixteen different bandwidth and latency combinations: all sorts of different broadband, and all sorts of non-broadband, mobile-type connections, each of them incurring those four different packet loss profiles. I tested Firefox and Chrome. I only tested the TLS version of HTTP/1. Remember this: everything I'm going to show you is the SSL, the TLS, version of the HTTP/1 site. There's an inherent performance loss, a tiny bit, when you go from HTTP to HTTPS, and we're not even considering that; we're just comparing the TLS version of the site to h2. And I collected a shitload of metrics, because that seemed like the right thing to do, and I ran each of those 300 to 400 times. That adds up to around 1.2 million WebPageTest runs in a big database. This is a fork; it looks a lot like the one I wanted to shove into my eye sockets as I started looking through 1.2 million data points. It was very difficult, because I was really, really hoping that patterns would emerge and I could come here and give you these amazing lessons to take home. I can't do that, spoiler alert. But I did decide that I needed to limit the data set to something analyzable, so I divided the pages into three different types: there were a few pages where 75% or more of the requests, of the assets, were moving from h1 to h2; there was a second bucket where it was about half the requests; and there was a third bucket which was roughly 25% or less. I looked at two profiles, not totally arbitrarily, I sort of solicited some advice: a general broadband profile with five megabits down and one megabit up, and a slow 3G profile with 780 kilobits down and 330 up, with high latency. And I looked at those three metrics we talked about earlier. So I'm going to go through these three sites with you and show you the CDF charts for each of them. That's a lot of slides with CDF charts, and I'm kind of going to race through them, but I have scorecards that will outline it all for you and show you all the beautiful results.

First was a page on the Fastly website; it's a page we show off to all our customers. That page has about 135 resources, about three megs, and about 75% of those were on the root domain, and that's what I was moving to h2; it roughly accounted for 2.5 megs of all that data. This is what the waterfall looks like before and after. I have six CDF charts I'm going to show you, and I'm going to show them to you quickly. Here's the doc complete one; I'm sorry if you can't quite see the blue, but train your eyes on blue and red versus orange and green. That's h2 on Firefox and Chrome versus h1 on Firefox and Chrome: blue is h2, red is h2, orange is h1, green is h1. So this is document complete time with broadband.
DOM content loaded with broadband. Speed index with broadband. Doc complete with 3G. This is DOM content loaded with 3G; actually in this case, if you look at the bottom right, h2 sort of holds up okay. And speed index with 3G. Okay, that's too many lines, so let's sum it up. Here's the scorecard; it looks similar to what we had with that fake site. Again, it's not cool that h1 is coming out on top with 0% packet loss; that probably blows away some of our expectations. But certainly we can see that packet loss has an effect.

I can't tell you whose site number two is, because I tested it without them knowing and it'd be unfair for me to tell you who they are. I can tell you it's a travel site: about a hundred requests up to the onload timer, 1.7 megs or so, and about half of those were on the root domain or one of the subordinate domains that I could move over to h2. They accounted, however, for about 75% of the payload. This is what the waterfall looks like before and after; you can see that top area where the concurrency occurs and requests move over to h2. Again, let's go through these. Doc complete with broadband; again, blue and red are h2, orange and green are h1. DOM content loaded with broadband. Speed index with broadband; it's a little weird that all the lines are really, really close to each other, and I was kind of hoping for more separation, but this is what happens. Here's 3G: this is doc complete, DOM content loaded, and finally speed index, and again in this case you can see h2, that red line, which is Chrome with h2, sort of holds up across all four profiles. So this is a case where h2 holds up fine, probably an artifact of the way the resources are ordered or being rendered by the browser. This is what it looks like on our scorecard: h2 fares relatively well with zero percent packet loss, but again, I don't like those reds there, and there's certainly a lot of red at two percent.

I think the point is becoming clearer, but I'm going to keep driving it, so let's do a third site. The third site is a media site: a lot of requests, and a lot of third-party content, so only about 25% of those requests are moving over from h1 to h2. This is what the waterfall looks like; it's huge, and there's so much going on here that you can't even tell which requests have moved over, but trust me when I tell you about 25% of them moved over to h2. Let's look at our really, really friendly CDF charts. Here's doc complete with broadband, and in this case you can see, again, with Chrome and Firefox, h2 sort of holds up okay with 2% packet loss. That's kind of cool, except we moved a lot fewer resources over. This is DOM content loaded with broadband, and speed index with broadband. Let's go to 3G: doc complete, this is the noodle graph, then DOM content loaded, speed index, and our scorecard. In fact, let's put all of them on a scorecard. Here are all eight sites; the first of each of the buckets, 1a, 2a, and 3a, are the ones I just went through. I didn't go through the CDF charts for the other ones, because God have mercy on all of us, but that's the way everything scored out. Let's look at just broadband; let's look at just 3G; and here's everything together. I'm going to leave this up for a second so we can absorb it. I'm not stoked about all the red on the left side, with the 0% packet loss; I was kind of hoping to have all greens there. Certainly there's an effect on performance
when we inject packet loss into the network, and 2% is, in this case, the extreme that I chose; I didn't go any further than that. And there are definitely a lot of inconsistencies: h2 holds up in some cases and doesn't in others. What does it all mean? I don't know. It just seems like packet loss is a thing, and certainly it seems like metrics that come later in the page are affected more than metrics that come early in the page. But I want to be careful with that declaration, because in the lifetime of a TCP connection, a page is a really short time. What would be interesting is what this looks like seven, eight, ten pages into a flow, as you navigate a site, and I don't have that data, unfortunately; I hope to have it at some point. We've seen lots of exceptions, right? We saw places where h2 held up with packet loss, we saw places where h1 was winning with no packet loss, and we definitely saw that Firefox and Chrome didn't behave the same. That's an interesting issue; I have a theory, one of many, that if I have time I'll share with you a little bit later. And it's probably safe to say that when you look at data like this, there are more questions than answers, unfortunately. I really want to stand up here and give you magnificent, great rules of thumb, but I can't; there are no thumbs to be ruled.

Now, the natural next question you're probably asking is: great, packet loss seems to be a problem; what does it look like in the real world? Well, packet loss is actually difficult to measure, but we try to at Fastly, and we sample a whole bunch of requests that flow through our network. Let me walk you through this graph. This is a bunch of sampled requests, tracked at the request level, in the United States: about 8 million requests, and all of these requests were 100 packets or more, so, you know, what is that, roughly 140k or more. What you're seeing is up to 60 milliseconds of round-trip time between the client and our servers; that covers about 70% of the US, so in our network, about 70% of the US is experiencing 60 milliseconds or less. The bottom band is the percentage of requests that experienced 0% packet loss, the blue band is the percentage of requests that experienced between zero and 1.5% packet loss, and the top band is the percentage of requests that experienced more than 1.5% packet loss. So roughly 20% of these requests, and again this is a sample of requests with a hundred packets or more, about 8 million requests, experienced some level of packet loss. That is an interesting fact. It could be a lot worse; it's good that it's only 20%, but it is happening out in the wild. It's a difficult thing to measure, but this is a good proxy for it.

Now, I'm not the first person to talk about packet loss and its effect on h2, or the fact that h2 doesn't necessarily perform best. I have a whole bunch of reading and homework for you: there are papers that talk about things like this, everything from how packet loss affects h2 to cases where h2 didn't perform better. First, a word of caution. I'm not going to stand up here, like I've been saying, and draw magnificent, giant conclusions, other than that packet loss seems to affect performance and h2 isn't always as fast as we had hoped it would be. I think that's as far as I'm willing to go. I was really tempted to go further and really dig into those 1.2 million rows and find more patterns,
but I stopped, for a reason. This was all simulated, right? Everything I did was simulated. They were real pages, but under simulated conditions: real Chrome, but on a, you know, simulated sort of connection. Packet loss in the real world is slightly different: packet loss happens when buffers bloat in routers and overflow; packet loss happens because you turn on your microwave at home and your Wi-Fi router decides to drop a packet. What I was injecting was uniform packet loss. So packet loss is different in the real world, conditions are different in the real world, and this was all simulated. And users aren't all broadband users or all 3G users; your users are a mixture of all these things, so there's a lot of variance in your worlds. Nothing beats real-world data; we need real-world data to draw some of these conclusions, which means your mileage will definitely vary. So the biggest lesson out of all of this is: don't listen to anything anybody says. Anybody. Don't listen to anyone, including me. Do this for yourself. If I was going to leave you with one thing, it's that if you're interested in this, you have to do it for yourself to understand the benefits. I'm going to preach more about this later, but I think this is the biggest lesson here.

Patrick Hamann from the Financial Times did it, and he released this. This is a graph of his document complete time from real users, document complete from RUM, and this is the effect of h2 versus h1. He found that when the RTT between a user and his servers was higher, h2 was helping more. There are two interesting things here. One, it's unclear what percentage of his users fall into each of these bands, so it's difficult to tell. And second, it seems like h2 is not creating a lot of performance value to the left of 100 milliseconds, and all CDNs in the world try to live on the left side of 100 milliseconds, and everybody wants CDNs to do h2. So this is kind of a funny thing to me, but it's out there, it is what it is, and we're doing h2; all the CDNs are going to do h2, we're all going to end up doing h2. This answers the first question, though: it's probably fair to say that the higher percentiles of Patrick's users were falling into the higher RTT bands, and those are the users that were getting the benefit. And I implore you to do this stuff for yourselves and figure this out. This sort of lines up with what I'm hearing from the pundits, if you will: that we should expect more performance benefit at the 95th and higher percentiles, and we shouldn't expect a lot at the median. But again, some of the stuff that I did doesn't necessarily match that either, so it really varies, because there are so many variables here; it really depends on your users. I had two conversations within ten minutes, two nights ago at an event, with two different people with big sites, an e-commerce site and a media site, who did this themselves and found h2 to have zero to no benefit for them; "marginal" was the word that was used. That doesn't mean they're not going to use it, because there may be other benefits they find, but we need some caution when it comes to performance.

To encourage you to do this on your own, instead of sitting here and going through 1.2 million data points with you, I made you a gift. Here's a tool, and let me just preface it with this: I don't really know how to code, so I wrote some really, really hacky Python code that I gave to Marty Terrell, who I work with, who I'm lucky enough to work with, and who was patient enough to clean it
up for me and make it respectable. This is a very simple tool: you basically put in your connection profile, you put in your site, and it uses WebPageTest, whether it's your private WebPageTest instance or the public one, runs a test, and spits out a PDF with scatter plots of all the core metrics. Now, this is, again, simulated; it's a glimpse into what reality may look like. But I really, really implore you to do this for yourself, for your users, if nothing else just to compare h1 versus h2 and figure out what the benefits are for you.

This is probably a good time to talk about QUIC. I'm not going to dig too much into QUIC, but QUIC is the new protocol that Google is spearheading; it's now on the standards track. QUIC is essentially a new transport for HTTP, over UDP, and the biggest motivation for QUIC was that congestion control, which is the thing that controls how a TCP server responds to loss events on the network, was moved, in QUIC, from the kernel, which is very difficult to iterate on, to user space. So one of the things QUIC does is let you be very agile and very iterative with your algorithms, and this has allowed Google to do a lot of iterations. It's now on the standards track, there's a working group and everything, and it's going to change things up a little bit, because for the first time it sort of blends the application layer and the transport into one thing. So it's good and bad: it's probably going to be better for performance, but it's going to make things a lot more complicated.

Okay, let's put performance to sleep and move on to server push, probably the second most talked about feature. We have always been very, very optimistic about server push and the promise of pushing things to the browser. The basic notion is that the server has the ability to push data to a browser before the browser requests it, or even knows it needs it; the idea is to get things into the browser cache. Only servers can push, and it's a hop-by-hop property, meaning that if you have a middlebox, the way that middlebox pushes to a client can be different from the way that middlebox gets pushed to from a server. The ability to do it is negotiated in a SETTINGS frame; remember, I talked about SETTINGS frames earlier. Here's what it looks like. It's on by default, so the only time you actually advertise it is when you don't support it; this is an example of a client that doesn't support push. Push happens with the use of a special frame called a PUSH_PROMISE, and this is what the flow looks like. You, as a client, send a request; it's a HEADERS frame that comes in for, let's say, the index.html. Before sending you the index.html, the server sends you a PUSH_PROMISE frame, which is that big orange thing up there. The PUSH_PROMISE frame is associated with the stream you made the request on; it has a promised stream ID, which is going to be an even ID because it's a server-initiated stream; and it has the request headers you would send if you were going to make that request on your own, the would-be request headers the browser would send if it were to make that request naturally. The only rule is that the PUSH_PROMISE has to show up before the thing that references what's being pushed. So if I'm pushing CSS, say one.css, I have to send the PUSH_PROMISE frame before the HTML, or the headers, that reference that CSS. I don't need to send the CSS itself yet; I just need to send the PUSH_PROMISE, to let the browser know that this is coming. That's the idea.
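For a concrete feel, here's a minimal sketch of push in Go, whose standard net/http server exposes push through the http.Pusher interface when the connection is HTTP/2. The paths and certificate filenames are placeholders, and this is just an illustration, not tooling from the talk:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func handler(w http.ResponseWriter, r *http.Request) {
        // Over HTTP/2, the ResponseWriter also implements http.Pusher;
        // Push sends the PUSH_PROMISE before we write the HTML that
        // references the asset, satisfying the ordering rule above.
        if pusher, ok := w.(http.Pusher); ok {
            if err := pusher.Push("/css/one.css", nil); err != nil {
                log.Printf("push failed: %v", err)
            }
        }
        fmt.Fprint(w, `<html><head><link rel="stylesheet" href="/css/one.css"></head></html>`)
    }

    func main() {
        http.HandleFunc("/", handler)
        // Go's net/http speaks h2 automatically over TLS; cert and key
        // paths here are placeholders.
        log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil))
    }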
In this case we send the PUSH_PROMISE, then we send the HEADERS frame and the data for stream ID 1, the HTML, and then the headers and data for stream ID 2, which is the push stream. And again, because this is h2, we can interleave those with the data from stream 1. This is what it looks like in dev tools: those first four requests you see are pushed responses, assets pushed from the server to the browser. You can see that the initiator over there says it's push, and Chrome dev tools actually shows us this little thing, too, that says it was a push and how long it took to receive it. So that's how push works.

There are two big outstanding questions with push. The first is: what the hell do we push? That is actually outside the scope of the protocol; the protocol intended just to create a mechanism for you to push, and it's kind of up to you to figure out what to push. That doesn't mean it's not a question; it's actually a big one. And second, it may be a bummer, but it turns out that push and browser caches don't work well together. In fact, the push cache and the browser cache are two different places, and they don't know about each other. The protocol has a mechanism that lets the server say, "I'm about to push you something," and it has a mechanism, through the use of an RST_STREAM frame, for you, the client, to tell me you don't want it because you already have it. But I have yet to be able to get a browser to generate that RST_STREAM frame; the browsers just don't know that they have the thing in their cache. And even if they did, it would be too late, right? If I'm pushing you something and you're going to tell me you don't want it, I've already started pushing that stuff down to you; it takes a whole RTT for me to get that back-off message from you, so it's too late, I've already shoved a bunch of it down the pipe.

So let's see what this looks like. Here's a no-push scenario, first view, cold cache: a page with ten images and four CSS files, and this is what it looks like over an h2 connection; I load everything into the browser. Now, I'm a very good performance citizen, I put good caching headers on this, and this is what the repeat view looks like: just the HTML, which I chose not to cache in this case. If I push, here's what the waterfall looks like: those first four assets are being pushed, great, I got some performance benefit from pushing them. But the repeat view looks like that, which means I pushed them again. This is redundant data going over the wire, so this is a problem: the fact that the browser cache isn't able to somehow communicate to me what I should push is a problem.

Now let's pretend we have that situation tamed; let's pretend we've figured out what to push. It's a problem, and I don't claim to have an answer for it, but let's pretend we figured it out. What are the actual practical use cases for push? Well, I can think of three. First and foremost, push essentially becomes a replacement for inlining. Inlining was a problem because when we inlined assets into our HTML, we lost the ability to cache them independently, right? We had to send them down with the HTML. Well, if I push those instead, I get the exact same performance profile, but I can take those resources and have them in the browser cache. That's a plus; it helps me, because I don't need to keep inlining those resources and I don't need to pay the penalty for fetching them. That's the first
use case. The second use case, somewhat similar, is to push resources essential for this navigation, for the page that I'm on. This is very similar, if you're familiar with it, to the link rel=preload mechanism, which essentially initiates the browser's preloader to go fetch things before it's parsed the entire HTML. In fact, most servers use Link rel=preload headers as the signal for what to push, so if you have those headers in, you will probably signal your server to do a push. There's a lot of discussion in the community now about adding a second directive here, like a rel=push, to separate the preload scenario from the push scenario, but today most servers just use the preload header as the hint; ours does, and I think nginx does as well.
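As an illustration of that preload-as-push-signal convention, a response header like the following (the path is a made-up example) is what many h2-terminating servers will read as a hint to push the asset, while browsers also treat it as an ordinary preload hint:

    Link: </css/one.css>; rel=preload; as=style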
This essentially saves you one round trip, and here's what the waterfalls look like before and after. Why is it one round trip? Because it's h2, because we can send things concurrently: when the browser sees that it needs them, it will send the requests for all the things we would otherwise have pushed at the same time, and that takes one round trip to get to the server. So by pushing things, we save one round trip. It's a performance saving, and we like saving round trips, but it's not as magnificent as maybe we would have thought.

What's more interesting, and probably cooler, is the idea of pushing things to the browser during server think time. What's server think time? Think time is the time it takes for a server to assemble the HTML for us, or, if you're using CDNs, the leg from the edge to your origin: you go through an edge server, you've got to go all the way to origin, and that origin is not necessarily close to your users. During that time, you can do some pushy things to the browser. I made an exaggerated waterfall for you to see this concept at play: this particular HTML takes 3-plus seconds to render on the server, and during that time I would have had this massive white space that I didn't do anything with before; now I can use that time to push things down to the browser. This is a good use case, because I can use the time the server takes to render to push things. Now, if you talk to me about HTML, I will do everything in my power to encourage you to cache your HTML at the edge, in your CDN, because that's ultimately better for you and better for your users. But there are scenarios where you physically can't do that, for reasons that may be outside of your control, and in those cases push is a good mechanism for getting things to the browser before the server sends the references to them. There's a very good blog post about this, by the way, that covers it in more depth; I'd highly suggest you go read it. It is not a trivial thing to do, because it's essentially an asynchronous operation: two different things have to happen at the same time, a push needs to go to the client while a request is going to the origin, and separating those streams isn't trivial with web servers. So if you're interested in this, talk to your CDN, because it's not a given that they support it.

There was also this notion early on that we could use push for the next navigation, for what we're going to see on the next page. Although push can be used for that, we already have good mechanisms called resource hints in place, things like rel=prefetch and pre-render, that let us use idle time on the network to fetch or pre-render things we think the user is going to go to. So even though we can use push for that, there are already mechanisms in place. And even when all of that's figured out, we still have our two big questions. What do we push? We still don't know. Google published a paper on it, and it's actually a really nice paper; it doesn't have all the answers, but they did some studies and they have some general rules of thumb, and it's a better place to start than zero, so I highly recommend you read it. I'll make sure these slides are available afterwards, because there are a lot of references in here. And to answer the question of what we do when the thing is already in the browser cache: we know the RST_STREAM mechanism isn't really going to solve our problem, so there are two things I know about that are trying to address this. One, h2o, the server I talked about, has this mechanism called CASPer, which essentially uses a very smart cookie as a signal from the browser to the server about what the browser might have in its cache; it's a bloom filter, it calculates a bunch of stuff in there. And that mechanism is actually the starting point for a new standards initiative called cache digests, which is essentially the same idea: a way for the browser to tell the server what it's got in its cache, so the server can figure out what to push intelligently and not keep pushing things the browser already has. I think one of the most interesting use cases for push is the one we haven't thought of yet, because the mechanism is in place and it's possible we can use it for creative things that are not necessarily page-rendering things. Facebook released a video a couple of days ago where they kind of did this and figured out interesting, creative ways to use push. There's a link to it; you may have already seen it, but I'd recommend you look at it and figure out what they did. It probably won't match what you need from push, but it may give you ideas. Obviously, we're early in the lifecycle of this protocol; we have a lot of learning to do, and the more we look at these things, the more we publish these things, the more we tell the world, the better it is for us.

Okay, let's keep moving. Is this helpful, am I bumming you out? Like, is this okay, or is this a serious bummer? Okay, that works, all right, let's keep going. Let's talk about HPACK. HPACK is RFC 7541, and it addresses the big headers problem, right? We had big, repetitive header bloat; let's solve the problem. There are two primary mechanisms in HPACK. One is that all headers are Huffman encoded when they're sent, from client to server and the other way, in headers frames. The other is that there are two tables that the two endpoints build, and the idea is that instead of sending "user-agent is blah blah blah blah blah," you send index three, and that's all you need to send; that's the way you get compression. Of those two tables, there's a static table, which is very well defined; the definition will never change, unless I guess they iterate the RFC, and it's basically the table both sides start with. And then there's a dynamic table that you build as you communicate, and every new header name and value combination becomes a new index into this dynamic table. It's a FIFO table, so things get evicted from it if the size is overrun. This is what the static table looks like: you can see there are 61 entries in it; some are header name and value combinations, and some are just header names,
and you can use these as a starting point to build new entries in the dynamic table. Now, one of the things this mechanism promised early on, because it has the word compression in it, is that we assumed there would be performance benefits from it. There kind of are, but I wanted to visualize it, because it didn't click for me until I saw it. Let's look at bandwidth usage from the client to the server when you're using HPACK: you see that there are significant bandwidth savings. This makes sense, because most of the communication from a client to a server is requests, which are headers; once we start indexing those, we turn requests that are this big into one byte, so obviously we're going to see a benefit. Now look at it the other way, from the server to the browser. This also makes sense, because most of the data we're getting from a server is payload, not headers. This is not a revolutionary set of graphs, but for me it made a lot more sense when I saw it: I should maybe expect some bandwidth savings on the way to the server, but I shouldn't really expect many benefits on the way back. Dropbox published a blog post essentially to this effect: they turned it on for an internal application and had significant savings on their ingress bandwidth, and you can see there's not much going on on the egress side.

Some things to know about HPACK. That dynamic table defaults to 4k, and it turns out that no browser changes it. Look at this: this is Twitter's content security policy header. How big is it? That big. So half of the dynamic table gets taken up by one header when Twitter speaks h2, which means this thing is going to fall out of the table, because it's FIFO, and it's going to keep going back into the table, and over time the benefits are probably going to be slightly less. There is a proposal, called the site-wide headers proposal, that's going to address this, and the idea there is: don't keep sending even the same index as you keep communicating with the server; the things that stay the same throughout the lifetime of the connection, like user-agent, just send them once, ever. So these are site-wide headers, particularly aimed at server-to-client headers, because those are the places where this happens. The compression context is per connection; remember this. It means that if you close the connection and open a new one, even if it's h2, you lose the entire dynamic table. If HPACK is beneficial to you, this makes keep-alive timers that much more important; they're a bigger deal now, because you're going to lose HPACK context. And it's an attack vector: there's a paper released by Imperva that really attacks the crap out of it and breaks it in all sorts of different ways, so it is a new attack vector. One of the complications that comes with h2 is stuff like this: you can't turn it off. That's probably not a bad thing, because without it you couldn't actually pipeline requests; your requests would essentially overflow the client's cwnd towards the server. The only way you can send 10, 20, 30, 40 requests at once is if you index them, essentially.
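To see the indexing at work, here's a small sketch using Go's golang.org/x/net/http2/hpack package; the header values are made up. The first encode pays for the literals, and because the encoder keeps its dynamic table across calls, the repeat encode collapses to a few index bytes:

    package main

    import (
        "bytes"
        "fmt"

        "golang.org/x/net/http2/hpack"
    )

    func main() {
        var buf bytes.Buffer
        enc := hpack.NewEncoder(&buf)

        // First request: literal names/values get Huffman-coded and
        // inserted into the dynamic table as they're written.
        headers := []hpack.HeaderField{
            {Name: ":method", Value: "GET"}, // static table index 2: one byte on the wire
            {Name: ":path", Value: "/"},
            {Name: "user-agent", Value: "Mozilla/5.0 (a long made-up UA string)"},
        }
        for _, h := range headers {
            enc.WriteField(h)
        }
        first := buf.Len()

        // Second request with identical headers: now mostly table indexes.
        buf.Reset()
        for _, h := range headers {
            enc.WriteField(h)
        }
        fmt.Printf("first request: %d bytes, repeat request: %d bytes\n", first, buf.Len())
    }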
Okay, one more thing. Probably one of the most important, and definitely least understood, mechanisms in h2 is prioritization, which is a new mechanism we didn't have before. The idea is that with all this concurrency, all this stuff happening at the same time, there's going to be contention, so we have to have a framework by which a server can pick and choose what's more important and what's not. There are two mechanisms in priorities: there's a weight, which is essentially the priority of a stream, and there's a dependency, so you can build a tree out of these streams. Weights and dependencies are communicated through two different mechanisms: one is in a HEADERS frame, where you can assign a priority and a dependency as you make a request; or you can change that with a PRIORITY frame on its own, which just dictates priority and dependency. And the spec goes out of its way to say it's only a suggestion: it's basically a suggestion from the client to the server for ways to prioritize, and the server is not obligated to observe those priorities, but it's a pretty good suggestion.

I'm going to show it to you with a couple of examples; some are actually from Ilya Grigorik, so I'm going to give him credit, they're from his book. Here's a simple one: you have two streams, A and B, and those are their weights; everything is proportional, so in this case stream A gets three times as many resources as stream B, because it holds 75% of the entire pool of weight. Here's example two: this is dependency. In this case, stream D gets everything, and then, after stream D is done, you move to stream C. Here's one and two together: in this case D gets everything, after D is done C gets everything, and after C is done, three quarters of the resources go to A and one quarter go to B. Let's keep going and make it more complicated: in this case D gets everything; after D, E and C get equal resources, and the fact that their weights are eight doesn't really matter, it's just the ratio to each other; and after C is done, which was taking up half the resources we had after D finished, 75% of its half goes to A and 25% goes to B. That's sort of how this works.
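The weight arithmetic is just proportional sharing among sibling streams. A tiny sketch in Go, using the weights from the first example; the stream names are just labels:

    package main

    import "fmt"

    // shares computes the fraction of resources each sibling stream gets,
    // proportional to its weight (RFC 7540 weights range from 1 to 256).
    func shares(weights map[string]int) map[string]float64 {
        total := 0
        for _, w := range weights {
            total += w
        }
        out := make(map[string]float64)
        for id, w := range weights {
            out[id] = float64(w) / float64(total)
        }
        return out
    }

    func main() {
        // The first example: A weighted 12, B weighted 4 -> 0.75 / 0.25,
        // so A gets three times as many resources as B.
        fmt.Println(shares(map[string]int{"A": 12, "B": 4}))
    }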
It gets more interesting when you look at how browsers do it. Now, it's really difficult to look at browser trees, but somebody did it, so I'm just going to plagiarize and show it to you. Firefox does it really interestingly: Firefox starts with five streams that actually carry no requests; they just sit there as roots in a tree, and then there are subordinate streams that carry the actual requests. This is from a presentation by Moto, from Yahoo Japan; he's a great contributor to the h2 community, so check out that presentation, it's great, and this is basically stolen right out of it. What happens is that, as Firefox builds this tree, each of those branches gets used for different types of assets: JavaScript in the head goes down one branch, JavaScript in the body goes down a different branch, images are in a different place, HTML is in a different place. I have no idea why they chose to do this, but it looks really smart, like somebody put some thought into it; I just see it often. Chrome used to be like this: no dependencies, basically only weights, everything prioritized relative to each other and dependent on the root node, which is how everything starts. It's changed since: in Chrome 53, here's what the tree looks like, and I know that's really, really hard to read, so I'm going to make it harder to read, because that's the actual tree: it looks like that. That's for a page we have. Frederick, who works at Fastly on our h2 team, put together a really cool tool that lets you take Chrome net-internals output, paste it into a file, and get a picture out of it; if you're curious and want to play, go play, that's how this was generated. What's weird about this tree is the extreme level of dependency created down the side here. Essentially, this means that interleaving is dead down that side of the tree, right? Because each node can't be operated on until the node before it is done; I can't do two things at once, I've got to do one after another, which kills interleaving. I'm not sure what the motivation behind this is with Chrome; I just assume everything is an iteration, that they're figuring things out, so I'm going to give them the benefit of the doubt, but this is what it looks like. And it's not always like this; sometimes it changes, and Canary looks slightly different, so I think they're in flux right now. And it may not come as a surprise that Microsoft has no dependencies and no priorities; everything is just flat. That's over a year old, and I'm going to hope that what they've done since is slightly different. Priorities are really, really important: they're probably one of the reasons that Firefox and Chrome are so different in those CDF graphs, because they do priorities differently, and servers are going to observe them differently.

Okay, let me finish up by giving you some tools and resources, things to study up on if you're curious about this, which I'm assuming you are. If you haven't read this book, read it; it's magnificent to start with, and Ilya added an h2 chapter to it. It's only like 25 extra pages, and it's a fantastic overview of h2. There's an extension for Firefox and Chrome that essentially tells you which protocol you're speaking to the server; I am addicted to this thing, I can no longer not look at that little lightning bolt on any website I visit. Chrome dev tools has some cool tools, including a protocol column, which is new; you've got to add it, it's not on by default, so right-click on the columns and add it there. There's a net-internals panel that tells you what's happening with the individual connections you have; there you can actually see the frames go by, which is really interesting, and it's the output of this that is the input to the priority graph I showed you. I can't give a talk without talking about Wireshark, the greatest tool in the world; because everything is TLS, decrypting is the issue, and it's very difficult, but that link right there will tell you how to do it, and that's what it looks like on the wire. curl now has h2 support; I don't think it's in the default curl that comes with the distributions, you need a newer version of curl, but it's there, and that's what it looks like. And there's another tool, like curl, called nghttp; it operates a lot like curl but gives you a lot more debug information about how you're communicating with a server, and actually, I think curl uses nghttp2 as its engine. Here's a bunch of others; a lot of tools, actually not enough, but they're coming, and the working group keeps a list of them.
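As a quick example of the curl support mentioned above, a recent curl build can be pointed at any site (example.com here is a placeholder), and the verbose output will show which protocol ALPN settled on:

    curl -sv --http2 https://www.example.com/ -o /dev/null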
Okay, let me finish up with this. My intent was not to bum you out at all; my intent was just to take a more practical look at the protocol and not be so rah-rah about it. There are parts that are good and parts that are bad; it's complicated, and it's probably better for us going forward. We finally iterated on a protocol after 20 years, and I guarantee you h3 is not going to take 20 years, so we're in a good place right now. It's okay that h2 isn't perfect, but it's up to us to learn lessons, give back to the community, and make h3 even better. The working group, all the authors, are hungry for data, so anything you can do on your own that feeds back into the system is going to inform h3; you've got to do that, it's going to be a great service. We have a lot of learning to do, and it's time for us to think about how we architect our apps to best leverage the features of h2; one of the reasons it's not performing as well as hoped is probably because we're not building for this protocol, so let's build things for this protocol. And I can't implore you enough to do these tests and publish your results, so we can all learn from them. What I did is just one thing in an isolated lab; what you have, the real-world data, is going to be a lot more powerful.

Do I have time for questions? Like, five minutes? Somebody give me a thumbs up. Okay, thumbs up, five minutes. Any questions? I gave you a lot of data; I really came at you hard, right? Sorry. That was 200 slides, by the way. Yes? Yeah, good question: the question is, how different are the implementations? So, I've looked at h2o more than any other; the implementation is pretty solid, but it's still iterating. My guess is that where they differ the most is the way they handle priorities, and that is an area that is utterly uncharted: we have no idea what the right thing to do there is, and we have no interface between the application and the protocol to dictate priorities; this is sort of up to the browsers right now. So I'm relatively certain you're going to see lots of different implementations there. The core protocol, the framing and things like that, I think they're going to be basically the same. The other difference you may see is push, the way they handle push, and that would be in the ordering of things: does the PUSH_PROMISE show up at the right time, does the push stream come when it's supposed to, does it arrive before the thing that references it, things like this. So I think you will see some differences there. Because we're so early in the lifecycle of this protocol, I would expect more differences than you're probably comfortable with, but that's not necessarily a bad thing; I think it's a good learning place for us. Anything else? Yes. I don't know about HAProxy, actually, I think so. ELB, ELB just announced h2; Amazon had an h2 announcement a couple of days ago, and CloudFront had an announcement a couple of days ago, so now ELB and CloudFront have it. Most of the CDNs have it, not all of them; the load balancers are going to have it if they don't already, and I'm pretty sure HAProxy has it, and if not, it will very soon. nginx, I know nginx has it, and I know Apache has the module now. And the list is there, so check out the Wikipedia page; there's a big list. Yes? Yeah, so: is h2 useful inside the data center? That's a good question, and I think the answer has to do with, well, I think of h2 as a browser protocol; that's the point at which I came to terms with how weird it was. I was like, okay, it's a browser protocol, things need to happen at the same time. But then, I'm often wrong, and
somebody told me that they moved to h2 for their API calls, because it reduced the connection load on their API endpoints, and that sort of makes sense to me now. For a single API call it makes no sense to use h2, but if there's a, you know, keep-alive API client that's constantly making requests, and the connection load is high on that API endpoint, sure, you've probably cut it down six- or eight-fold, and you have less connection load on that server. So that kind of makes sense. Binary protocols are easier to process, so maybe this will mean lower overhead for your servers. So I think the possibility is there; I still think of it as a browser protocol, but I don't think it's outside the realm of possibility that you're going to find uses for it inside the data center. I'm going to be around for a while, so if you have any more questions, come find me. Thank you for your time.
Info
Channel: OmniTI
Views: 1,327
Rating: 5 out of 5
Keywords: OmniTI, surge 2016, Hooman Beheshti, HTTP/2, scalability
Id: CkFEoZwWbGQ
Length: 60min 33sec (3633 seconds)
Published: Thu Oct 20 2016