Make The Web Fast - The HAR Show: Capturing and Analyzing performance data with HTTP Archive format

[MUSIC PLAYING]
MALE SPEAKER 1: Shanghai GDG is a very interesting developer community.
FEMALE SPEAKER 1: I'm glad somebody has asked this question.
MALE SPEAKER 2: This is where the magic happens.
FEMALE SPEAKER 2: This is primarily a question and answer show, so if any of you out there would like to ask questions.
ILYA GRIGORIK: Hello everyone, and welcome to our Make The Web Fast series here at Google Developers Live. Today, we'll be talking about the HTTP Archive data format. And boy, do we have an episode loaded with tools, tips, and tricks for you. But before we get to that, my name is Ilya Grigorik. I'm a developer advocate for the Make The Web Fast team here at Google.
PETER LUBBERS: I'm Peter Lubbers. I'm a program manager on the Chrome Developer Relations team, and I'm really excited about a lot of the tools that-- or rather, the ability Chrome has to track a lot of these things to make the web faster. Today we're going to talk about the HTTP Archive format. There are a lot of cool features in here, probably a lot of things that you may not even have thought of.
ILYA GRIGORIK: Yeah, exactly. So actually, Peter, yesterday we got together to do a quick run-through of all the different demos that we want to show. And it took us what? An hour and a half, or an hour and 40 minutes?
PETER LUBBERS: And we even discovered some new stuff.
ILYA GRIGORIK: So tons and tons of stuff. Just a little bit of logistics up front: we're going to do a lot of demos. So don't worry about trying to capture links or figure out which specific tool we're talking about. We'll share a URL at the end where you can find all of the resources. So just sit back and watch, and hopefully learn a few new things, because we certainly learned about a lot of new tools just by researching this. And I think the reason is that one of the most important things for performance optimization, and for working on performance in general, in my opinion at least, is having good instrumentation.
So a few weeks back, we were talking with Justin Cutroni from the Google Analytics team about how you can use Navigation Timing to capture performance data out of the browser: network timing, JavaScript, browser performance, and all the rest. Today I think we're going to go a little bit deeper. We're going to look at a tool that a lot of us use without necessarily thinking about how the data underneath is structured and how we can reuse it in different ways. The tools I'm talking about are packet sniffers, or HTTP monitors. Now, you probably don't think of it that way, because most of us know it as the Network panel built into Chrome Developer Tools, but that panel is effectively a packet sniffer, or an HTTP monitor. So I'm going to share my screen here. I'm looking at the Google Developers Live page right here. I'm going to open the Chrome Developer Tools and just reload this page. So I have the Network tab open. And if the demo gods are with us, the page will reload. There it goes.
PETER LUBBERS: We're going to need their help today.
ILYA GRIGORIK: Yes, exactly. So speaking of making the web fast, this page is taking a long, long time to load. Here we go. So now we have this network graph in here, and this is an invaluable tool for debugging what's happening with the site. We can see that it took half a second to connect to this page, along with all kinds of other data within this tool, and we can click on each resource. Now, this is very, very useful. But what if you could export this data? Or to put it another way, what if we could take this data out of this tool and import it into another tool? You can do a lot of interesting things with that type of data.
PETER LUBBERS: Up until now, the only way to really do that was to take a screenshot.
ILYA GRIGORIK: Right, yeah.
PETER LUBBERS: Maybe annotate the screenshot, put a few arrows on it: look at this one, or look at the timing on this. That's obviously not a great way.
ILYA GRIGORIK: I've done exactly this, and it's terrible. I would find a problem and think, oh, I need to email Peter about this. So I take a screenshot and annotate it: OK, here's our problem. Then I send it over, and I get a question back like, OK, that's cool, but what was the status code? Or what were the headers? And I'm like, well--
PETER LUBBERS: Lost it.
ILYA GRIGORIK: Grrrr. We lost all the data because we froze it in the screenshot. So it would be really nice if we could export this data with all of its fidelity, all of the data that's hidden in here, and then reuse it in a different way. That's precisely what the HTTP Archive data format is for. So let me show you. We're not going to go into the details, but there is a spec for it. The file extension for HTTP Archive is HAR, hence the HAR show. Ha, ha. The HAR data format itself is just a simple JSON schema that contains all the metadata you would need to reconstruct the network waterfall. So you can think of it as the underlying data of the Network panel in your Chrome Developer Tools, or of another HTTP monitoring tool. It contains a lot of different data, like which browser you used and which pages you accessed. And the entries are the individual requests that the browser makes for all of the resources on a page. I actually have a live file here. It's a long file, and I'm not going to go through it in detail, but I just want to show you what this thing looks like. So what I was doing--
PETER LUBBERS: It's in JSON format, right?
ILYA GRIGORIK: Exactly. So it's very simple to create and consume, which, as we'll see later, is very, very important. So I'll just show you what this data actually looks like.
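To make "just a simple JSON schema" concrete, here is a minimal sketch of a HAR 1.2 log with one page and two entries, built and read back with Python's standard `json` module. The field names follow the HAR 1.2 spec as we understand it; the URLs, timings, sizes, and the `har-sketch` creator name are made-up illustrative values, not data from the demo file.

```python
import json


def make_entry(url, status, body_size, mime_type, elapsed_ms, redirect_url=""):
    """One HAR 1.2 entry with the required fields filled in (illustrative values)."""
    return {
        "pageref": "page_1",
        "startedDateTime": "2012-11-01T12:00:00.000Z",
        "time": elapsed_ms,  # total request time in ms; equals the sum of the timings below
        "request": {"method": "GET", "url": url, "httpVersion": "HTTP/1.1",
                    "cookies": [], "headers": [], "queryString": [],
                    "headersSize": -1, "bodySize": 0},
        "response": {"status": status, "statusText": "", "httpVersion": "HTTP/1.1",
                     "cookies": [], "headers": [],
                     # the body text itself is optional and usually left out of exports
                     "content": {"size": body_size, "mimeType": mime_type},
                     "redirectURL": redirect_url, "headersSize": -1, "bodySize": body_size},
        "cache": {},
        "timings": {"send": 0, "wait": elapsed_ms, "receive": 0},
    }


har = {"log": {
    "version": "1.2",
    "creator": {"name": "har-sketch", "version": "0.1"},
    "pages": [{"startedDateTime": "2012-11-01T12:00:00.000Z", "id": "page_1",
               "title": "http://igvita.com/archives",
               "pageTimings": {"onContentLoad": 180, "onLoad": 240}}],
    "entries": [
        # hypothetical redirect target and font URL, for illustration only
        make_entry("http://igvita.com/archives", 301, 0, "text/html", 40,
                   redirect_url="http://www.igvita.com/archives/"),
        make_entry("http://fonts.googleapis.com/css", 200, 1200, "text/css", 60),
    ],
}}

text = json.dumps(har, indent=2)  # what you would save to a .har file
parsed = json.loads(text)         # any JSON library can read it back

for entry in parsed["log"]["entries"]:
    req, resp = entry["request"], entry["response"]
    print(req["method"], resp["status"], req["url"])
```

Because the whole format is plain JSON, the round trip needs nothing beyond the standard library, which is a big part of why so many tools can produce and consume it.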
So here I was accessing an archives page: I typed in igvita.com/archives. Let me close that. Now we're looking at the entries. We see the first entry, which is the actual request. We see that the browser is sending a GET request for this page. And now, all of a sudden, look at this: you have all the header information. Every header that the browser appends is here, even cookies, which I probably don't want to show you. And then it has the response, which is, OK, a redirect. And it will actually kick me out. So let me close this, and we can look at the next request. Here is the same page asking for a Google web font. So all of this performance timing data, all of the header data, is captured here, which is very, very convenient.
PETER LUBBERS: Tell the audience a little bit more about the httparchive.org site and how that came about, to really set the context here.
ILYA GRIGORIK: A good example of how you could use this data is httparchive.org. The HTTP Archive format and httparchive.org are actually two separate things. They just happen to share the same name, confusingly enough, but they do have a common history. As for the idea behind HTTP Archive: you may be familiar with the Internet Archive, which continuously crawls the web and takes snapshots of pages. So you can rewind history and say, I want to see how this page looked back in 2008. Think of HTTP Archive as a very similar tool, except that we don't care what the pages looked like; we care about how they perform. So we capture things like how many JavaScript files were on a page, what the total size of the page was, how many images you fetched, all of that metadata. And this is an important point about the HTTP Archive format: when you export it, by default it won't export the body of the response. So if you fetch a 1-megabyte image, we're not going to include that.
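The per-page metrics described here (request count, total size, number of images) can be computed from a HAR log without any response bodies, because the sizes are recorded as metadata. The sketch below is a simplified, hypothetical stand-in for that kind of aggregation, not httparchive.org's actual pipeline; `fake_har` and the sample sizes are invented for illustration, and it assumes `response.bodySize` (which HAR sets to -1 when unknown) is a good enough proxy for transfer size.

```python
from statistics import mean


def page_stats(har: dict) -> dict:
    """Per-page totals: request count, body bytes, and image count."""
    entries = har["log"]["entries"]
    # bodySize is -1 when unknown, so clamp it to 0; the body text itself is never needed
    total_bytes = sum(max(e["response"].get("bodySize", 0), 0) for e in entries)
    images = sum(1 for e in entries
                 if e["response"].get("content", {}).get("mimeType", "").startswith("image/"))
    return {"requests": len(entries), "bytes": total_bytes, "images": images}


def average_stats(hars: list) -> dict:
    """Average the per-page stats across many crawled HAR files."""
    per_page = [page_stats(h) for h in hars]
    return {key: mean(p[key] for p in per_page) for key in ("requests", "bytes", "images")}


def fake_har(responses):
    """Hypothetical helper: a stripped-down HAR built from (bodySize, mimeType) pairs."""
    return {"log": {"entries": [
        {"response": {"bodySize": size, "content": {"mimeType": mime}}}
        for size, mime in responses]}}


crawl = [
    fake_har([(48_000, "text/html"), (300_000, "image/jpeg"), (20_000, "application/javascript")]),
    fake_har([(30_000, "text/html"), (-1, "image/png"), (150_000, "image/png"), (52_000, "text/css")]),
]
print(average_stats(crawl))
```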
We just include the metadata, because that's all we need for the waterfall. So what can you do with this? Well, the idea behind HTTP Archive is that we can crawl a lot of sites. HTTP Archive currently covers about 100,000 sites; it selects the top 100,000, I believe. It aggregates all of these HTTP archives, so all of that metadata for all the network waterfalls, and then extracts meta-trends, like whether pages are growing in size, or what else is happening. So let me show you this. We'll go to Trends. It'll take a second here. And you can see that we're analyzing about 200,000 sites now, so it's growing. We can see total transfer size and total requests. It's a little bit small, but an average page within those 200,000 sites today is over 1 megabyte in size and takes over 85 requests. That's shocking. I see this number all the time, but it still shocks me every time. The HTML part itself is 48 kilobytes, right? So all of this metadata is just aggregated across all the different runs--
PETER LUBBERS: From HAR files.
ILYA GRIGORIK: From HAR files.
PETER LUBBERS: Yeah, exactly.
ILYA GRIGORIK: So this is a really good example: we've accumulated many of these files over time, and now we're just collapsing them together to create these formats-- sorry, not formats, to create these graphs that show general trends on the web. And you can definitely see that pages are growing in size, making more requests, and so on.
PETER LUBBERS: Let me show you something in the Chrome Developer Tools that you can use. Maybe some people are not aware of it yet. Let's open it up again and reload the page. Now we get a network waterfall here. So once you have that, you can just right-click on the left side there.
ILYA GRIGORIK: Just anywhere in here?
PETER LUBBERS: Just anywhere, yeah. And notice the options here. We actually have two options.
You can also do this in Firefox, I believe. Copy entry as HAR: you can click on an individual file and save the HAR information for it.
ILYA GRIGORIK: So if I clicked on style.css, it would just grab that?
PETER LUBBERS: Just the style.css. And of course, typically you would be more interested in the entire page with all its resources. So Copy all as HAR copies the HAR JSON directly to the clipboard. Now, if you want to save it as a file, there is an option there to save the entry as HAR, and that just saves it as a file. So let's go ahead and do that now.
ILYA GRIGORIK: If I just copy this, go into my text editor, and create a new file here, it's 5,000 lines of JSON. And this is basically all the metadata that's inside this network waterfall.
PETER LUBBERS: So with that data you could effectively reconstruct the Network panel. We'll have an example of that later. But you can also pull out the specific bits and pieces that you want, such as the size of a request, or the headers.
ILYA GRIGORIK: Actually, this is worth pausing on. Previously, I would just take a screenshot, annotate it, and send it over. Now I can right-click on this and copy, and if I'm making a bug report, I can attach the entire trace. Then the developer has full metadata about everything that happened in the browser.
PETER LUBBERS: You could still send me the screenshot as well.
ILYA GRIGORIK: OK, sure.
PETER LUBBERS: But we can effectively reconstruct the screenshot from that data. So that's cool.
ILYA GRIGORIK: So that's interesting. That actually brings up a good point. OK, great, we took this very nice visual representation and copied it into a 5,000-line JSON file, which is very useful, but--
PETER LUBBERS: You've got to deal with it now.
ILYA GRIGORIK: Right. If I get this 5,000-line JSON file, boy.
PETER LUBBERS: You're going to need some visualization of that data.
And fortunately, there are tons of tools that can help with this. Let's take a look at one. Probably the best one for that at the moment is the HAR Viewer that you've got here. It's an open source project. You can just open it up on the--
ILYA GRIGORIK: So it looks like I can actually run it myself and embed it on any page. It's PHP and JavaScript, which is really cool. But they also have an online demo, so we're going to try that. I have it pre-opened here. So what do I do here?
PETER LUBBERS: Remember how you copied the HAR information? This actually supports drag and drop as well, so you could take the HAR file and drag it onto this page. But for now, you have it in the clipboard already, so go ahead and paste it in there.
ILYA GRIGORIK: So I'm going to paste all 5,000 lines of JSON in here.
PETER LUBBERS: Keep your fingers crossed. And there we go.
ILYA GRIGORIK: Wow. Look at that.
PETER LUBBERS: Pretty informative as well.
ILYA GRIGORIK: That took no time. And look at this. Now I have-- actually no, this is a little bit different. I think that's something else from my clipboard.
PETER LUBBERS: Well, let's do it again.
ILYA GRIGORIK: Well, it probably doesn't even matter.
PETER LUBBERS: So now you get the whole visualization of all that data.
ILYA GRIGORIK: Now that I have this JSON file, I can go to this tool, paste it in, and reconstruct exactly what happened in your browser. And the nice thing is, I still have the full fidelity of all the header information, the requests, and the actual timing data. It actually gives you a bit more than Chrome Developer Tools, like these pie charts showing where the time was spent. You can see that within this session, out of all the content that was fetched, more than half was images, which is actually pretty typical. So this is really nice.
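The "more than half was images" pie chart boils down to grouping `response.content.size` by MIME type. Here is a minimal sketch of that grouping; the bucket names and the MIME-to-bucket rules are our own simplification, not HAR Viewer's logic, and the sample sizes are invented.

```python
from collections import defaultdict


def category(mime_type: str) -> str:
    """Coarse pie-chart buckets (our own choice of names and rules)."""
    mime = mime_type.split(";")[0].strip().lower()  # drop "; charset=..." parameters
    if mime.startswith("image/"):
        return "image"
    if "javascript" in mime:
        return "script"
    if mime == "text/css":
        return "css"
    if mime == "text/html":
        return "html"
    return "other"


def byte_share(har: dict) -> dict:
    """Fraction of fetched content bytes per category, based on response.content.size."""
    totals = defaultdict(int)
    for e in har["log"]["entries"]:
        content = e["response"].get("content", {})
        totals[category(content.get("mimeType", ""))] += max(content.get("size", 0), 0)
    grand_total = sum(totals.values())
    return {k: v / grand_total for k, v in totals.items()} if grand_total else {}


sample = {"log": {"entries": [
    {"response": {"content": {"size": 600, "mimeType": "image/png"}}},
    {"response": {"content": {"size": 200, "mimeType": "image/jpeg"}}},
    {"response": {"content": {"size": 100, "mimeType": "text/html; charset=utf-8"}}},
    {"response": {"content": {"size": 100, "mimeType": "application/javascript"}}},
]}}
print(byte_share(sample))  # images account for 0.8 of the bytes in this sample
```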
PETER LUBBERS: Then you can compare it with other loads of your page, or with other sites, and see the trends over time. One thing I wanted to point out, if you can jump back to the Chrome Developer Tools for a moment, is a really cool feature that I particularly like. If you, for example, refresh this page (let's go ahead and do that), notice how the Network panel reloads everything. It wiped out all of the previous information and shows the new requests, and that's typically what you want. But if you want to track navigation from page to page to page, you can hit the Record button at the bottom.
ILYA GRIGORIK: I've often wondered what that thing was for.
PETER LUBBERS: So click that. And then why don't you go to the Trends page, or Stats.
ILYA GRIGORIK: I think I'm on Trends now, so I'll go to Stats.
PETER LUBBERS: Click on Stats. And now let's click on Websites.
ILYA GRIGORIK: Sure.
PETER LUBBERS: And why don't we finish it off with the About page. So basically what's happening is that the information is not thrown away; it's just added on.
ILYA GRIGORIK: Let me stop the recording here for a second. Now look at this timeline: it says 50 seconds. It's not that any one of those pages took 50 seconds; the timeline is appended across all the different page loads.
PETER LUBBERS: It's a complete trace of what you're viewing.
ILYA GRIGORIK: That's really cool. So now I can do things like come to the home page, press Record, go to the login page, go to the checkout cart, and record that entire trace.
PETER LUBBERS: So now you can go ahead and load that into the viewer. There's something interesting there.
ILYA GRIGORIK: OK, so let me copy this. This is probably 50,000 lines of JSON. And let me hit Back.
PETER LUBBERS: Back to the home page.
ILYA GRIGORIK: Here's our online demo. HAR Viewer online.
Let me close some of these tabs.
PETER LUBBERS: So we're going to paste it in there. And the nice thing is, if you hit Preview, you'll actually see four pages. And in the top part, even if you just hover over them, you see a lot of detail already.
ILYA GRIGORIK: So within the session, we recorded four different pages. For each page we can look at these pie charts, and we also have the full waterfall chart for each one, spread over the entire session.
PETER LUBBERS: That's pretty awesome. What I also like about it is that if you have a site with multiple pages, you can really see the outliers. You can say, oh, all of them are about the same, but this one is way out there. What's happening with that one? You can zoom in, and you have some context.
ILYA GRIGORIK: You know what? Now that I see this, what I want is a Record button in Chrome, so I can tell my users: oh, you're having a problem? Hit Record, go through a couple of pages, then stop and send me that trace. That would be really cool.
PETER LUBBERS: That would be cool.
ILYA GRIGORIK: There's an extension in the making right after this show. OK, that's really cool. So what's next on our agenda?
PETER LUBBERS: We talked about the viewer. Let's talk about some other tools. One of the nice things is that although it's not an actual official standard (I mean, there is a spec), it's sort of the de facto standard, if you will, for a lot of this information. It's not just Chrome; Firefox supports it too, and so do a lot of other tools. So maybe you could go to the HAR adopters page.
ILYA GRIGORIK: Right, the HAR adopters page. Chrome supports HAR, as we just saw. But it turns out that most other tools you're familiar with also have a HAR export.
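In a multi-page recording, each entry carries a `pageref` pointing at one of the objects in `log.pages`, which is what lets a viewer split one long session into per-page waterfalls and spot outliers. A minimal sketch of that grouping follows; the page titles and `onLoad` values are hypothetical and do not come from the recording in the show.

```python
def per_page(har: dict) -> list:
    """(title, request count, onLoad ms) for each page in a multi-page recording."""
    counts = {p["id"]: 0 for p in har["log"]["pages"]}
    for e in har["log"]["entries"]:
        if e.get("pageref") in counts:  # entries point back to their page via pageref
            counts[e["pageref"]] += 1
    return [(p.get("title", p["id"]), counts[p["id"]], p["pageTimings"].get("onLoad", -1))
            for p in har["log"]["pages"]]


def slowest(har: dict) -> tuple:
    """The outlier: the page with the largest onLoad time."""
    return max(per_page(har), key=lambda row: row[2])


# Hypothetical four-page session: (page id, title, onLoad ms, number of requests)
pages = [("page_1", "Trends", 900, 30), ("page_2", "Stats", 850, 25),
         ("page_3", "Websites", 2400, 60), ("page_4", "About", 700, 12)]
session = {"log": {
    "pages": [{"id": pid, "title": title, "pageTimings": {"onLoad": onload}}
              for pid, title, onload, _ in pages],
    "entries": [{"pageref": pid} for pid, _, _, n in pages for _ in range(n)],
}}

for row in per_page(session):
    print(row)
print("outlier:", slowest(session))
```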
In fact, the format was originally created when Firebug with NetExport, and HttpWatch, were trying to figure out how to agree on a single standard so they wouldn't end up with different export files. That was probably three or four years back. So, obviously, Firebug and HttpWatch support it. But if you look through this list, there's IE, Chrome Developer Tools, the HAR Viewer we just saw, and probably three dozen different tools in here. Charles Proxy is another very popular one. This is actually an important point: this is not just in a browser. If I have a proxy running and I configure my browser to go through it, I can still capture all the same data, which is really nice.
PETER LUBBERS: This will come in very handy with mobile browsers, for example.
ILYA GRIGORIK: And we'll actually see an example of that. I guess the other thing to mention is that the format itself is JSON, which is very flexible, and that's nice because almost every language out there today has a very good JSON library; JSON maps very easily onto your basic data types. But if you want, there are wrappers in each language, like the Java HarLib, where you give it the HAR file and it gives you back a nice object you can iterate over. There's Perl, there's Ruby. I found a whole bunch of different tools.
PETER LUBBERS: The amount of support for this is pretty amazing. Actually, just over the weekend, we got an email from Eric Duran in New York, who built this great little viewer, which I thought was awesome, because this is actually just a--
ILYA GRIGORIK: This looks very familiar. This looks like Chrome Developer Tools.
PETER LUBBERS: It really does. He sort of re-implemented all of that, with drag-and-drop support.
ILYA GRIGORIK: So this is a page, right?
PETER LUBBERS: Right.
ILYA GRIGORIK: We're looking at github.com/chromeHAR. So, drag and drop a HAR file. OK, I happen to have a HAR file; I've come prepared.
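As an illustration of what such a wrapper gives you (a typed object you can iterate over), here is a tiny hypothetical Python equivalent built on a dataclass. It is not HarLib or any published library; `Entry` and `iter_entries` are names we made up, and they keep only a handful of HAR fields.

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Entry:
    """A few commonly used HAR entry fields, exposed as typed attributes."""
    method: str
    url: str
    status: int
    time_ms: float
    mime_type: str


def iter_entries(har_text: str) -> Iterator[Entry]:
    """Parse HAR JSON text and yield Entry objects (hypothetical mini wrapper)."""
    for e in json.loads(har_text)["log"]["entries"]:
        yield Entry(
            method=e["request"]["method"],
            url=e["request"]["url"],
            status=e["response"]["status"],
            time_ms=float(e["time"]),
            mime_type=e["response"].get("content", {}).get("mimeType", ""),
        )


har_text = json.dumps({"log": {"entries": [
    {"time": 120, "request": {"method": "GET", "url": "http://example.com/"},
     "response": {"status": 200, "content": {"mimeType": "text/html"}}},
    {"time": 35.5, "request": {"method": "GET", "url": "http://example.com/style.css"},
     "response": {"status": 200, "content": {"mimeType": "text/css"}}},
]}})

slow = [e.url for e in iter_entries(har_text) if e.time_ms > 100]
print(slow)  # ['http://example.com/']
```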
PETER LUBBERS: Isn't that awesome?
ILYA GRIGORIK: Look at that. It's the same tools that we know and love. I can click on this and see all the headers, the cookies, the timing, and this is just a web app. So it's really cool. I think this is a good demo of something not many people realize, which is that Chrome Developer Tools is a web app. So I think what Eric is doing here is pulling out a lot of the styling and perhaps even the JavaScript logic.
PETER LUBBERS: I think so. I haven't had enough time to look at it closely yet, but it's pretty.
ILYA GRIGORIK: That's a pretty cool demo. So if you want, there's the HAR Viewer, and now there's this. And it looks like he's even thinking about adding some PHP functionality.
PETER LUBBERS: We'll come back to that one as well.
ILYA GRIGORIK: Unfortunately, I can't click on it right now, but--
PETER LUBBERS: So one thing about all of this: I'm already getting a little bit tired of the manual parts. Let's focus on that a bit. Downloading a HAR file by hand is fine once in a while. But if you have to do it more often, for example because you want to start seeing trends over time, it becomes a problem, because you're constantly doing this manual work. So what tools are there, Ilya, for automating this process?
ILYA GRIGORIK: Well, short of getting an army of monkeys, there are actually a lot of different ways, and I'm really excited about this stuff. Let me show you some examples. First of all, I don't know if you're familiar with PhantomJS.
PETER LUBBERS: That's the headless browser?
ILYA GRIGORIK: Yeah, exactly. It's a headless WebKit browser. And the cool thing about it-- well, there are so many cool things about it that we could probably do an entire show on it. But one of the things I like is that it's very easy to download and install.
It's a statically compiled binary, so you just download the zip file and unpack it, and it's basically a full-featured WebKit browser, which means it can execute JavaScript. It'll download your CSS, images, everything, which is nice. And you can run it from the command line. So this is what I have here: I downloaded the file, unpacked it, and have it sitting here. Now I can run bin/phantomjs, and if you look at this, it just gives you a little bit of help information. You can configure it any way you like. But basically, you give Phantom a script that tells it what to do. Think of it as a browser that you have to instruct: hey, I want you to open a file or a URL and do something there. It comes with a couple of example scripts, one of which is netsniff. I think you can guess what it does: it sniffs the network traffic and logs that data. Let me do this. I'll load a page here. And if all goes well, this should run Phantom. And look, it spit out what looks like a JSON file. In fact, it looks like a HAR file. Surprise!
PETER LUBBERS: Very surprised.
ILYA GRIGORIK: So let me save that to an output file instead of just printing it to the screen. We're looking at this, and once again, we have all the JSON data. So what just happened? I think this is really important. We have a fully featured browser that went in and captured all the network data. Now imagine if you take--
PETER LUBBERS: And you run it on the command line.
ILYA GRIGORIK: Right. So imagine you take this, and you have your CI build running on your--
PETER LUBBERS: Exactly.
ILYA GRIGORIK: --instance. After each check-in, you run this and capture a HAR file. Now you have the full timeline: for every check-in, you have a history of what happened. Maybe you added more resources, and all the rest.
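Once each check-in produces a HAR file, a CI step can compare it with a baseline and fail the build when the page grows. The sketch below assumes two HAR files on disk (for example, one from the last release and one from the current build, captured however you like). The budget thresholds (`max_new_requests`, `max_growth`) are hypothetical defaults to tune for your own site.

```python
import json
import sys


def har_metrics(har: dict) -> dict:
    entries = har["log"]["entries"]
    return {
        "requests": len(entries),
        # bodySize is -1 when unknown; count those as 0
        "bytes": sum(max(e["response"].get("bodySize", 0), 0) for e in entries),
    }


def regressions(baseline: dict, current: dict,
                max_new_requests: int = 0, max_growth: float = 0.10) -> list:
    """Describe anything that grew past the (hypothetical) budget between two HAR logs."""
    b, c = har_metrics(baseline), har_metrics(current)
    problems = []
    if c["requests"] - b["requests"] > max_new_requests:
        problems.append(f"requests grew from {b['requests']} to {c['requests']}")
    if b["bytes"] and (c["bytes"] - b["bytes"]) / b["bytes"] > max_growth:
        problems.append(f"bytes grew from {b['bytes']} to {c['bytes']}")
    return problems


def main(baseline_path: str, current_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    problems = regressions(baseline, current)
    for p in problems:
        print("REGRESSION:", p)
    return 1 if problems else 0  # a non-zero exit code fails the CI step


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Usage, assuming the script is saved as `har_budget.py`: `python har_budget.py baseline.har current.har`. Keeping the check on plain request and byte counts makes it independent of any particular analysis tool; a YSlow or PageSpeed score could be added as a further condition.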
PETER LUBBERS: That's actually interesting, because if you do that and combine it with running the result through analysis tools like PageSpeed or YSlow... is that already supported?
ILYA GRIGORIK: You can tell that we practiced this yesterday. So here's another tool you'll love. If you use Node, YSlow is available as a Node module, and you can just run it. I already have it installed on my machine, but if you run npm install yslow, that's what you get. Once you install it, you get this yslow binary. The README gives a couple of different examples; I'm just going to copy this one. I think there's actually a typo in the README: we need two dashes here. And guess what it takes?
PETER LUBBERS: A HAR file.
ILYA GRIGORIK: It takes the HAR file.
PETER LUBBERS: It's getting old.
ILYA GRIGORIK: Right. So we're just going to print out-- actually, let me change this for a second. We're going to print out a basic summary, and format plain is just plain text. Then we feed it this HAR file. It analyzes the HAR file and says, OK, there are 80 kilobytes of data that we downloaded here. The score according to all of the YSlow rules is 100. Hooray. And there's a total of seven requests. So that's kind of interesting. Now let's dig in a little deeper and run all. If you run all, all of a sudden you get a lot more output. What happens here is that YSlow has a set of rules, or categories of rules, for things like: are you using a CDN? Are you setting Expires headers? Are you compressing your data? And all these other things. As you can see, it has scored that entire HAR file against all those rules, and it's giving you a specific score for each. It just so happens that this page is fairly optimized. There are no offenders, in the sense that if there were a file that wasn't compressed, it would show up in here. So now we have two things.
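Rules like "are you compressing your data?" and "are you setting Expires headers?" can be checked directly against the response headers stored in a HAR file. The sketch below is a deliberately simplified imitation of two such checks. It is not YSlow's rule engine or scoring: it only tests for the presence of a `Content-Encoding` header on text responses and of `Expires` or `Cache-Control: max-age`, and the sample URLs are hypothetical.

```python
TEXT_TYPES = ("text/", "application/javascript", "application/json")


def header_map(message: dict) -> dict:
    """HAR stores headers as a list of {name, value}; index them case-insensitively."""
    return {h["name"].lower(): h["value"] for h in message.get("headers", [])}


def offenders(har: dict) -> dict:
    """Simplified compression and expiry checks, loosely modeled on YSlow-style rules."""
    report = {"not_compressed": [], "no_expiry": []}
    for e in har["log"]["entries"]:
        resp = e["response"]
        if resp.get("status") != 200:
            continue  # skip redirects, 304s, and errors in this sketch
        headers = header_map(resp)
        url = e["request"]["url"]
        mime = resp.get("content", {}).get("mimeType", "")
        if mime.startswith(TEXT_TYPES) and "content-encoding" not in headers:
            report["not_compressed"].append(url)
        if "expires" not in headers and "max-age" not in headers.get("cache-control", ""):
            report["no_expiry"].append(url)
    return report


def entry(url, mime, headers):
    """Hypothetical helper: a minimal 200 response with the given headers."""
    return {"request": {"url": url},
            "response": {"status": 200, "content": {"mimeType": mime},
                         "headers": [{"name": k, "value": v} for k, v in headers.items()]}}


sample = {"log": {"entries": [
    entry("http://example.com/app.js", "application/javascript",
          {"Content-Encoding": "gzip", "Cache-Control": "public, max-age=31536000"}),
    entry("http://example.com/style.css", "text/css",
          {"Expires": "Thu, 31 Dec 2037 23:55:55 GMT"}),
    entry("http://example.com/logo.png", "image/png", {}),
]}}
print(offenders(sample))
```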
We've captured the HAR file from the command line, and we ran it through YSlow. And, of course, you don't actually have to print this out--
PETER LUBBERS: All from the command line.
ILYA GRIGORIK: Right. And you don't have to print it out in plain text: to feed it into your own tool, there's JSON output. So now, in literally two lines of bash script, you can get access to all this performance data. On every CI check-in, I can run this and raise an alarm if all of a sudden--
PETER LUBBERS: Yeah, exactly. You can set alarms for when you drop below a certain score, or, for example, when images aren't compressed, or whatever else the typical YSlow rules pick up. And then that can trigger a whole chain of alarms and ways to fix it. That's awesome.
ILYA GRIGORIK: Exactly.
PETER LUBBERS: So is that also supported in PageSpeed?
ILYA GRIGORIK: It is, though it requires a little bit more work. If you go to the PageSpeed site-- I'm just looking at the PageSpeed SDK-- you can download the entire SDK. It's an open source project. If you build the SDK, you'll get a binary called, guess what, HAR to PageSpeed. You feed it a HAR file, and you get similar output, but perhaps with slightly different rules and slightly different weightings than YSlow. If you're a PageSpeed fan, you can use this. And by the way, the PageSpeed SDK also comes with some really awesome tools, like PageSpeed Optimize Image. Nicely named, very descriptive. You just pass it an image file, and it'll automatically pick the right format and optimize it for you. That's something you can also do as part of your build process.
PETER LUBBERS: When I first started looking at the HAR format, I really thought of it primarily from a browser perspective. But after looking at it more, obviously it's just the browser making an HTTP request to the server and the server responding.
So when you start thinking about it that way, it's like, OK, we can do a lot more with it. We can go beyond the browser, if you will, and start implementing HAR on the server side.
ILYA GRIGORIK: That's a cool idea.
PETER LUBBERS: I don't think there are a whole lot of tools around that yet, but it's the same format.
ILYA GRIGORIK: Just to elaborate on this a little bit: a lot of API servers or app servers that you build follow the same pattern. A request comes in, and then you dispatch to a database, or maybe to another HTTP server that does something. It's basically the same waterfall.
PETER LUBBERS: But on the inside.
ILYA GRIGORIK: So it would be cool if we could visualize that data the same way, with the whole range of different rules-- or tools, rather-- that we've seen.
PETER LUBBERS: So let's take a look at a sample file you put together for this, just to get an idea of what's happening here.
ILYA GRIGORIK: I've created this mock file; how it was created is a separate story. You can instrument your own app server, or you can use some other tricks, which we'll talk about in a second.
PETER LUBBERS: Some sort of logging.
ILYA GRIGORIK: Exactly. So actually, I'm going to sneak in another demo. If you're working with HAR files, you have them on disk, and you have Ruby installed, which you probably do, you can run gem install har, which installs this very, very handy utility. You then run har and pass it a file. And check this out. I'm going to run this, and it starts a local server that embeds the HAR Viewer, opens a new tab in your browser, and just visualizes the file right here. How awesome was that?
PETER LUBBERS: Wow.
ILYA GRIGORIK: So we don't even have to copy it and go to the site. You just feed it a file and there you go.
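Instrumenting an app server this way means wrapping each backend call in a timer and writing the result out as a HAR entry. Below is a minimal sketch, modeled loosely on the RSS-feed mock: one search call, then three article fetches in parallel. `HarRecorder`, `fake_backend`, and the `*.internal` URLs are all hypothetical, the backend calls are simulated with `time.sleep`, and since only total duration is measured, it is all attributed to the HAR `wait` timing.

```python
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


class HarRecorder:
    """Collects timed backend calls as HAR 1.2 entries (hypothetical helper, not a library)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()  # calls may finish on several threads

    def timed_call(self, method, url, call):
        started = datetime.now(timezone.utc)
        t0 = time.perf_counter()
        status = call()  # the real work: DB query, HTTP call, etc.
        elapsed_ms = (time.perf_counter() - t0) * 1000
        entry = {
            "startedDateTime": started.isoformat(timespec="milliseconds"),
            "time": elapsed_ms,
            "request": {"method": method, "url": url, "httpVersion": "HTTP/1.1",
                        "cookies": [], "headers": [], "queryString": [],
                        "headersSize": -1, "bodySize": -1},
            "response": {"status": status, "statusText": "", "httpVersion": "HTTP/1.1",
                         "cookies": [], "headers": [], "content": {"size": 0, "mimeType": ""},
                         "redirectURL": "", "headersSize": -1, "bodySize": -1},
            "cache": {},
            "timings": {"send": 0, "wait": elapsed_ms, "receive": 0},  # only the total is known
        }
        with self._lock:
            self._entries.append(entry)
        return status

    def to_har(self):
        with self._lock:
            # same-format UTC ISO timestamps sort chronologically as strings
            entries = sorted(self._entries, key=lambda e: e["startedDateTime"])
        return {"log": {"version": "1.2",
                        "creator": {"name": "har-recorder-sketch", "version": "0.1"},
                        "entries": entries}}


def fake_backend(delay_s, status=200):
    """Stand-in for a real dependency; it just sleeps to simulate latency."""
    def call():
        time.sleep(delay_s)
        return status
    return call


def handle_feed(recorder, feed_id):
    # 1. ask the search service which articles belong to this feed
    recorder.timed_call("GET", f"http://search.internal/?q={feed_id}", fake_backend(0.02))
    article_ids = [1, 2, 3]  # pretend the search returned three IDs
    # 2. fetch the three articles in parallel
    with ThreadPoolExecutor(max_workers=3) as pool:
        list(pool.map(lambda i: recorder.timed_call(
            "GET", f"http://articles.internal/{i}", fake_backend(0.01)), article_ids))


rec = HarRecorder()
handle_feed(rec, "ABC")
for e in rec.to_har()["log"]["entries"]:
    print(e["request"]["url"], round(e["time"]), "ms")
```

The resulting dictionary can be written out with `json.dump` and opened in any HAR viewer, just like a browser capture.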
So next time somebody emails you a HAR file, or you get a bug report with one, download it, run it through this, and you're up and running. So let's look at this. What's happening here? This is a simple request where we're simulating, let's say, an RSS feed application, where a request comes in for feed ABC. The next thing that happens is that we dispatch a search request for ABC, which basically asks: which articles should I show, and what are their IDs? Let's say that returns three IDs. That request takes some time, 136 milliseconds, and we have all the header information and all the other metadata from our API server. Then we dispatch three requests in parallel to fetch the articles, and then the response is returned.
PETER LUBBERS: Yeah, you can really trace right into that and see where most of the time is going.
ILYA GRIGORIK: So now you can think about instrumenting your app server and reusing the same tools. This is great visualization.
PETER LUBBERS: Yeah, by sticking to the format, you really get a lot of benefits.
ILYA GRIGORIK: Yeah, there are so many different tools out there. And speaking of tools, let's take this use case. We've captured a HAR file, let's say, on the command line. We can run it through YSlow, which is very good for catching, let's say, regressions or anomalies when somebody checks in. So it'll ingest this data. But now it gets smarter: it looks at the URL you were accessing and says, hey, you've actually uploaded not one but three traces of this page, spread across time, so I'm going to visualize the difference in page load time or total size. As you can see here, we have three runs; I was playing with this over the weekend. Across these three runs (the yellow line here is the full load time), the load time went up from 0.75 seconds to about 1 second.
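Behind a trend chart like this is a simple time series: one load-time value per uploaded run, ordered by when the run happened. Below is a minimal sketch that reads `pageTimings.onLoad` from the first page of each HAR log. The run dates and values are illustrative numbers echoing the 0.75 s to roughly 1 s change mentioned here, not the actual demo data, and the storage tool's own implementation is not shown.

```python
from datetime import datetime


def load_time_trend(hars: list) -> list:
    """(startedDateTime, onLoad ms) for the first page of each run, oldest first."""
    points = [(h["log"]["pages"][0]["startedDateTime"],
               h["log"]["pages"][0]["pageTimings"]["onLoad"]) for h in hars]
    # parse the ISO 8601 timestamps ("Z" rewritten for older Python versions) and sort by time
    return sorted(points, key=lambda p: datetime.fromisoformat(p[0].replace("Z", "+00:00")))


def relative_change(points: list) -> float:
    """Change from the first run to the latest run, as a fraction of the first."""
    first, last = points[0][1], points[-1][1]
    return (last - first) / first


def run(started: str, onload_ms: float) -> dict:
    """Hypothetical helper: a HAR log reduced to the one field this sketch reads."""
    return {"log": {"pages": [{"startedDateTime": started, "pageTimings": {"onLoad": onload_ms}}]}}


runs = [run("2012-11-03T10:00:00.000Z", 1000),  # uploaded out of order on purpose
        run("2012-11-01T10:00:00.000Z", 750),
        run("2012-11-02T10:00:00.000Z", 880)]
trend = load_time_trend(runs)
print([ms for _, ms in trend])           # [750, 880, 1000]
print(f"{relative_change(trend):+.0%}")  # +33%
```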
This graph is also very interactive, which I didn't realize for a while. For example, I can disable this and load things like the onLoad time and the time to first byte. Imagine having this data for your site across two weeks; now you can go in and ask, am I just adding more images? Or why did my load time get worse? And if you need a nice artifact for your next presentation, you can also save the chart as a PNG or an SVG image. So that's cool: it lets you trend things over time. And for each specific run, you of course still have access to all of the metadata within it: the domains from which we fetched the resources, and it even embeds PageSpeed scoring. So it says, well, for this page-- let's see, cache validator: some of the resources didn't have a cache validator, so it gave me an 83. OK, interesting. The HAR Viewer, as you would expect, is embedded in here, and you can toggle between all the different runs. So here's a little tip. We've talked about this a number of times now: you capture on the CLI, you run it through YSlow, and then you can push it into here. And all of a sudden, you have a performance monitoring solution--
PETER LUBBERS: Exactly.
ILYA GRIGORIK: --in about three lines of bash script. And if you're interested, I found this example on the HAR Storage wiki. I uploaded the file manually, but you can just-- here are four lines of Python where you encode the data and push it in. And that's it. You're done.
PETER LUBBERS: We've looked at it from the browser perspective now, and from the server side and automation. One thing we talked about briefly is mobile support. Obviously, you don't always have Chrome Developer Tools on your mobile device. So how would you use some of these tools there?
ILYA GRIGORIK: I actually cheated: I made us a little presentation, two slides.
You bring up a really good point, which is that if you're using Chrome on Android, you actually have remote debugging, which is absolutely awesome, because you have access to the same Network panel plus everything else, even JavaScript debugging. But what happens when you're running an older browser, or another browser that doesn't have that capability? PETER LUBBERS: And just for those of you attending, we did a show just last week on Chrome Mobile debugging. ILYA GRIGORIK: It's an awesome tool. If you guys haven't used it before, we definitely recommend it. But here we're talking about a slightly different use case: let's say I have an older phone, or a phone with a browser that doesn't support this kind of thing. Could I still get access to this data? It's actually a little bit tricky if you think about it. How do you do that? So there's a trick. I made a diagram just to explain it. PETER LUBBERS: It involves a proxy server. ILYA GRIGORIK: Yes, it involves a proxy server. So here's the trick. You have your phone, and I will assume that your phone can connect to a Wi-Fi hotspot. That's a requirement, unfortunately; if you can't do that, then you can't use this trick. But let's assume it does. So what we do is take our laptop and start a Wi-Fi hotspot on it, so it starts to broadcast. We then connect our phone to the laptop. Now if I'm browsing on my phone, I'm actually going through my laptop. So far, so good? So when I make a request in my mobile browser, it'll go to my laptop, and my laptop will go to the server and just funnel data back and forth. Now, given that the data is flowing through the laptop, we can actually capture it with a low-level tool like tcpdump or Wireshark. PETER LUBBERS: Exactly. I was going to say with Wireshark, you can capture everything.
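Once the phone's traffic flows through the laptop, the capture itself is an ordinary packet capture. Here is a small Python sketch that assembles a tcpdump command line for it. The options `-i` (interface), `-s 0` (capture whole packets) and `-w` (write raw packets to a file) are standard tcpdump flags, and the trailing filter is a normal pcap filter expression; the interface name `en1`, the file name, and the phone's IP address are placeholders you'd replace with your own. Actually running tcpdump usually requires root privileges.

```python
import shlex


def tcpdump_command(interface, outfile, host=None, port=None):
    """Build a tcpdump argv that writes raw packets to a PCAP file."""
    cmd = ["tcpdump", "-i", interface, "-s", "0", "-w", outfile]
    # Optional filter so only the phone's web traffic is captured.
    clauses = []
    if host:
        clauses.append(f"host {host}")
    if port:
        clauses.append(f"port {port}")
    if clauses:
        cmd.append(" and ".join(clauses))
    return cmd


if __name__ == "__main__":
    # en1 and 192.168.2.3 are placeholders for the hotspot interface and the phone's IP.
    cmd = tcpdump_command("en1", "phone.pcap", host="192.168.2.3", port=80)
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually capture (typically as root): subprocess.run(cmd, check=True)
```

Filtering by the phone's address keeps the laptop's own traffic out of the capture, which makes the resulting waterfall much easier to read.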
ILYA GRIGORIK: We can just say capture on this interface, or this specific port, or this IP, all that kind of stuff. So you run that capture, and what you get out of it is a PCAP file, which is just a very low-level record of the IP and TCP packets flowing over the wire. So this is nothing that you would consume without an additional tool. PETER LUBBERS: It's not in a HAR format. ILYA GRIGORIK: No, unfortunately not. I actually have a file that I'll show you guys: a sample PCAP file I captured while loading Wikipedia. If I just open this in Vim, it's gibberish, because it's a binary format. Now it turns out that there is actually a tool called pcap2har, which will take those TCP packets, reconstruct the entire flow, and create a HAR file. Actually, let me see if I can find this tool here. You can use pcap2har to do this manually: you capture the PCAP file, you get a HAR file out of it, and then you can use the HAR Ruby gem to visualize it. Or there's this web app which allows you to just upload a PCAP file, and it'll do the conversion and just show you the result. Let me show you this. So I have my Wikipedia HAR file here. PETER LUBBERS: No, it's a PCAP file. ILYA GRIGORIK: Sorry, PCAP file. And I'm just going to hit Upload. So what we're doing is uploading the raw binary data. It's going to run the transform to HAR. And look at that. Now we're looking at the waterfall chart as captured from a mobile phone that perhaps you couldn't configure a proxy server on, or that runs a browser that doesn't support remote debugging. PETER LUBBERS: Can you grab the HAR-- the raw data on the other tab there? The HAR JSON, yeah, OK. ILYA GRIGORIK: You can also explore it, or you can click on Download HAR File, and there you go. PETER LUBBERS: Full circle. ILYA GRIGORIK: So this tool-- and we're only scratching the surface here.
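To see why the PCAP file is gibberish in Vim, and what a PCAP-to-HAR converter has to start from, here is a minimal Python reader for the classic libpcap layout: a 24-byte global header, then a 16-byte header before each captured packet. It is a sketch that only handles the original microsecond-resolution format (not pcapng) and stops at raw packet bytes. Reassembling the TCP streams and parsing HTTP into HAR entries, which is the real work such a converter does, is not shown.

```python
import struct


def read_pcap(data):
    """Parse a classic libpcap capture into (linktype, [(timestamp, packet_bytes), ...])."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":    # magic 0xa1b2c3d4 stored little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":  # magic 0xa1b2c3d4 stored big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled here)")
    # Rest of the global header: version, tz offset, sigfigs, snaplen, link type.
    _major, _minor, _tz, _sigfigs, _snaplen, linktype = struct.unpack(endian + "HHiIII", data[4:24])
    packets = []
    offset = 24
    while offset + 16 <= len(data):
        # Per-packet header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        packets.append((ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]))
        offset += incl_len
    return linktype, packets
```

For example, `read_pcap(open("wikipedia.pcap", "rb").read())` (file name hypothetical) returns the link type, where 1 means Ethernet, plus one timestamped byte string per packet.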
We just spent what? 40 minutes talking about all the different ways we can use this. But I think we've only scratched the surface, because we can use it on the server, and we can use it to automate performance monitoring. I think instrumentation is key. So by combining tools like HAR Storage, HAR Viewer and others, you can literally build a performance monitoring dashboard for your site or for your company in a couple of hours. PETER LUBBERS: Yeah, because there are so many tools around it. So let's see if we have any other questions. Oh, first of all, the links. ILYA GRIGORIK: We covered a lot of stuff in here. So I actually created a quick gist with a whole bunch of links to the tools that we covered. I tried to capture them all here. So if you guys go to bit.ly/har-show, you'll find all the links in there. And we'll also push out a blog post later today with a little bit more information on it. PETER LUBBERS: We'll put it on the Chrome Developers Google+ page and link to it. ILYA GRIGORIK: But I would definitely encourage you guys to just explore it and play with the HAR Viewer. Definitely try Phantom and see what you can do with it. Actually, one quick note on Phantom. When you download it, it comes with example files. I showed you NetSniff. There's actually a small bug in the NetSniff file, which I fixed, so you guys should go to the GitHub page for Phantom and copy the latest NetSniff file. It's there, it's fixed; it's just not in the latest release, so just FYI. PETER LUBBERS: We'll make a note of it on here. Let's see if there are any questions. OK, there's a couple. ILYA GRIGORIK: So we got a question from Steve Souders. Wow, I think I know that guy. "Does a HAR file contain the response bodies?" So by default, when you export out of, let's say, Chrome Developer Tools, it does not contain the response body. But there is no reason why it can't.
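For tools that do want the bodies, HAR 1.2 already has a place for them: `response.content.text`, with `content.encoding` set to `"base64"` when the payload is binary. The sketch below fills those fields for one entry. The `attach_body` helper is hypothetical, and treating anything that is not valid UTF-8 as binary is a simplifying assumption of this example.

```python
import base64


def attach_body(entry, body, mime_type):
    """Store a response body in a HAR 1.2 entry (hypothetical helper).

    Valid UTF-8 is stored as-is; anything else is base64-encoded, which is the
    simplifying rule this sketch uses to tell text from binary.
    """
    content = entry["response"].setdefault("content", {})
    content["size"] = len(body)  # length of the returned content in bytes
    content["mimeType"] = mime_type
    try:
        content["text"] = body.decode("utf-8")
        content.pop("encoding", None)
    except UnicodeDecodeError:
        content["text"] = base64.b64encode(body).decode("ascii")
        content["encoding"] = "base64"
    return entry
```

Storing images as base64 inflates them by roughly a third, which is part of why bodies make HAR files so much larger.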
So if you're writing your own tool, for example if you're scripting Phantom or something like that, there's no reason why you can't include it. And I think it would be really cool if, for example, HTTP Archive actually stored the bodies-- sorry, httparchive.org, the site. Then you could do more interesting analysis over time. PETER LUBBERS: Sure. That would require a lot more storage space. ILYA GRIGORIK: Yes, absolutely. PETER LUBBERS: I think that's why it's not there by default. ILYA GRIGORIK: Yes. Well, exactly. PETER LUBBERS: "What about compressing HAR files?" Interesting. ILYA GRIGORIK: Well, yeah, as we saw, we exported one trace and got 5,000 lines of JSON. If you're storing these things over time, you probably want to compress them, and they will compress incredibly well. Coming out of Chrome Dev Tools or other tools, you're just going to get the raw file, the raw JSON. It's up to you if you want to store it, to archive it-- or, sorry, to compress it. PETER LUBBERS: So "can the HAR format be extended to support information about requests that don't make it to the server, but instead hit the cache?" Interesting. ILYA GRIGORIK: Actually, if you export out of Chrome Dev Tools, it will contain that data. If you load your developer tools, it does show requests that are coming from the cache; it actually indicates that. And it will be there in the exported HAR data. Usually the quick way to spot it is when you look at those chunks of JSON: you will see that some requests don't have any HTTP headers. That's a giveaway that they came out of the cache. PETER LUBBERS: A couple more. ILYA GRIGORIK: So "BrowserMob Proxy will capture and produce a HAR file. And Charles can also be used as a proxy to generate HAR." Right, so I think this is more of a note from Andy, and it's a really good point: we talked about exporting this out of a browser.
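Both answers above are easy to try on an exported file. The Python sketch below gzips a HAR's JSON for archiving and flags entries whose response carries no headers, the quick cache-hit giveaway mentioned in the show. That check is a heuristic rather than something the HAR spec guarantees, and the function names are hypothetical.

```python
import gzip
import json


def compress_har(har):
    """Serialize a HAR dict to JSON and gzip it for archiving."""
    return gzip.compress(json.dumps(har).encode("utf-8"))


def decompress_har(blob):
    """Inverse of compress_har."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def likely_cache_hits(har):
    """URLs of entries whose response has no headers: a heuristic hint of a cache hit."""
    return [
        entry["request"]["url"]
        for entry in har["log"]["entries"]
        if not entry["response"].get("headers")
    ]
```

Writing `compress_har(har)` to a `.har.gz` file and reading it back with `decompress_har` keeps the archive lossless; because HAR is highly repetitive JSON, the compressed copy is typically far smaller than the original.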
But you can use a tool like Charles Proxy, or Fiddler, or something else that's a standalone app. So I could actually run a proxy server on my laptop, then connect from this device here and proxy everything through it, and capture the HAR files for this device. So that's a really good point. PETER LUBBERS: OK, one more. "When you were looking at the HAR viewer, a JPEG appeared as the first request rather than the HTML page. Why does this happen?" ILYA GRIGORIK: That's a good question. That could be a bug. But every request that is made has a timestamp within the actual export. So it'll say when the request started, which is how we determine in Dev Tools where in the timeline it should live. I'm not sure if the HAR spec specifically says that entries must be sorted in order. Maybe it does, maybe it doesn't. But you can easily re-sort that data based on the timestamp and get the exact timeline. Cool. So I think that covers it. PETER LUBBERS: That was great. ILYA GRIGORIK: This is definitely a power tool. Lots and lots of stuff that you can do with it. And I think, once again, instrumentation is key for anything to do with performance. PETER LUBBERS: And all these tools, there's so much support for it that this is really the way to go for all of this-- ILYA GRIGORIK: Yeah, so I definitely encourage you guys to play with it. PETER LUBBERS: So we'll post the links. We'll post them on the Chrome Developers Google+ page. You'll be posting your blog post pretty soon? ILYA GRIGORIK: I'll have a blog post up soon as well, just documenting some of the examples here. And then maybe just one quick note. I think we're done with this. But in our next episode, we're actually going to take a look at Google Web Fonts, which is going to be a really fun topic. I love web fonts. A lot of people have issues with web fonts when it comes to performance.
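Re-sorting works because every HAR entry carries an ISO 8601 `startedDateTime`. Here is a minimal Python sketch, assuming all timestamps include a time zone (Chrome writes a trailing `Z`). The `Z` is rewritten to `+00:00` because `datetime.fromisoformat` only accepts it from Python 3.11 on; older versions also expect exactly 3 or 6 fractional-second digits.

```python
from datetime import datetime


def parse_har_time(value):
    """Parse a HAR startedDateTime such as '2012-08-28T17:00:00.300Z'."""
    # fromisoformat accepts a trailing 'Z' only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_entries(har):
    """Reorder the entries in place by start time and return the HAR."""
    har["log"]["entries"].sort(key=lambda e: parse_har_time(e["startedDateTime"]))
    return har
```

Parsing instead of comparing the raw strings keeps the order correct even if different tools wrote the timestamps with different time-zone offsets.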
So we're going to do a deep dive on what it takes to make web fonts fast and what Google Web Fonts specifically does to make web fonts fast. PETER LUBBERS: And that's in two weeks? Same time, right? ILYA GRIGORIK: Yep, two weeks. So yeah, Tuesday. PETER LUBBERS: We'll announce it. Excellent. ILYA GRIGORIK: Awesome. Thank you guys. PETER LUBBERS: Thanks a lot. [MUSIC PLAYING]
Info
Channel: Google for Developers
Views: 33,538
Keywords: chrome, gdl, makethewebfaster
Id: FmsLJHikRf8
Length: 43min 55sec (2635 seconds)
Published: Tue Aug 28 2012