[MUSIC PLAYING] MALE SPEAKER 1: Shanghai GDG
is a very interesting developer community. FEMALE SPEAKER 1: I'm
glad somebody has asked this question. MALE SPEAKER 2: This is where
the magic happens. FEMALE SPEAKER 2: This is
primarily a question and answer show, so if any
of you out there would like to ask questions. ILYA GRIGORIK: Hello everyone,
and welcome to our Make The Web Fast series here at Google
Developers Live. Today, we'll be talking about
the HTTP Archive data formats. And boy, do we have a loaded
episode of tools, tips, and tricks for you. But before we get to that,
my name is Ilya Grigorik. I'm a developer advocate for
the Make The Web Fast team here at Google. PETER LUBBERS: I'm
Peter Lubbers. I'm a programs manager in the
Chrome Developer Relations team, and really excited about
a lot of the tools that-- or the ability that Chrome has
to track a lot of these things to make the web faster. Today we're going to
talk about this HTTP Archive format. There's a lot of cool features
in here, probably a lot of things that you may not
even have thought of. ILYA GRIGORIK: Yeah, exactly. So actually, Peter, yesterday we
got together to do a quick run-through through all the
different demos that we want to go through. And it took us what? An hour and a half or an
hour and 40 minutes? PETER LUBBERS: And
we discovered even some new stuff. ILYA GRIGORIK: So tons
and tons of stuff. Just a little bit of logistics
up front, we're going to do a lot of demos. So don't worry about trying to
capture links or figure out which specific tool we're
talking about. We'll actually share a URL at
the end where you can find all of the resources. So just kind of sit back and
watch and hopefully learn a few new things. Because we certainly learned a
lot of new tools just through researching. And I think the reason for
this is one of the most important things, in my
opinion at least, for performance optimization,
working on performance in general, is having good
instrumentation. So a few weeks back, we were
actually talking with Justin Cutroni from the Google
Analytics team about how you can use navigation timing to
capture performance data out of the browser for things like
network timing, JavaScript, web browser performance,
and all the rest. And today I think we're going
to go a little bit deeper. We are going to look at a tool
that a lot of us use, and we don't necessarily think about
how the data underneath is structured and how we can reuse
it in different ways. And the tool I'm talking
about is packet sniffers or HTTP monitors. Now, you guys probably don't
think about it that way because most of us are used to
actually thinking about it as Chrome Developer Tools has a
built-in tool which is a network panel, which is actually
a packet sniffer, or an HTTP monitor. So I'm going to share my screen
here for you guys. So I'm looking at the
Google Developers Live page right here. And I'm going to open the Chrome
Developer tools and just reload this page. So I have the network
tab open. And if the demo gods are with
us, the page will reload. There it goes. PETER LUBBERS: We're going
to need their help today. ILYA GRIGORIK: Yes, exactly. So speaking of making the web
fast, this page is taking a long, long time to load. Here we go. So now we have this network
graph in here. And this is an invaluable tool
for debugging what's happening with the site. So we can actually see that
it took a half a second to connect to this page and
all kinds of other data within this tool. So we can click on
each resource. Now, this is very,
very useful. But what if you could actually
export this data? Or maybe to put it another way,
what if we could actually take this data out of this tool
and maybe import it into another tool? You can do a lot of interesting
things with that type of data. PETER LUBBERS: Up until now, the
only way to really do that was to take a screenshot. ILYA GRIGORIK: Right, yeah. PETER LUBBERS: Maybe annotate
the screenshot, put a few arrows, like look at this one or
look at the timing on this. That's obviously not
a great way. ILYA GRIGORIK: So I've
done exactly this, and this is terrible. I would find a problem. I'm like, oh, I need to email
Peter about this. So I take a screenshot, annotate
it with like, OK, here's our problem. And then I send it over, and
then I get a question back like, OK, that's cool, but
what was the status code? Or what were the headers? And I'm like, well-- PETER LUBBERS: Lost it. ILYA GRIGORIK: Grrrr. We lost all the data because we
froze it in the screenshot. So it would be really nice if we
could actually export this data with all of its fidelity,
all of the data that's actually hidden within here,
and then reuse it in a different way. So that's precisely what
the HTTP Archive data format is for. So I'm going to show
you guys this. We're not going to go into
details of the spec, but there is a spec for it. So the HTTP Archive, the
extension is HAR, so hence, the HAR show. Ha, ha. And the HAR data format itself
is just a simple JSON schema, which contains all the metadata
that you would need to reconstruct the network
waterfall. So you can think of it as just
an underlying data of the network pane in your Chrome
Developer Tools or another HTTP monitoring tool. So you can see that it contains
a lot of different data, like which browser, which
pages you accessed. And entries are the individual
requests that the browser makes for all of the resources
on a page. So I actually have
a live file here. So this is a long file. I'm not going to go
through in detail. But I just want to show you guys
how does this thing look. So what I was doing-- PETER LUBBERS: You're doing
JSON format, right? ILYA GRIGORIK: Exactly. So it's very simple
to create and consume, which is actually-- we'll see later-- is very,
very important. So I'll just show you
guys what this data actually looks like. So here I'm trying to access an
archives page, so I typed in igvita.com/archives. So let me close that. Now we're looking
at the entries. So we see the first entry, which
is the actual request. We see that the browser is
sending a GET request for this actual page. And now, all of a sudden,
look at this. You have all the header
information. Every header that the browser
appends is here, even cookies. I probably don't want
to show you that. And then it has the response,
which is, OK, so it's doing a redirect. And it will actually
kick me out. So let me close this. And we can look at
the next request. Here is the same page asking
for a Google web font. So all of this performance
timing data, all of the header data is all captured here,
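
[CODE SKETCH] The walk-through here (a request entry with its method and headers, a redirect response, then the next request for a font) maps directly onto the JSON. The following is a minimal sketch, not the file shown on screen: the sample document is hand-written and heavily trimmed, its field names follow the HAR 1.2 layout, and every URL and value in it is made up. The helper names `list_requests` and `request_headers` are hypothetical.

```python
import json

# Hand-written, heavily trimmed HAR document. Field names follow the HAR 1.2
# layout (log -> creator, pages, entries); all URLs and values are made up.
SAMPLE_HAR = """
{
  "log": {
    "version": "1.2",
    "creator": {"name": "example-exporter", "version": "0.1"},
    "pages": [
      {"id": "page_1", "title": "Archives",
       "startedDateTime": "2012-01-01T00:00:00.000Z",
       "pageTimings": {"onContentLoad": 400, "onLoad": 750}}
    ],
    "entries": [
      {"pageref": "page_1", "time": 120,
       "request": {"method": "GET", "url": "http://example.com/archives",
                   "headers": [{"name": "Accept", "value": "text/html"}]},
       "response": {"status": 301,
                    "redirectURL": "http://example.com/archives/",
                    "headers": []}},
      {"pageref": "page_1", "time": 80,
       "request": {"method": "GET", "url": "http://example.com/fonts/demo.woff",
                   "headers": []},
       "response": {"status": 200, "redirectURL": "", "headers": []}}
    ]
  }
}
"""


def list_requests(har):
    """Return (method, url, status) for every entry, in recorded order."""
    return [
        (e["request"]["method"], e["request"]["url"], e["response"]["status"])
        for e in har["log"]["entries"]
    ]


def request_headers(entry):
    """Flatten the name/value header list of one entry into a dict."""
    return {h["name"]: h["value"] for h in entry["request"]["headers"]}


har = json.loads(SAMPLE_HAR)
for method, url, status in list_requests(har):
    print(method, url, status)
```

Because the format is plain JSON, nothing beyond the standard library is needed to read it; a real export carries many more fields per entry (cookies, timings, cache details) than this trimmed sample.
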
which is very, very convenient. PETER LUBBERS: Tell the audience
a little bit more about the httparchive.org site,
and then how that came about to really set
the context here. ILYA GRIGORIK: A good example of
how you could use this data is actually httparchive.org. So the HTTP Archive format and
httparchive.org are actually two separate things. They just happen to share the
same name, confusingly enough. But they actually have
a common history. The idea behind HTTP Archive,
you guys may be familiar with the Internet Archive, which
continuously crawls the web and takes snapshots
of certain pages. So actually, you can rewind
history, and you can say, I want to see this page how
it looked back in 2008. So think of HTTP Archive as a
very similar tool, except that we don't actually care for what
the pages looked like. We care about how
do they perform. So things like let's capture how
many JavaScript files were on a page, what was the total
size of the page, how many images did you fetch, all
of that metadata. And this is actually an
important point for the HTTP Archive format. When you export it, by default,
it won't export the body of the response. So if you fetch a 1-megabyte
image, we're not going to include that. We just include the
actual metadata. Because that's all we need
for the waterfall. So what can you do with this? Well, the idea behind HTTP
Archive is that we can crawl a lot of sites. So HTTP Archive actually does
about 100,000 sites right now based on the-- it's the Alexa
top, I guess, 100,000. And it aggregates all of these
HTTP archives, so all of that metadata for all the network
waterfalls, and then extracts meta-trends for things like are
the pages growing in size, or what's happening. So let me show you this. We'll go to trends. And it'll take us
one second here. And you can see that we're
analyzing about 200,000 sites now, so it's growing. And we can see total transfer
size and total request. So it's a little bit small. But an average page within those
200,000 sites today is over 1 megabyte in size and
takes over 85 requests. That's shocking. I see this number all the time,
but it still shocks me every time I see it. The HTML part itself takes
48 kilobytes, right? So all of this metadata is just
aggregated between all the different runs-- PETER LUBBERS: From HAR files. ILYA GRIGORIK: From HAR files. PETER LUBBERS: Yeah, exactly. ILYA GRIGORIK: So this is a
really good example of we've accumulated many
of these files. We have them over time. And now we're just collapsing
them together to create these formats-- sorry, not formats, to create
these graphs to show you general trends on the web. And you can definitely see that
the pages are growing in size, more requests,
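
[CODE SKETCH] The per-page numbers quoted in this part of the show (total transfer size, number of requests, share by content type) can all be derived from the entries of a single HAR. This is a minimal sketch of that idea, not the code httparchive.org runs: the function names are hypothetical, the sample sizes are made up, and it assumes `bodySize` is `-1` when the exporter did not know the transferred size, in which case it falls back to `content.size`.

```python
from collections import Counter


def entry_bytes(entry):
    """Best-effort size of one response body in bytes."""
    response = entry["response"]
    size = response.get("bodySize")
    if size is None or size < 0:  # -1 means "unknown" in HAR exports
        size = response.get("content", {}).get("size") or 0
    return max(size, 0)


def summarize(har):
    """Count requests and add up bytes overall and per top-level MIME type."""
    entries = har["log"]["entries"]
    by_type = Counter()
    for entry in entries:
        mime = entry["response"].get("content", {}).get("mimeType") or "unknown"
        # "text/html; charset=utf-8" -> "text"
        category = mime.split(";")[0].split("/")[0].strip() or "unknown"
        by_type[category] += entry_bytes(entry)
    return {
        "requests": len(entries),
        "total_bytes": sum(by_type.values()),
        "bytes_by_type": dict(by_type),
    }


# Made-up sample: one HTML document, two images, one script.
sample = {"log": {"entries": [
    {"response": {"bodySize": 48000,
                  "content": {"mimeType": "text/html; charset=utf-8"}}},
    {"response": {"bodySize": 600000, "content": {"mimeType": "image/png"}}},
    {"response": {"bodySize": -1,
                  "content": {"size": 200000, "mimeType": "image/jpeg"}}},
    {"response": {"bodySize": 150000,
                  "content": {"mimeType": "application/javascript"}}},
]}}

stats = summarize(sample)
print(stats)
```

Running the same function over many HAR files and averaging the results is the kind of aggregation that turns individual traces into trend graphs.
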
so on and so forth. PETER LUBBERS: Let me show you
something in the Chrome Developer tools that
you can use. Maybe some people are not
aware of that yet. Let's open it up again and
then reload the page. Now we get a
network waterfall here. So once
you get that-- so now you can actually
just right click on the left side there. ILYA GRIGORIK: Just
anywhere in here? PETER LUBBERS: Just
anywhere, yeah. And notice the option here. We have actually two options. And you can also do this
in Firefox, I believe. Copy entry as HAR. You can click on an individual
file and save the HAR information. ILYA GRIGORIK: So if I clicked
on style.css, it would just grab that? PETER LUBBERS: Just
the style.css. And of course, typically I
think you would be more interested in the entire page
with all its resources. So, copy all as HAR would
copy the HAR JSON directly to the clipboard. Now, if you want to save it as
a file, there is an option there to save entry as HAR. And that just saves
it as a file. So yeah, let's go ahead and
do that at the moment. ILYA GRIGORIK: If I just copy
this and if I go into my text editor and just create a
new file here, so it's 5,000 lines of JSON. And this is basically all the
metadata that's inside of this network waterfall. PETER LUBBERS: So with that
data you could effectively reconstruct the network panel. We'll have an example
of that later. But also you can pull out
specific bits and pieces that you want. You want to see a specific-- the size of the request
or the headers. ILYA GRIGORIK: Actually, this
is worth pausing on. Because previously, I would
just take a snapshot or a screenshot, annotate it,
and send it over. Now, I can right click on this,
copy, and maybe if I'm making a bug report,
I can actually attach the entire trace. And then the developer has full
metadata about everything that's happened in
the browser. PETER LUBBERS: You could
still send me the screenshot as well. ILYA GRIGORIK: OK, sure. PETER LUBBERS: But we can
reconstruct the screenshot, effectively, with that data. So that's cool. ILYA GRIGORIK: So that's
interesting. That actually brings
up a good point. So OK, great, we took this
very nice visual representation. I copied it into a 5,000 line
JSON file which is very useful, but-- PETER LUBBERS: You've got
to deal with it now. ILYA GRIGORIK: Right. If I get this 5,000 line
JSON file, boy. PETER LUBBERS: You're
going to need some visualization of that data. And fortunately, there's
tons of tools that can help with this. Let's take a look at one. Probably the best one for that
at the moment is the HAR viewer that you've got here. It's an open source project. You can just open
it up on the-- ILYA GRIGORIK: So it
looks like actually I can run it myself. So we can just embed
it on any page. So it's PHP and JavaScript,
which is really cool. But they also have an online
demo, so we're going to try that. I actually have it
preopened here. So this is-- what do I do here? PETER LUBBERS: Basically,
remember how you copied the HAR information stuff? You could obviously-- actually, this supports
drag and drop as well. You could take the HAR file and
drag it onto this page. But for now, you have it in
the clipboard already. So go ahead and paste
it in there. ILYA GRIGORIK: So I'm going to
paste the entire 5,000 lines of JSON in here. PETER LUBBERS: Keep your
fingers crossed. And there we go. ILYA GRIGORIK: Wow. Look at that. PETER LUBBERS: Pretty
informative as well. ILYA GRIGORIK: So that
took no time. And look at this. Now, I have-- actually no, this is a
little bit different. I think that's something
else on my clipboard. PETER LUBBERS: Well,
let's do it again. ILYA GRIGORIK: Well, it probably
doesn't even matter. PETER LUBBERS: So now
you get the whole visualization of all that data. ILYA GRIGORIK: Now that I have
this JSON file, I can go to this thing, paste it in, and
basically reconstruct exactly what happened in your browser. And the nice thing is, I still
have all the fidelity of all of the header information,
requests, the actual timing data. That actually gives you a little
bit more than Chrome Developer Tools, things like
these pie charts, where was the time spent. You can see that within this
session, we fetched-- out of all the content that was
fetched, more than half was images, which is actually
pretty typical. So this is really nice. PETER LUBBERS: Then you can
compare it with other loads of your page or other sites and
see the trends over time. So actually, one thing I wanted
point out, if you can just jump back a moment to the
Chrome Developer Tools, one really cool thing that
I particularly like-- if you, for example,
refresh this page. And let's go ahead and
do that a moment. So refresh the page. And you notice how in the
network panel it will reload everything. Right now, it sort of wiped
out all of the previous information, and it's got the
new-- and that's typically what you want. But now if you want to
actually track-- navigating from page to page
to page, you can hit the Record button on the bottom. ILYA GRIGORIK: I've often
wondered what that thing was for. PETER LUBBERS: So click that. And then why don't you go to
the trends page or stats. ILYA GRIGORIK: I think I'm
on the trends now, so I'll go to Stats. PETER LUBBERS: Click on Stats. And now let's click
on websites. ILYA GRIGORIK: Sure. PETER LUBBERS: And why don't we
just go finish it off with the About page. So basically what's happening
is the information is not thrown away. It's just added on. ILYA GRIGORIK: Let me stop the
recording here for a second. And now look at this timeline. It actually says 50 seconds. It's not that any one of those
pages took 50 seconds. It's just that the timeline
is appended across all the different sessions. PETER LUBBERS: It's a complete
trace of what you're reviewing. ILYA GRIGORIK: That's
really cool. So now I can actually do a thing
like come to the Home page, press Record, go to Login
page, go to the Checkout cart, and record that
entire trace. PETER LUBBERS: So now actually,
you can go ahead and do it in a moment. There's an interesting
thing at the viewer. ILYA GRIGORIK: OK, so
let me copy this. So this is probably like
50,000 lines of JSON. And go to the-- so let me hit Back. PETER LUBBERS: Back
to the Home page. ILYA GRIGORIK: Here's
our online demo. HAR Viewer online. Close some of these tabs. PETER LUBBERS: So we're going
to put it in there. And the nice thing is, if we hit
the Preview there, you'll see actually four pages. And you can actually in the top
part, even if just hover over them, you see a lot
of detail already. ILYA GRIGORIK: So then within
the session, we just recorded four different pages. And then for each page, we can
look at these pie charts. And we also have the full
waterfall chart for each one. And this is just spread over
the entire session. PETER LUBBERS: That's
pretty awesome. Also, what I like about it is,
you can actually have a site that has multiple pages. You can really see
the outliers. You can say, oh, all of them are
about the same, but this one is way out there. What's happening to that? You can zoom in, and you
have some context. ILYA GRIGORIK: So
you know what? Now that I see this, what I want
is a record button in Chrome, where I can actually
tell my users to just say, oh, you're having a problem? Hit Record. Go through a couple pages. Now stop and send
me that trace. That would be really cool. PETER LUBBERS: That
would be cool. ILYA GRIGORIK: There's an
extension in the making right after this show. OK, that's really cool. So what's next on our agenda? PETER LUBBERS: We talked
about the viewer. Let's talk about some
other tools. And one of the nice things--
it's not an actual official standard or specification. I mean, there is a spec. But it's sort of the de-facto
standard, if you will, for a lot of this information. Not just Chrome, but Firefox
supports it. But there's a lot
of other tools. And so maybe you could just go
to the HAR adopter's page. ILYA GRIGORIK: Right, the "har"
HAR adopter's page. So it turns out Chrome supports
HAR, as we just saw. But it turns out that most other
tools that you guys are familiar with also have
a HAR export. In fact, the whole point for
creating the standard was originally when Firebug and
NetExport or HttpWatch were trying to figure out, how do
we come up with a single standard so we don't
end up creating different export files? So this is probably three
or four years back. So, obviously, Firebug and
HttpWatch support this. But if you look through this,
IE, Chrome Developer Tools, we just saw HAR Viewer, and there's
probably three dozen different tools in here. And so Charles Proxy is another
very popular tool. So this is actually an
important point. This is not just in a browser. So if I have a proxy running and
I configure my browser to go through it, I can still
capture all the same data, which is really nice. PETER LUBBERS: This will come
in very handy, for example, with mobile browsers. ILYA GRIGORIK: And we'll
actually see an example of that. And then I guess the other thing
to mention is the format itself is very flexible,
JSON, which is nice. Because most every language
out there today has a very good JSON library, because it
just maps very easily to your basic data types. But if you want, there are
wrappers in each language. So things like the Java HarLib,
where you just give it the HAR file, and it gives you
a nice object back which you can iterate over. There's Perl. There's Ruby. I found a whole bunch
of different tools. PETER LUBBERS: It's pretty
amazing the amount of support for this. Actually, just over the weekend,
we got a mail from Eric Duran from New York who
built this great little viewer, which I thought it was
awesome, because this is actually just a-- ILYA GRIGORIK: This looks
very familiar. This looks like Chrome
Developer Tools. PETER LUBBERS: It really does. He sort of re-implemented
all of that with drag-and-drop support. ILYA GRIGORIK: So this
is a page, right? PETER LUBBERS: Right. ILYA GRIGORIK: We're looking
at github.com/chromeHAR. So drag and drop a HAR file. OK, so I happen to
have a HAR file. I've come prepared. PETER LUBBERS: Isn't
that awesome? ILYA GRIGORIK: Look at that. It's the same tools that
we know and love. I can click on this. You can see all the headers, the
cookies, the timing, and this is just a web app. So it's really cool. I think this is actually a good
demo of something that not many people realize, which
is Chrome Developer Tools is a web app. So I think what Eric is doing
here is he's actually pulling out a lot of the styling
and perhaps even the JavaScript logic. PETER LUBBERS: I think so. I haven't completely had enough
time to look at it yet, but it's pretty. ILYA GRIGORIK: That's
a pretty cool demo. So if you guys want, there's
a HAR Viewer. Now there's this. And it looks like he's even
thinking about adding some PHP functionality. PETER LUBBERS: We'll come back
to that one as well. ILYA GRIGORIK: Unfortunately,
I can't click on it right now, but-- PETER LUBBERS: So one of the
things about all of this is, of course, getting kind of
already a little bit tired of the manual parts of this. Let's focus on that a bit. Because if you have to actually
download this HAR file, that's great. But if you have to do this more
often, like, for example, if you want to start seeing
trends over time, then it's going to be a little
bit problematic. You're going to see-- you're constantly doing
this manual work. So what tools are there, Ilya,
for automating this process? ILYA GRIGORIK: Well, short of
getting an army of monkeys, the other way to do it is-- so actually, there's a lot
of different ways. And I'm actually really excited
about this stuff. Let me show you some examples. So first of all, I don't
know if you guys are familiar with PhantomJS. PETER LUBBERS: That's
the headless browser? ILYA GRIGORIK: Yeah, exactly. It's a headless WebKit
browser. And the cool thing about it--
well, there's so many cool things about it. We could probably do an
entire show on it. But one of things I like about
it is it's very easy to download and install. It's a statically compiled
binary, so you just download this zip file. You unpack it, and it's
basically a browser, a full-featured WebKit browser,
which means it can execute JavaScript. It'll download your CSS, images, everything, which is nice. And you can run it from
the command line. So this is what I have here. I just downloaded
the actual file. I unpacked it. I just have it sitting here. Now I can do "bin/phantomjs,"
and if you look at this-- so it just gives you a little
bit of help information. You can configure it
any way you like. But the way it basically works
is you need to give Phantom a script to tell it what to do. So think of it as this is a
browser, and we need to tell it, hey, I want you to open
a file or a URL and do something there. So it comes with a couple of
different example scripts, one of which is netsniff.js. I think you can guess what
it's going to do. It's going to sniff on
the network traffic and log that data. Let me do this. I'll actually download
a page here. And if all goes well, this
should run Phantom. And look, it spit out what
looks like a JSON file. In fact, it looks
like a HAR file. Surprise! PETER LUBBERS: Very surprised. ILYA GRIGORIK: So let me
actually just save that to an out file instead of just
printing it to screen. So we're looking at this. Once again, we have
all the JSON data. So what just happened? I think this is really
important. We have a fully-featured browser
that went in and captured all the network data. Now imagine if you take-- PETER LUBBERS: But you run
it on the command line. ILYA GRIGORIK: Right. So imagine now you take this,
and you have your CI build running on your-- PETER LUBBERS: Exactly. ILYA GRIGORIK: --instance. And after each check-in,
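
[CODE SKETCH] Capturing a HAR from the command line, as in the PhantomJS demo, can be wrapped in a small function that a CI job calls after each check-in. This is a minimal sketch under stated assumptions: it assumes a `phantomjs` binary is on the PATH, that the bundled `examples/netsniff.js` script prints the HAR as JSON on standard output, and that the paths shown match your unpacked download. The function name `capture_har` and the injectable `run` parameter are hypothetical conveniences, the latter so the wrapper can be exercised without a browser.

```python
import json
import subprocess


def capture_har(url, out_path, phantomjs="phantomjs",
                script="examples/netsniff.js", run=subprocess.run):
    """Run PhantomJS with its netsniff example and save the HAR it prints.

    `phantomjs` and `script` are assumed locations; adjust them to wherever
    the download was unpacked.
    """
    result = run([phantomjs, script, url],
                 capture_output=True, text=True, check=True)
    # Parsing first means a failed page load (non-JSON output) raises here
    # instead of silently writing a broken file.
    har = json.loads(result.stdout)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(har, handle)
    return har


# Example call (not executed here, since it needs PhantomJS installed):
# capture_har("http://example.com/", "build-1234.har")
```

Keeping one file per build, named after the build or commit, is what makes the later comparison across check-ins possible.
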
you run this and you capture a HAR file. Now you have the
full timeline. On every check-in, you have
a history of what happens. Maybe you added more resources
and all the rest. PETER LUBBERS: That actually is
interesting, because if you do that and you combine it
with running that through analysis tools like PageSpeed,
YSlow, is that already supported? ILYA GRIGORIK: You guys
can see that we practiced this yesterday. So here's another tool
that you'll love. So if you guys use Node,
there's actually-- YSlow is available
as a Node module. You can just run it. I already have it installed
on my machine. But if you run "npm install
yslow," that's what you're going to get. Let me just-- so once you install it, you just
get this YSlow binary. It gives you a couple different
examples here. I'm just going to copy
the guy right here. I think there's actually
a typo in the Readme. So we need two double dashes. And guess what it takes? PETER LUBBERS: A HAR file. ILYA GRIGORIK: It takes
the HAR file. PETER LUBBERS: It's
getting old. ILYA GRIGORIK: Right. So what we're going to
do is we're just going to print out-- actually, let me change
this for one second. We're going to print out
a basic summary. And format plain is
just plain text. And feed it this HAR file. And it analyzes the HAR file,
and says OK, there's 80 kilobytes of data that
we downloaded here. The score according to all of
the YSlow rules is 100. Hooray. And there's a total of seven
requests for this. So that's kind of interesting. Now let's dig in a little
bit deeper. Let's run all. And if you run all, all
of a sudden, you get a ton more output. So what happens here is, YSlow
has a set of rules or categories of rules for things
like, are you using a CDN? Are you setting Expires
headers? Are you compressing your data,
and all these things? As you can see, it's actually
scored that entire HAR file with respect to all
those rules. And it's giving you
a specific score. So it just so happens
that this page is actually fairly optimized. There's no offenders in the
sense that if there was a file that wasn't compressed, it
would show up in here. So now we have two things. We've captured the HAR file
from command line. We ran it through YSlow. And, of course, you don't
actually have to print this out-- PETER LUBBERS: All from
the command line. ILYA GRIGORIK: Right. And you don't have to print
this out in plain text. To feed into your own tool,
you have JSON output. So now, literally in two lines
of bash script, you can get access to all the performance
data. So now in every CI check-in,
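
[CODE SKETCH] The "raise an alarm in CI" idea can be illustrated without any scoring tool at all. The sketch below is a deliberately simple stand-in, a request-count and byte budget checked directly against the HAR, and it is not YSlow's or PageSpeed's rule set. The function name `check_budget`, the thresholds, and the sample sizes are all made up for illustration.

```python
def check_budget(har, max_requests, max_bytes):
    """Return a list of human-readable budget violations (empty if none)."""
    entries = har["log"]["entries"]
    # bodySize may be missing or -1 (unknown); count those as 0 bytes.
    total = sum(max(e["response"].get("bodySize") or 0, 0) for e in entries)
    problems = []
    if len(entries) > max_requests:
        problems.append(
            f"{len(entries)} requests exceeds budget of {max_requests}")
    if total > max_bytes:
        problems.append(f"{total} bytes exceeds budget of {max_bytes}")
    return problems


# Made-up sample: seven requests totalling 80,000 bytes.
sizes = [20000, 15000, 15000, 10000, 10000, 5000, 5000]
sample = {"log": {"entries": [{"response": {"bodySize": s}} for s in sizes]}}

print(check_budget(sample, max_requests=10, max_bytes=100_000))  # within budget
print(check_budget(sample, max_requests=5, max_bytes=50_000))    # two violations

# In a CI script the result would typically drive the exit code, e.g.:
# import sys; sys.exit(1 if check_budget(har, 10, 100_000) else 0)
```

A non-zero exit code is usually all a CI system needs to mark the build as failed; the same pattern applies if the numbers come from a scoring tool's JSON output instead.
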
I can run this and raise an alarm if all of a sudden-- PETER LUBBERS: Yeah, exactly. You can set alarms for when you
drop a certain score, or like, for example, images that
are not compressed or it thinks that it picks up the
typical YSlow rules. And then that can trigger
a whole chain of alarms and ways to fix it. That's awesome. ILYA GRIGORIK: Exactly. PETER LUBBERS: So is that also
supported in PageSpeed? ILYA GRIGORIK: It is. So it will require a little
bit more work. But if you go to the PageSpeed
site-- so I'm just looking at the PageSpeed SDK. You can actually download
the entire SDK. It's an open source project. And if you build the SDK,
you'll actually get this binary called-- guess what? HAR to PageSpeed. And you feed it a HAR file, and
you get a similar output, but perhaps slightly different
rules and slightly different weightings than YSlow. If you're a PageSpeed fan,
you can also use this. And by the way, PageSpeed SDK
also comes with some really awesome tools like the PageSpeed
Optimize Image. Nicely named, very
descriptive. So you just pass it an image
file, and it'll automatically pick out the right format and
optimize the stuff for you. That's something you
can also do as part of your build process. PETER LUBBERS: So I think when
I first started looking at the HAR format and everything, I
really thought of it primarily from a browser perspective. And actually, after looking at
this more, obviously it's the browser making that HTTP request
to the server and the server responding. So when you start thinking about
it that way, it's like, OK, we can actually do
a lot more with it. We can actually go beyond the
browser, if you will, and start implementing HAR
on the server side. ILYA GRIGORIK: That's
a cool idea. PETER LUBBERS: I don't think
there's a whole lot of tools around that yet. But it's the same format. ILYA GRIGORIK: So just to
elaborate this a little bit, a lot of API servers or app
servers that you build kind of follow the same pattern. A request comes in then you have
a dispatch to a database or maybe another HTTP server
that does something. And it's the same waterfall,
basically. PETER LUBBERS: But
on the inside. ILYA GRIGORIK: So it would be
kind of cool if we could visualize the same data. And we saw a whole number of
different rules-- or tools, rather, that we can use for
this kind of stuff. PETER LUBBERS: So let's take a
look at a sample file that you put together already for this,
just to get the idea of what's happening here. ILYA GRIGORIK: So I've created
this mock file, and how that's created is a separate story. You can instrument your
own app server. Or you can actually use some
other tricks which we'll talk about in a second. PETER LUBBERS: Some
sort of logging. ILYA GRIGORIK: Exactly. So actually, I'm going to
sneak in another demo. So if you guys are working with
HAR files, and you have them on disk, there is this-- and you have Ruby installed,
which you probably do, you can actually do "gem install har,"
which creates this very, very handy utility. What you do is then you say
"har" and it installs this kind of dense script, and
you pass it a file. And check this out. So I'm going to run this. And it starts a local server,
which embeds the HAR viewer and fires up a new tab on your
thing and just visualizes it right here. How awesome was that? PETER LUBBERS: Wow. ILYA GRIGORIK: So we don't
even have to copy it, go to the site. You just kind of like feed it
a file and there you go. So next time somebody emails you
or you have a bug report with a HAR file, download it,
run it with this, you're up and running. So let's look at this. What's happening here? This is a simple request where
we're kind of simulating in let's say an RSS feed
application, where a request comes in for this feed ABC. And the next thing that happens
is we dispatch a search request for ABC, which
is basically like, what are the articles that
I should show? What are the article IDs? Let's say that returns a couple
different IDs, three. So that request takes some time,
136 milliseconds, and we have all of the header
information on all the other metadata from our API server. And then we, in parallel,
dispatch three requests to fetch the articles, and then
the response is returned. PETER LUBBERS: Yeah, you can
really trace right into that and see where you're taking
most of the time. ILYA GRIGORIK: So now you can
think about instrumenting your app server and reusing
the same tools. This is great visualization. PETER LUBBERS: Yeah, by sticking
to the format, you really get a lot of benefits. ILYA GRIGORIK: Yeah,
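
[CODE SKETCH] The server-side example in this segment (a feed request, a search call taking 136 milliseconds, then three article fetches in parallel) can be written out as a HAR by hand. This is a minimal sketch of one way to do it, not how the mock file in the show was produced: the `spans_to_har` function, the span tuples, the internal URLs, and every timing except the 136 ms search call are assumptions made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def spans_to_har(spans, title, started=None):
    """Turn (url, start_offset_ms, duration_ms) spans into a minimal HAR dict."""
    started = started or datetime(2012, 1, 1, tzinfo=timezone.utc)
    entries = []
    for url, offset_ms, duration_ms in spans:
        begin = started + timedelta(milliseconds=offset_ms)
        entries.append({
            "pageref": "request_1",
            "startedDateTime": begin.isoformat(timespec="milliseconds"),
            "time": duration_ms,
            "request": {"method": "GET", "url": url, "httpVersion": "HTTP/1.1",
                        "headers": [], "queryString": [], "cookies": [],
                        "headersSize": -1, "bodySize": -1},
            "response": {"status": 200, "statusText": "OK",
                         "httpVersion": "HTTP/1.1", "headers": [], "cookies": [],
                         "content": {"size": 0, "mimeType": "application/json"},
                         "redirectURL": "", "headersSize": -1, "bodySize": -1},
            "cache": {},
            # All time is attributed to "wait"; a real server could split it.
            "timings": {"send": 0, "wait": duration_ms, "receive": 0},
        })
    page = {"id": "request_1", "title": title,
            "startedDateTime": started.isoformat(timespec="milliseconds"),
            "pageTimings": {"onContentLoad": -1, "onLoad": -1}}
    return {"log": {"version": "1.2",
                    "creator": {"name": "server-side-sketch", "version": "0.1"},
                    "pages": [page], "entries": entries}}


# Hypothetical spans: the incoming request, one search, three parallel fetches.
spans = [
    ("http://api.example.com/feed/abc", 0, 300),
    ("http://search.internal.example/search?q=abc", 5, 136),
    ("http://articles.internal.example/article/1", 145, 90),
    ("http://articles.internal.example/article/2", 145, 120),
    ("http://articles.internal.example/article/3", 145, 75),
]
server_har = spans_to_har(spans, title="GET /feed/abc")
print(json.dumps(server_har, indent=2)[:200])
```

Writing the result to a `.har` file lets any HAR viewer draw the backend calls as a waterfall, with the three article fetches starting at the same offset.
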
there's so many different tools out there. And speaking of tools-- so now
let's take this use case. So we've captured a HAR
file, let's say, on the command line. We can run it through YSlow,
which is very good for capturing, let's say,
regressions or anomalies if somebody checks in. So it'll ingest this data. But now it gets smarter, which
is to say, it looks at the URL which you were accessing, and it
says, hey, you've actually uploaded not one, but three
traces of this file, spread across time. So what I'm going to do is I'm
going to visualize the difference in page loading
time or total size. So we actually have--
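
[CODE SKETCH] Comparing several traces of the same URL over time comes down to pulling the same few numbers out of each HAR and ordering them by capture time. This is a minimal sketch of that comparison, not the implementation of the tool being demonstrated: the function names are hypothetical, the three sample runs and their values are made up, and it assumes every file records `pageTimings.onLoad` for its first page and uses timestamps in one consistent format so they sort as strings.

```python
def make_run(started, onload_ms, sizes):
    """Build a tiny made-up HAR for one run (illustration only)."""
    return {"log": {
        "pages": [{"id": "page_1", "startedDateTime": started,
                   "pageTimings": {"onLoad": onload_ms}}],
        "entries": [{"response": {"bodySize": s}} for s in sizes],
    }}


def run_metrics(har):
    """Extract capture time, onLoad time, request count and bytes from a HAR."""
    page = har["log"]["pages"][0]
    entries = har["log"]["entries"]
    total = sum(max(e["response"].get("bodySize") or 0, 0) for e in entries)
    return {"started": page["startedDateTime"],
            "onload_ms": page["pageTimings"]["onLoad"],
            "requests": len(entries),
            "total_bytes": total}


def trend(hars):
    """Order runs by capture time and add the onLoad change vs. the first run."""
    rows = sorted((run_metrics(h) for h in hars), key=lambda r: r["started"])
    if not rows:
        return []
    baseline = rows[0]["onload_ms"]
    for row in rows:
        row["onload_delta_ms"] = row["onload_ms"] - baseline
    return rows


# Three made-up runs, deliberately passed in out of order.
runs = [
    make_run("2012-06-03T10:00:00.000Z", 1000, [50000, 30000, 40000]),
    make_run("2012-06-01T10:00:00.000Z", 750, [50000, 30000]),
    make_run("2012-06-02T10:00:00.000Z", 900, [50000, 30000, 10000]),
]
rows = trend(runs)
for row in rows:
    print(row["started"], row["onload_ms"], row["total_bytes"],
          row["onload_delta_ms"])
```

Putting load time next to total bytes in each row is what lets you ask whether a slower load coincides with a heavier page.
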
as you can see here, we have three runs. I was playing with this
on the weekend. And between these three
different runs-- so the yellow line here is the
full load time. So the load time went up from
0.75 seconds to about 1 second between the three runs. This graph is also very
interactive, which I didn't realize for a while. For example, I can disable this
and load things like the onLoad time and time
to first byte. Imagine having this data for
your site across two weeks, and now you can go in and look
at, well, am I just putting more images? Or why did my load
time get worse? And then you can also-- if you need a nice artifact for
your next presentation, you can also just save that
as a PNG image, or an SVG. So that's cool. That allows you to trend
it over time. And within here, for each
specific run, you, of course, still have access to all of the
metadata within each one. Domains from which we fetched
all the resources, it even embeds PageSpeed scoring. So it says, well,
for this page-- let's see, cache validator. So some of the resources didn't
have a cache validator. So it gave me an 83. OK, interesting. The HAR Viewer, as you would
expect, embedded in here, and you can toggle between all
the different runs. So here's a little tip. We've talked about those any
number of times now, but you've captured on CLI. You run it through YSlow,
and then you can push it into here. And all of a sudden, you have a performance monitoring solution-- PETER LUBBERS: Exactly. ILYA GRIGORIK: --in like three
lines of batch script. And if you guys are interested,
I actually found this example on the
HAR Storage wiki. I uploaded the file manually,
but you can just-- here's four lines of
Python where you just encode the data. You push it in. And that's it. You're done. PETER LUBBERS: We've looked
at it from the browser perspective now. We've looked at it from the
server side in the automation. And one thing we talked about
briefly is the mobile support. Obviously, you don't always have
Chrome Developer Tools on your mobile device. So how would you use some of
these tools to set that up? ILYA GRIGORIK: So I
actually made-- I cheated. I made us a little presentation there, two slides. You bring up a really good
point, which is if you're using Chrome on Android, you
actually have remote debugging, which is absolutely
awesome. Because you have access to the
same network panel plus all of the other-- even JavaScript debugging. But what happens when you're
running an older or another browser which doesn't have
that capability? PETER LUBBERS: And just for
those of you attending, we did do a show on-- the Chrome Mobile
show just last week on Chrome Mobile debugging. ILYA GRIGORIK: It's
an awesome tool. If you guys haven't used it
before, we definitely recommend it. But here we're talking about a
slightly different use case, which is let's say I have an
older phone or a phone with a browser that doesn't support
this kind of thing. Could I still get access
to this data? It's actually a little bit
tricky if you think about it. How do you do that? So there's a trick. I made a diagram just
to explain it. PETER LUBBERS: It involves
a proxy server. ILYA GRIGORIK: Yes, it involves
a proxy server. So here's the trick. You have your phone. And I will assume that
your phone can connect to a Wi-Fi hotspot. That's a requirement,
unfortunately. If you can't do that, then
you can't use this trick. But let's assume it does. So what we do is we take our
laptop, and we actually start a Wi-Fi hotspot on it,
so it becomes-- starts to broadcast. We then connect our phone
to the laptop. And now if I'm browsing on my
phone, I'm actually going through my laptop. So far, so good? So what's going to happen is
when I make a request in my mobile browser, it'll
go to my laptop. My laptop will go to the server
and just kind of funnel data back and forth. Now, given that the data is
flowing through the laptop, we can actually capture the data
with a low-level tool like tcpdump or Wireshark. PETER LUBBERS: Exactly. I was going to say with
Wireshark, you can capture everything. ILYA GRIGORIK: We can just say
capture on this interface, or this specific port, or this IP,
all that kind of stuff. So you run that capture, and
what you get out of it is a PCAP file, which is just
a very low-level-- here are the IP packets and TCP
packets that are flowing over the wire. So this is nothing that you
would consume without an additional tool. PETER LUBBERS: It's not
in a HAR format. ILYA GRIGORIK: No. Unfortunately not. I actually have a file that
I'll show you guys. I have a sample Wikipedia file
where I captured a PCAP file. If I just open this in
Vim, it's gibberish. Because it's a binary format,
not anything too interesting. Now it turns out that there is
actually a tool called pcap2har, which will take those
IP packets and basically reconstruct-- or TCP packets, and reconstruct
the entire flow and create a HAR file. Actually, let me see if I
can find this tool here. You can use pcap2har
to manually do this. So you capture the PCAP file. And then you get a HAR file out
of it, and then you can use the HAR Ruby gem
to visualize it. Or there's this web app which
allows you to just upload a PCAP file. And it'll do the encoding,
and it will just show you the thing. Let me show you this. So I have my Wikipedia
HAR file here. PETER LUBBERS: No,
it's a PCAP file. ILYA GRIGORIK: Sorry,
PCAP file. And I'm just going
to hit Upload. And so what we're doing is we're
uploading the raw binary data. It's going to run the
transform to HAR. And look at that. Now what we're looking at is the
waterfall chart as captured from a mobile phone that perhaps
you couldn't configure a proxy server on, or that is using
some browser that doesn't support remote debugging. PETER LUBBERS: Can you
grab the HAR-- the raw data on the
other tab there? The HAR JSON, yeah, OK. ILYA GRIGORIK: You can also
explore it, or you can click on the Download HAR File,
and there you go. PETER LUBBERS: Full circle. ILYA GRIGORIK: So this tool-- and we're only scratching
the surface, here. We just spent what? 40 minutes talking about
all the different ways we can use this. But I think we've only scratched
the surface. Because we can use
it on the server. We can use it to automate
performance monitoring. I think instrumentation
is key. So having tools like-- combining
tools like HAR Storage, HAR Viewer and others,
you can literally build a performance monitoring
dashboard for your site or for your company in a
couple of hours. PETER LUBBERS: Yeah, because
there's so many tools around it. So let's see if we have
any other questions. Oh, first of all, the links. ILYA GRIGORIK: We covered
a lot of stuff in here. So I actually created a quick
gist of a whole bunch of different links for some of
the tools that we covered. I tried to capture
them all here. So if you guys go to
bit.ly/har-show, you'll find all the links in there. And we'll actually also push
out a blog post later today with a little bit more
information on it. PETER LUBBERS: Put it on the
Chrome Developers Google+ page and link to it. ILYA GRIGORIK: But I would
definitely encourage you guys to just explore it, play
with the HAR viewer. Definitely try Phantom and see
what you can do with it. Actually, one quick
note on Phantom. When you download the file, it
comes with example files. I showed you the NetSniff. There's actually a small
bug in the NetSniff file, which I fixed. And you guys should go to the
GitHub page for Phantom and just copy the latest
NetSniff file. So it's there, it's fixed. It's just not on the latest
release, so just FYI. PETER LUBBERS: We'll make
a note of it on here. Let's take a look if there's
any questions on the-- OK, there's a couple. ILYA GRIGORIK: So we got a
question from Steve Souders. Wow, I think I know that guy. "Does a HAR file contain
the response bodies?" So by default, when you export
out of, let's say, Chrome Developer Tools, it does not
contain the response body. But there is no reason
why it can't. So if you're writing your own
tool, so for example, if you're scripting Phantom or
something like that, no reason why you can't include it. And I think it would be really
cool if, for example, HTTP Archive actually stored
the bodies-- sorry, httparchive.org,
the site. And then you could do
more interesting analysis over time. PETER LUBBERS: Sure. They would require a lot
more storage space. ILYA GRIGORIK: Yes,
absolutely. PETER LUBBERS: I think that's
why it's not there by default. ILYA GRIGORIK: Yes. Well, exactly. PETER LUBBERS: "What about
compressing HAR files?" Interesting. ILYA GRIGORIK: Well, yeah, as we
saw, we exported one trace, 5,000 lines of JSON. If you're storing these things
over time, you probably want to compress them, and they will
compress incredibly well. Coming out of Chrome Dev Tools
or other tools, you're just going to get the raw
file, the raw JSON. It's up to you if you want to
store it, to archive it-- or, sorry, to compress it. PETER LUBBERS: So "can the HAR
format be extended to support information about requests
that don't make it to the server, but instead hit the
cache?" Interesting. ILYA GRIGORIK: Actually, if
you capture the-- if you export out of Chrome
Dev Tools, it will contain that data. If you load your developer
tools, it does show requests that are coming from
the cache. It actually indicates that. And it will be there in
the exported HAR data. And the reason they-- usually
the quick way to spot it is when you look at those chunks
of JSON, you will see that some requests don't have
any HTTP headers. That's a giveaway that this
came out of the cache. PETER LUBBERS: A couple more. ILYA GRIGORIK: So "BrowserMob
proxy will capture and produce a HAR file. And Charles can also be used as
a proxy to generate HAR." Right, so I think this is more
of a note from Andy, which is a really good point, which is we
talked about exporting this out of a browser. But you can use a tool like
Charles Proxy, or Fiddler, or something else where it's
a standalone app. So I could actually run a proxy
server on my laptop, and then connect from this laptop
here to just proxy everything through it, and capture this
data, capture the HAR files on this device. So that's a really good point. PETER LUBBERS: OK, one more. "When you were looking at the
HAR viewer, a JPEG appeared as the first request rather
than the HTML page. Why does this happen?" ILYA GRIGORIK: That's
a good question. That could be a bug. But every request that is made
has a timestamp within the actual export. So it'll say when the request
started, which is how we determine in Dev Tools
where in the timeline it should live. So I'm not sure if the HAR spec
specifically says that it must be sorted in order. Maybe it does, maybe
it doesn't. But you can easily re-sort that
data based on the timestamp and get the exact timeline. Cool. So I think that that
covers it. PETER LUBBERS: That was great. ILYA GRIGORIK: This is
definitely a power tool. Lots and lots of stuff that
you can do with it. And I think, once again,
instrumentation is key for anything to do with
performance. PETER LUBBERS: And all these
tools, there's so much support for it that this is really the
way to go for all of this-- ILYA GRIGORIK: Yeah, so I
definitely encourage you guys to play with it. PETER LUBBERS: So we'll
post the links. We'll post it on the Chrome
Developer Google+ page. You'll be posting your
blog pretty soon? ILYA GRIGORIK: I'll have a blog
post up soon as well, just documenting some of
the examples here. And then maybe just
one quick note. I think we're done with this. But in our next episode, we're
actually going to take a look at Google web fonts, which
is going to be a really fun topic. I love web fonts. A lot of people have issues with
web fonts when it comes to performance. So we're going to do a deep dive
on what it takes to make web fonts fast and what Google
Web Fonts specifically does to make web fonts fast. PETER LUBBERS: And that's
in two weeks? Same time, right? ILYA GRIGORIK: Yep, two weeks. So yeah, Tuesday. PETER LUBBERS: We'll
announce it. Excellent. ILYA GRIGORIK: Awesome. Thank you guys. PETER LUBBERS: Thanks a lot. [MUSIC PLAYING]
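
The re-sorting by timestamp that Ilya describes in the final answer, together with the compression suggested earlier in the Q&A, can be sketched in a few lines of Python. This is a minimal sketch and not something shown in the episode: the sample trace, its URLs, and the helper names parse_started, sort_entries, and compress_har are all hypothetical. It assumes only the HAR convention that each item in log.entries carries an ISO 8601 startedDateTime with a time zone.

```python
import gzip
import json
from datetime import datetime


def parse_started(entry):
    """Parse an entry's ISO 8601 startedDateTime into a timezone-aware datetime."""
    stamp = entry["startedDateTime"]
    # Older Python versions do not accept a trailing "Z", so normalise it.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


def sort_entries(har):
    """Return a copy of the HAR dict with log.entries ordered by start time."""
    entries = sorted(har["log"]["entries"], key=parse_started)
    return {**har, "log": {**har["log"], "entries": entries}}


def compress_har(har):
    """Serialise the HAR dict to JSON and gzip it for storage."""
    return gzip.compress(json.dumps(har).encode("utf-8"))


# Hypothetical two-entry trace in which the image is listed before the page.
sample = {
    "log": {
        "version": "1.2",
        "entries": [
            {"startedDateTime": "2012-10-02T10:00:00.350Z",
             "request": {"url": "http://example.com/photo.jpg"}},
            {"startedDateTime": "2012-10-02T10:00:00.000Z",
             "request": {"url": "http://example.com/"}},
        ],
    }
}

ordered = sort_entries(sample)
urls = [e["request"]["url"] for e in ordered["log"]["entries"]]
packed = compress_har(ordered)
```

Sorting a copy rather than in place leaves the original export untouched, which helps if the same trace is also being pushed to another tool. The gzip bytes can be written straight to disk and read back with gzip.decompress followed by json.loads.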