Solving SEO with Headless Chrome (Polymer Summit 2017)

Captions

SAM LI: Hi, everyone. I'm Sam Li, and I'm an engineer on the Polymer team. If you managed to pick up on my accent in the last five words, I am indeed Australian, and so honored to be followed up by Trey, a fellow Aussie, as well. Prior to joining this team, I'd worked on the beloved Chrome DevTools. One of my smallest, but maybe my greatest, contributions was adding the ability to rearrange tabs in DevTools. [APPLAUSE] It's probably the greatest five lines I've ever written. I did work on other features. So if you find me afterwards, feel free to ask me about them. I might share a DevTools trick or two.

More recently I've had the humbling experience of building webcomponents.org and witnessing all the incredible components that all of you have built and published. For example, the one and only Pokemon selector. And if you are the person who says, but there's only 151 Pokemon in the original set, well, there's even an option that lets you set that, too. So all kudos to Sami for this. It was, however, the process of building webcomponents.org that brings us to what we're here to talk about today.

So first, I'm going to cover my story of how I came to encounter this SEO problem while building webcomponents.org. We'll then look at how I used headless Chrome to solve this before diving into all the details of how that actually works and how you can use it. So I'm going to take a step back for a moment and talk about what I learned in the process of building webcomponents.org. The first thing I learned was how the platform supports encapsulation through the use of web components. With this encapsulation comes inherent code reuse, which leads to a specific architecture. I also learned a lot about progressive web apps and how they can provide us with fast, engaging experiences. I learned how the platform provides APIs, such as service workers, to help enable those experiences. I also learned how to compose web components to build a progressive web app.
We heard from Kevin yesterday about the PRPL pattern (Push, Render, Pre-cache, Lazy-load) as a method of optimizing delivery of this application to the user. And one of the architectures which enables us to utilize the PRPL pattern is the App Shell model. It provides us with instant, reliable performance by using an aggressively cached App Shell. You can see that for all the requests which hit our server, we serve the entry point file, regardless of the route. The client then requests the App Shell, which is similar: because it's the same URL across the application, we can combine that with a service worker to achieve near-instant loading on repeated visits. The shell is then responsible for looking at the actual route that was requested, and then requests the necessary resources to render that route.

So at this point I'd learned how to build a progressive web app using client-side technologies like Web Components and Polymer, and how to use patterns such as the PRPL pattern to deliver this application quickly to the user. Then there's the elephant in the room, SEO. Some of these bots are basically just running curl with that URL and stopping right there. No rendering, no JavaScript. So what are we left with? With this PWA that we built using the App Shell model, we're left with just our entry point file, which has no information in it at all. And in fact, it's the same generic entry point file that you serve across your entire application. So this is particularly problematic for Web Components, which require JavaScript to be executed for them to be useful.

This issue applies to all search engine indexers that don't render JavaScript. But it also applies to the plethora of link rendering bots out there. There's the social bots like Facebook and Twitter, but don't forget the enormous number of link rendering bots such as Slack, Hangouts, Gmail, you name it. So what is it about the App Shell model that I'd really like to keep?
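The server side of the App Shell model described above can be sketched in a few lines. This is only an illustration, not code from the talk: the file name `/index.html`, the `/static/` prefix, and the function name `fileForPath` are all assumptions made for the example.

```typescript
// Hypothetical names: the talk does not say what the entry point file or
// the static asset directory are called.
const ENTRY_POINT_FILE = "/index.html";
const STATIC_ASSET_PREFIXES: string[] = ["/static/"];

/**
 * Decide which file the server sends back for a requested path.
 * Static assets are served as themselves; every other path (that is,
 * every application route) gets the same generic entry point file,
 * because the server has no understanding of routes.
 */
function fileForPath(path: string): string {
  const isStaticAsset = STATIC_ASSET_PREFIXES.some(
    (prefix) => path.indexOf(prefix) === 0
  );
  return isStaticAsset ? path : ENTRY_POINT_FILE;
}
```

Under these assumptions, `fileForPath("/element/some-element")` and `fileForPath("/search")` both return the same entry point file, which is exactly what a bot that never runs JavaScript ends up seeing.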
Well, for me, this approach pushes our application complexity out to the client. You can see that the server has no understanding of routes. It just serves the entry point file, and it has no real understanding of what the user is actually trying to achieve. This allows our server to be significantly decoupled from the front-end application, since it now only needs to expose a simple API to read and manipulate data. The application that we pushed out to the client is then responsible for surfacing this data to the user and mediating user interactions to manipulate this data. So I asked, can we keep the simple architecture that we know and love, and also solve this SEO use case with zero performance cost?

So then we thought, what if we just use headless Chrome to render on our behalf? So here's a breakdown of how that would work. We have our regular users who are making a request, and they would like a cat picture. Because who wouldn't? And as part of this approach we ask, are you a robot? And to answer this, we look at the user-agent string and check if it's a known bot that doesn't render. In this case, the user can render, so we serve the page as we normally would. The server responds with a fetch cat picture function, and then the client can go and execute that function to get the rendered result. By the way, this is one of my kittens, which I fostered recently. She's super adorable.

Now when we encounter a bot, we can look at the user-agent string and determine that they don't render. And instead of serving that fetch cat picture function, we fire off a request to headless Chrome to render this page on our behalf. And then we send the serialized rendered response back to the bot so they can see the full contents of the page. So I built a proof of concept of this approach for webcomponents.org, and it worked. I wrote a Medium post about it, and people were really interested in this approach and wanted to see more of it.
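The "are you a robot?" step above can be sketched as a simple check on the user-agent string. This is a minimal sketch and not the check used on webcomponents.org: the list of substrings is a short, illustrative assumption, and the names `isNonRenderingBot` and `planResponse` are made up for the example.

```typescript
// Illustrative and deliberately incomplete: lowercase substrings of
// user-agent strings for bots that are assumed not to run JavaScript.
const NON_RENDERING_BOT_PATTERNS: string[] = [
  "facebookexternalhit",
  "twitterbot",
  "slackbot",
];

/** Returns true when the user agent matches a known non-rendering bot. */
function isNonRenderingBot(userAgent: string | undefined): boolean {
  if (!userAgent) {
    return false; // No user agent: treat the client as a regular user.
  }
  const normalized = userAgent.toLowerCase();
  return NON_RENDERING_BOT_PATTERNS.some(
    (pattern) => normalized.indexOf(pattern) !== -1
  );
}

type ResponsePlan = "serve-entry-point" | "render-with-headless-chrome";

/** Regular users get the normal page; bots get a pre-rendered one. */
function planResponse(userAgent: string | undefined): ResponsePlan {
  return isNonRenderingBot(userAgent)
    ? "render-with-headless-chrome"
    : "serve-entry-point";
}
```

A regular browser falls through to `"serve-entry-point"`, so this split adds only a string comparison to the normal request path.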
So based on this response, I eventually decided that instead of keeping my hacky solution, I would build it properly. But then came the most challenging part of any project. And I know you've all experienced it as well. Naming. So I asked in our team chat for some suggestions, and I got a ton. [LAUGHTER] So these are some of our top ones. There's some great ones in there. Power Renders, Use The Platform As A Renderer. However, today I am very pleased to introduce Rendertron. Let me render that for you. [APPLAUSE]

Rendertron is a Dockerized headless Chrome rendering solution. So that's a mouthful, so let's break it down. First off, what is Docker and why did I use it? Well, no one knows what it means, but it's provocative. In all seriousness, Docker containers allow you to create lightweight images as standalone executable packages which isolate software from its surrounding environment. In Rendertron, we have headless Chrome packaged up in this container so that you can easily clone and deploy these to wherever you like.

So what about headless Chrome? It was introduced in Chrome 59 for Linux and Mac, Chrome 60 for Windows, and it allows Chrome to run in environments which don't have a UI, such as a server. This means that you can now use Chrome as any part of your toolchain. You can use it for automated testing, you can use it for measuring the performance of your application, generating PDFs, amongst many other things. Headless Chrome itself exposes a really basic JSON API for managing tabs, with most of the power coming from the DevTools protocol. All of DevTools is built on top of this protocol, so it's a pretty powerful API. And one of the key reasons that headless Chrome is great is that now we're bringing the latest and greatest from Chrome to ensure that all the latest web platform features are supported. With Rendertron, this means that your SEO can now be a first-class environment which is no different from the rest of your users.
So just a quick shout-out. If this all sounds really interesting to you, and you'd like to include headless Chrome in some other way in your toolchain, there's a brand-new Node library that was published just last week that exposes a high-level API to control Chrome while also bundling all of Chrome inside that Node package. So you can check it out on GitHub at GoogleChrome/puppeteer. So I've looked at the high level of how headless Chrome can fit into your application to fulfill your SEO needs. Now it's time to dive into how it works. But I've been talking a lot. So who wants to see Rendertron in action? [CHEERS]

All right, so this is the Hacker News PWA created by some of my awesome colleagues, and it's built using Polymer and Web Components. It loads really fast, and all around performs pretty well. We can see the separate network request which loads the main content that we see. And we can guess that it's affected by this SEO problem, since it uses Web Components, which require JavaScript, and it pulls in data asynchronously. So one quick way to verify this is by disabling JavaScript and refreshing the page. And once we do that, we can see that we still get the app header, since that was in the initial request, but we lose the main content of the page, which isn't good.

So we jump over to Rendertron, a headless Chrome service that is meant to render and serialize this for you. So I wrote this UI as a quick way to put in the URL and test the output from Rendertron. So first off, what are we hoping to see? Because these bots only perform one request, we want to see that whole page come back in that one network request. We also want to see that it doesn't need any JavaScript to do this. So take a look. I'm going to put in the Hacker News URL and tell Rendertron to render and serialize this, and that I'm also using Web Components. And it renders correctly. I'm going to disable JavaScript and verify that it still works.
So you can see it's still there, and it all comes back in that single network request. Rendertron automatically detects when your PWA has completed loading. It looks at the page load event and ensures that it has fired. But we know that's a really poor indication of when the page has actually completed loading. So Rendertron also ensures that any async work has been completed, and it also looks at your network requests to make sure they're finished as well. In total, you have a 10-second rendering budget. This doesn't mean that it waits 10 seconds, though. It'll finish as soon as your rendering is complete. If this is insufficient for you, you can also fire a custom event which signals to Rendertron that your PWA has completed loading.

Serializing Web Components is tricky because of Shadow DOM, which abstracts away part of the DOM tree. So to keep things simple, Rendertron uses Shady DOM, which polyfills Shadow DOM. This allows Rendertron to effectively serialize the DOM tree so that it can be preserved in the output. So let's take a look at the news PWA, which we've all seen, and it's also built by some of our other colleagues. And we'll plug that into Rendertron. We'll then ask Rendertron to render this as well, and tell it that I'm also using Web Components. And there we have it.

So what do you need to do to enable this behavior? With Polymer 1 this is super easy, and Rendertron doesn't actually need to do anything. Simply append dom=shady to the URLs that you pass to Rendertron, and Polymer 1 will ensure that Shady DOM is used. With Polymer 2, and with Web Components v1, it's recommended that you use webcomponents-loader.js, which pulls in all the right polyfills on different browsers. You then set a flag to Rendertron telling it that you're using Web Components, and it will ensure that the polyfills it needs for serialization get enabled. So another feature of Rendertron is that it lets you set HTTP status codes.
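For the Polymer 1 case above, appending dom=shady to a URL needs a little care when the URL already has a query string or a fragment. Here is a small sketch of one way to do it; the helper name `withShadyDom` is made up for this example and is not part of Rendertron or Polymer.

```typescript
/**
 * Append the dom=shady query parameter to a page URL before handing it
 * to the renderer. An existing query string is extended with "&", and a
 * "#fragment" is kept at the very end, where URL syntax requires it.
 */
function withShadyDom(pageUrl: string): string {
  const hashIndex = pageUrl.indexOf("#");
  const base = hashIndex === -1 ? pageUrl : pageUrl.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : pageUrl.slice(hashIndex);
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return base + separator + "dom=shady" + fragment;
}
```

For example, `withShadyDom("https://example.com/list?page=2")` gives `https://example.com/list?page=2&dom=shady`.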
These status codes are used by indexers as important signals. For example, if an indexer comes across a 404, it's not going to link to that page, because that would be a really poor search result. Our server, though, is still returning that entry point file with the status code of 200 OK. So it looks like every URL exists. Rendertron lets you configure that status code from within your PWA, which understands when a page is invalid. Simply add meta tags (dynamically is fine) to signal to Rendertron what the status code should be. Rendertron will then pick these up and return that status code to the bot.

So this approach isn't specific to Polymer or even Web Components. Let's plug in fonts.google.com and see what happens when we serialize it. So that looks pretty good. Who can guess what JavaScript library was used to build Google Fonts? Angular. Rendertron works with any and all client-side technologies that work in Chrome and whose DOM tree can be serialized. The Rendertron endpoint also features screenshot capabilities so that you can check that headless Chrome and the load-detecting function are performing as you expect.

Unfortunately, this service is not fast. For each URL that we render, we spin up headless Chrome to render that entire page. So performance is strictly tied to the performance of your PWA. Rendertron does, however, implement a perfect cache. This means that if we have rendered the same page within a certain cache freshness threshold, we'll serve the cached response instead of re-rendering it.

So how can you get your hands on this today, and how do you use it? Well first, you'll need to deploy the Rendertron service to an endpoint. You'll need to clone the GitHub repo at GoogleChrome/rendertron. And it's built primarily for Google Cloud, so it's easiest to deploy there. But if you remember, this is a Docker container. So you can deploy this to anywhere which supports a Docker image.
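Two of the ideas above, reading a status code out of a meta tag and serving cached renders while they are still fresh, can be sketched as follows. This is not Rendertron's implementation. The meta tag name `render:status_code` is an assumption (the talk only says "meta tags"), the regular expression only handles the `name` attribute appearing before `content`, and `FreshnessCache` is a made-up name with the clock passed in explicitly to keep the sketch easy to test.

```typescript
// Assumed tag shape: <meta name="render:status_code" content="404">
const STATUS_META_PATTERN =
  /<meta\s+name=["']render:status_code["']\s+content=["'](\d{3})["']/i;

/** Status code requested by the page via a meta tag, or 200 if none. */
function statusCodeFromHtml(serializedHtml: string): number {
  const match = STATUS_META_PATTERN.exec(serializedHtml);
  const digits = match ? match[1] : undefined;
  return digits ? parseInt(digits, 10) : 200;
}

interface CachedRender {
  html: string;
  savedAtMs: number;
}

/** Keeps rendered pages and returns them only while they are fresh. */
class FreshnessCache {
  private entries: { [url: string]: CachedRender | undefined } = {};
  private maxAgeMs: number;

  constructor(maxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;
  }

  /** Cached HTML for the URL, or undefined if missing or too old. */
  get(url: string, nowMs: number): string | undefined {
    const entry = this.entries[url];
    if (!entry || nowMs - entry.savedAtMs > this.maxAgeMs) {
      return undefined;
    }
    return entry.html;
  }

  /** Store a freshly rendered page together with the time it was made. */
  set(url: string, html: string, nowMs: number): void {
    this.entries[url] = { html: html, savedAtMs: nowMs };
  }
}
```

A render service built this way would call `get` first and only spin up headless Chrome when it returns `undefined`, which is where the performance benefit of the cache comes from.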
So to make things simple for you to test out, we have the demo service endpoint, which you can hit at render-tron.appspot.com. And that's the one with the UI that we saw earlier. It is not intended to be used as a production endpoint. You are welcome to use it, but we make no guarantees on uptime. Having this as a ready-to-use service is something that we might consider based on the interest received. So just in case you're wondering, my boss's Twitter handle is @mattsmcnulty, just in case you want to tell him how awesome I am.

So once we have that endpoint up, you're going to need to install some middleware in your application to do the user-agent splitting that I was talking about earlier. So this middleware needs to look at the user-agent, figure out whether or not they can render, and if not, proxy the request through the Rendertron endpoint. If you're using prpl-server, which is a Node server designed to serve production applications using PRPL, you simply need to specify the botProxy option and provide it with your Rendertron endpoint. If you're using Express, there's a middleware that you can include directly by saying app.use rendertron-middleware with a proxy endpoint and whether or not you're using Web Components.

If you're not using either of these, check the docs for a list of community-maintained middleware. There's a Firebase function there, as well as a list of existing middleware that Rendertron is compatible with. If it's not listed, it's also fairly simple to roll your own middleware by simply proxying based on the user-agent string. And that's it. That's all the changes you need to make to use Rendertron today, and all these bots can now be happy. Rendertron is available to use today, compatible with any client-side technologies, including both Polymer 1 and Polymer 2. Thank you.
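The "roll your own middleware" idea above comes down to one decision per request: pass it through, or proxy it to the render service. Here is a framework-free sketch of that decision. It is not the rendertron-middleware or prpl-server code. The endpoint `https://rendertron.example.com/render` is a placeholder, the URL scheme (the page URL, percent-encoded, appended to the endpoint) is an assumption, and the bot pattern is a short illustrative list.

```typescript
interface PageRequest {
  absoluteUrl: string; // Full URL of the page that was requested.
  userAgent: string | undefined;
}

type Route = { kind: "pass-through" } | { kind: "proxy"; target: string };

// Illustrative and incomplete list of bots assumed not to run JavaScript.
const LINK_BOT_PATTERN = /facebookexternalhit|twitterbot|slackbot/i;

/**
 * Build the URL to request from the render service.
 * Assumption: the service takes the encoded page URL as a path suffix.
 */
function renderTarget(endpoint: string, pageUrl: string): string {
  const endsWithSlash = endpoint.charAt(endpoint.length - 1) === "/";
  const base = endsWithSlash ? endpoint : endpoint + "/";
  return base + encodeURIComponent(pageUrl);
}

/** Bots are proxied to the render service; everyone else passes through. */
function routeForRequest(request: PageRequest, endpoint: string): Route {
  const userAgent = request.userAgent || "";
  if (LINK_BOT_PATTERN.test(userAgent)) {
    return {
      kind: "proxy",
      target: renderTarget(endpoint, request.absoluteUrl),
    };
  }
  return { kind: "pass-through" };
}
```

In a real server, the `"proxy"` branch would fetch `target` and relay the response body and status code to the bot, and the `"pass-through"` branch would hand the request on to the normal static file handling.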

Info

Channel: Google Chrome Developers

Views: 14,065

Keywords: Polymer Summit 2017, Polymer Summit, Google Polymer summit, polymer js, Polymer, Polymer Project, Polymer Summit Copenhagen, Web Components, Polymer Library, polymer tutorials, Polymer 3.0, web development, google chrome, #PolymerSummit, Use the platform, #usetheplatform, Chrome, developers, developer news, google event, google developer conference, web, html, Sam Li, headless chrome, SEO

Id: ydThUDlBDfc

Length: 18min 49sec (1129 seconds)

Published: Wed Aug 23 2017