Deliver search-friendly JavaScript-powered websites (Google I/O '18)

Captions
[MUSIC PLAYING]

TOM GREENAWAY: Good morning, everyone. My name is Tom Greenaway, and I'm a partner developer advocate from Google Sydney with a focus on the indexability of progressive web applications.

JOHN MUELLER: Hi, everyone. I'm John Mueller. I'm a webmaster trends analyst from Zurich in Switzerland. It's great to see so many of you here, even at this early hour.

TOM GREENAWAY: Now, as you can imagine, John and I have a lot of experience with the work web developers must do to ensure that websites are indexable, which is another way of saying whether a web page can be found and understood by search engines. But do search engines see all web pages exactly the same way? Are some pages more complex than others? And what about modern JavaScript powered websites?

Today, we'll be taking a closer look into what it takes for a modern JavaScript powered website to be properly indexed by search crawlers, and especially Google Search. And I'm excited to tell you that in this talk, we're announcing a bunch of cool new stuff, including a new change to Google Search policy, a new approach for rendering HTML to search crawlers, and even a new Google Search Console tool. It sounds like a lot of stuff, right? Well, that's because it is, so let's get started.

Now, a long time ago, before I joined Google, I was building e-commerce sites, and I personally felt there was a lot of mystery at times behind Google Search, especially on the topic of indexability. I would wonder, why do some of my pages appear in Google Search, and some don't? And what's the difference between them? Will JavaScript be rendered correctly? Will JavaScript rendered content appear properly and be indexed? And is lazy loading an image safe to do? These are really critical questions, and as developers ourselves, we understand the frustration behind this mystery.

So today, John and I are going to do something we very rarely do at Google. We're going to pull back the curtain a little bit and reveal some new pieces of information about how Google Search sees the web and indexes it. And with this knowledge and a few new tools, you'll have concrete steps you can take to ensure the JavaScript powered websites you're building are visible to Google Search.

Now, I want to remind you that this talk is about modern JavaScript powered websites, and typically, these websites will be powered by a JavaScript framework, such as Angular, Polymer, React, or Vue.js. And who doesn't love a great web development framework that's easy to use, helps you build your sites faster, and works great for your users? But it's important to recognize that some of these frameworks use a single page app configuration model, meaning they use a single HTML file that pulls in a bunch of JavaScript. And that can make a lot of stuff simpler, but if you don't watch out, JavaScript powered websites can be a problem for search engines.

So let's take a quick look at what the default template for a new Angular project looks like. As you can see, the default project template is pretty basic. It shows you how to use Angular to render a header, an image, and a few links. Nice and simple. How could this possibly be a problem from an indexability perspective? Well, let's take a peek behind the scenes at the HTML. This is it. Take a good look. When viewed in the browser, the default sample project had text, imagery, and links, but you wouldn't know that from looking at this initial HTML that's been delivered from the server, now, would you?
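For reference, a simplified sketch of what the initial HTML of a default Angular CLI project typically looks like (an illustration, not the exact slide from the talk):

    <!doctype html>
    <html lang="en">
    <head>
      <meta charset="utf-8">
      <title>My App</title>
      <base href="/">
    </head>
    <body>
      <!-- The body contains only the application root element; the build step
           appends script tags, and all visible content is rendered client side
           by JavaScript. -->
      <app-root></app-root>
    </body>
    </html>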
The initial HTML that's been sent down is actually completely devoid of any content. See here in the app root-- that's all there is in the body of the page, except for some script tags. So some search engines might assume that there's actually nothing here to index. And to be clear, Angular isn't the only web framework that serves an empty response on its initial server side render. Polymer, React, and Vue.js have similar issues by default.

So what does this mean for the indexability of our websites from the perspective of Google Search? Well, to answer that question better, we'll take a little step back and talk about the web in general, why search engines exist, and why search crawlers are necessary. Perhaps a good question to start with is, how big is the web? Well, we can tell you that we've actually found over 130 trillion documents on the web. So in other words, it's really big. And as you know, the aim of all search engines, including Google, is to provide a list of relevant search results based on a user's search query. And to make that mapping of user queries to search results fast and accurate, we need an index, similar to the catalog of a gigantic library. And given the size of the web, that's a really complex task.

And so to build this index to power our search engine, we need another tool-- a search crawler. And traditionally, a search crawler was basically just a computer and a piece of software that performed two key steps. One, it aims to find a piece of content to be crawled, and to do this, the content must be retrievable via URL. And once we have a URL, we get its content, and we sift through the HTML to index the page and find new links to crawl, as well. And thus, the cycle repeats.

So let's look at that first step, the crawling, and break it down. Oh, and yes, as an Australian, I felt it was imperative that I include some spiders in my talk. So this is the cutest possible one I could find. John, what do you think? No, you're not convinced? OK, well, I have a few more in the deck, so maybe you'll come around.

So to ensure the crawling is possible, there are some key things to keep in mind. Firstly, we need URLs to be reachable, as in, there shouldn't be any issue when the crawler wants to request the web pages and retrieve the resources necessary for indexing them from your web server. And secondly, if there are multiple documents that contain the same content, we need a way to identify the original source. Otherwise, it could be interpreted as duplicate content. And finally, we also want our web pages to have clean, unique URLs. Originally, this was pretty straightforward on the web, but then the first single page apps made things a bit more complicated. So let's go through each of these concepts.

First, for the reachability of URLs, there's a simple, standard way to help search engines find content that you're probably familiar with. You add a plain text file called robots.txt to the top level domain of your site, which specifies which URLs to crawl and which to ignore. And I say URLs, because these rules can prevent JavaScript from being crawled, too, which could affect your indexability. And this example also gives us a link to a sitemap. A sitemap helps crawlers by providing a recommended set of URLs to crawl initially for a site. And to be clear, there's no guarantee these URLs will get crawled. They're just one of the signals that search crawlers will consider.
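As an illustration of the kind of robots.txt and sitemap reference being described here (example.com and the paths are placeholders, not recommendations for any particular site):

    # robots.txt at https://example.com/robots.txt
    User-agent: *
    Allow: /
    Disallow: /internal/

    # Take care not to block the JavaScript and CSS your pages need in order to render.

    Sitemap: https://example.com/sitemap.xml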
OK, but now, let's talk about that duplicate content scenario, and how search crawlers deal with this situation. Sometimes, websites want multiple pages to have the same content, right? Even if it's a different website. For example, bloggers will publish articles on their website and cross-post to services like Medium to increase the reach of their content, and this is called content syndication. But it's important for search crawlers to understand which URL you prefer to have indexed. So the canonical metadata syntax shown here in the HTML allows the duplicate documents to communicate to crawlers where the original, authoritative source for the content lives. We call that source document the canonical document.

And traditionally, URLs for the web started out quite simple-- just a URL that was fetched from a server with some HTML. But then, of course, AJAX came along and just changed everything. Suddenly, websites could execute JavaScript, which could fetch new content from the server without reloading the browser page. But developers still wanted a way to support back and forth browser navigation and history, as well. So a trick was invented, which leveraged something called the fragment identifier, whose purpose is deep linking into the sub-content of a page, like a subsection of an encyclopedia article. And because fragment identifiers were supported by browsers for history and navigation, this meant developers could trick the browser into fetching new content dynamically, without reloading the browser page, and yet also support the history and the navigation we love about the web.

But we realized that using the fragment identifier for two purposes-- subsections on pages, and also deep linking into content-- wasn't very elegant. So we moved away from that. And instead, another approach was proposed-- to use the fragment identifier, followed by an exclamation mark, which we call the hashbang. And this way, we could discern the difference between a traditional URL using the fragment identifier for the sub-content on a page, versus a fragment identifier being used by JavaScript to deep link into a page. And this technique was recommended for a while. However, nowadays, there is a modern JavaScript API that makes these old techniques less necessary, and it's called the History API. And it's great, because it enables managing the history state of the URL without requiring complete reloads of the browser, all through JavaScript. So we get the best of both worlds-- dynamically fetched content with clean, traditional URLs. And I can tell you that from Google's perspective, we no longer index that single hash workaround, and we discourage the use of the hashbang trick, as well.

OK, well, that's crawling out of the way. Now, let's move on to the indexing step. So web crawlers ideally want to be able to find all the content on your website. If the crawlers can't see some content, then how are they going to index it? And the core content of the page includes all the text, imagery, video, and even hidden elements, like structured metadata. In other words, it's the HTML of the page. But don't forget about that content you dynamically fetched, either. This could be worth indexing, as well, such as Facebook or Disqus comments. Crawlers want to see this embedded content, too.
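To make the canonical markup described above concrete, a minimal sketch of what it might look like on a syndicated copy of an article (the URL is a placeholder):

    <!-- In the <head> of the cross-posted copy, pointing at the original, authoritative URL -->
    <link rel="canonical" href="https://example.com/blog/original-article">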
And also, this might seem really obvious, but I want to emphasize that at Google, we take HTTP codes pretty seriously, especially 404 Not Found codes. If crawlers find a page that has a 404 status code, then they probably won't even bother indexing it. And lastly, of course, a crawler wants to find all the links on a page, as well, because these links allow the crawlers to crawl further.

So now, let's just talk a bit about those links quickly, because honestly, they're some of the most important parts of the web. How do search crawlers like Google find links? Well, I can't speak for all search crawlers, but I can say that at Google, we only analyze one thing-- anchor tags with href attributes, and that's it. For example, this span here that I've just added-- it won't get crawled, because it's not an anchor. And this additional span I've added, even though it's an anchor, it doesn't have an href attribute. But if you are using JavaScript, such as with the History API that I mentioned earlier, to navigate the page purely on the client and fetch new content dynamically, you can do that, so long as you use anchor tags with href attributes, like in this last example. Because most search crawlers, including Google, will not simulate navigation of a page to find links. Only the anchor tags will be followed for linking.

But wait-- is that really everything? In order to have sifted through the HTML to index the page, we needed to have the HTML in the first place. And in the early days of the web, the server likely gave us all the HTML that was necessary. But nowadays, that's not really the case. So let's insert a step between crawling and indexing, because we need to recognize that the search crawlers themselves might need to take on this rendering task, as well. Otherwise, how will the search crawler understand the modern JavaScript powered websites we're building? Because these sites are rendering their HTML in the browser itself, using JavaScript and templating frameworks, just like that Angular sample I showed you earlier. So when I say rendering, I don't mean drawing pixels to the screen. I'm talking about the actual construction of the HTML itself. And ultimately, this can only ever happen on either the server or on the client, or a combination of the two could be used, and we call that hybrid rendering. Now, if it's all pre-rendered on the server, then a search engine could just index that HTML immediately. But if it's rendered on the client, then things get a little bit trickier, right? And so that's going to be the challenge that we'll be discussing today.

But one last thing-- you might be wondering, what is Google Search's crawler called? Well, we call it Googlebot, and we'll be referring to it a lot in this talk. And I think another detail to note is that I said that a search crawler is basically just a computer with some software running on it. Well, obviously, maybe in the '90s, that was the case. But nowadays, due to just the sheer size of the web, Googlebot is comprised of thousands of machines running all this distributed software that's constantly crunching data to understand all of this continuously expanding information on the web. And to be honest, I think we sometimes take for granted just how incredible Google Search really is. For example, I recently learned that with the Knowledge Graph, which is a database of all the information we have on the web, it actually maps out how more than 1 billion things in the real world are connected, and over 70 billion facts between them. It's kind of amazing. OK.
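A small, hedged sketch of the link handling described above (the URLs, the #content container, and the click handling are illustrative assumptions, not the talk's slides): real anchor tags with href attributes stay crawlable, while the History API handles client side navigation for users.

    <!-- Crawlable: a real anchor tag with an href attribute -->
    <a href="/products/shoes">Shoes</a>

    <!-- Not crawlable: not an anchor, or an anchor without an href -->
    <span class="nav-link">Hats</span>
    <a class="nav-link">Hats</a>

    <div id="content"></div>

    <script>
      // Client side navigation on top of normal links: intercept the click,
      // fetch the new content dynamically, and update the URL with the
      // History API instead of reloading the whole page.
      document.addEventListener('click', function (event) {
        var link = event.target.closest('a[href]');
        if (!link) return;
        event.preventDefault();
        fetch(link.href)
          .then(function (response) { return response.text(); })
          .then(function (html) {
            document.getElementById('content').innerHTML = html;
            history.pushState({}, '', link.href);
          });
      });
    </script>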
Well, now that we know the principles of a search crawler, let's see how these three different key steps-- crawling, rendering, and indexing-- all connect. Because one crucial thing to understand is the cycle of how Googlebot works, or how it should ideally work. As you can see, we want these three steps to hand over to one another instantly. And as soon as the content is fully rendered, we want to index it to keep the Google Search index as fresh as possible. This sounds simple, right? Well, it would be if all the content was rendered on the server and complete when we crawl it. But as you know, if a site uses client side rendering, then that's not going to be the case, just like that Angular sample I showed you earlier.

So what does Googlebot do in this situation? Well, Googlebot includes its own renderer, which is able to run when it encounters pages with JavaScript. But rendering pages at the scale of the web requires a lot of time and computational resources. And make no mistake-- this is a serious challenge for search crawlers, Googlebot included. And so we come to the important truth about Google Search we would like to share with you today, which is that currently, the rendering of JavaScript powered websites in Google Search is actually deferred until Googlebot has the resources available to process that content.

Now, you might be thinking, OK. Well, what does that really mean? Well, I'll show you. In reality, Googlebot's process looks a bit different. We crawl a page, we fetch the server side rendered content, and then we run some initial indexing on that document. But rendering the JavaScript powered web pages takes processing power and memory, and while Googlebot is very, very powerful, it doesn't have infinite resources. So if the page has JavaScript in it, the rendering is actually deferred until we have the resources ready to render the client side content, and then we index the content further. So Googlebot might index a page before rendering is complete, and the final render can actually arrive several days later. And when that final render does arrive, then we perform another wave of indexing on that client side rendered content. And this effectively means that if your site is using a heavy amount of client side JavaScript for rendering, you could be tripped up at times when your content is being indexed, due to the nature of this two-phase indexing process.

And so ultimately, what I'm really trying to say is, because Googlebot actually runs two waves of indexing across your content, it's possible some details might be missed. For example, if your site is a Progressive Web Application, and you've built it around the single page app model, then it's likely all your unique URLs share some base template of resources, which are then filled in with content by AJAX or fetch requests. And if that's the case, consider this-- did the initially server side rendered version of the page have the correct canonical URL included in it? Because if you're relying on that to be rendered by the client, then we'll actually completely miss it, because that second wave of indexing doesn't check for the canonical tag at all. Additionally, if the user requested a URL that doesn't exist, and you attempt to use JavaScript to send the user a 404 page, then we're actually going to miss that, too. Now, John will talk more about these issues later in the talk, but the important thing to take away right now is that these really aren't minor issues. These are real issues that could affect your indexability-- metadata, canonical tags, HTTP codes. As I mentioned at the beginning of this talk, these are all really key to how search crawlers understand the content on your web pages.
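The 404 point above is one place where a little server side logic goes a long way. A minimal sketch, assuming a Node.js server with Express (purely illustrative; findArticle and renderArticle are hypothetical helpers): unknown URLs get a real 404 status code from the server, rather than a client rendered "not found" page served with a 200.

    const express = require('express');
    const app = express();

    app.get('/articles/:slug', (req, res) => {
      const article = findArticle(req.params.slug);   // hypothetical data lookup
      if (!article) {
        // A real 404 status code, visible to crawlers in the initial response.
        return res.status(404).send('<h1>Article not found</h1>');
      }
      // Server side rendered HTML, including the canonical tag in the <head>.
      res.send(renderArticle(article));               // hypothetical template function
    });

    app.listen(3000);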
However, just to be clear, not all web pages on a website necessarily need to be indexed. For example, on the Google I/O schedule website, there is a listing and filter interface for the sessions, and we want search crawlers to find the individual session pages. But we discovered the client side rendered deep links weren't being indexed, because the canonical tags were rendered in the client, and the URLs were fragment identifier based. So we implemented a new template with clean URLs and server side rendered canonical tags to ensure the session descriptions were properly indexed, because we care about that content. And to ensure these documents were crawlable, we added them to the sitemap, as well. But what about the single page app which allows for filtering sessions? Well, that's more of a tool than a piece of content, right? Therefore, it's not as important to index the HTML on that page. So ask yourself this-- do the pages I care about from the perspective of content and indexing use client side rendering, anyway?

OK. So now you know-- when building a client side rendered website, you must tread carefully. As the web and the industry have gotten bigger, so, too, have the teams and companies become more complex. We now work in a world where the people building websites aren't necessarily the same people promoting or marketing those websites. And so this challenge is one that we're all facing together, as an industry, both from Google's perspective and yours, as developers, because after all, you want your content indexed by search engines, and so do we. Well, this seems like a good opportunity to change tracks. So John, do you want to take over and tell everyone about the Google Search policy changes and some of the best practices they can apply, so we can meet this challenge together?

JOHN MUELLER: Sure. Thanks, Tom. That was a great summary of how Search works. Though, I still don't know about those pictures of spiders. Kind of scary. But Googlebot, in reality, is actually quite friendly. Anyway, as Tom mentioned, the indexing of modern JavaScript powered websites is a challenge. It's a challenge both for Google, as a search engine, and for you all, as developers of the modern web. And while developments on our side are still ongoing, we'd like to help you tackle this challenge in a more systematic way. So for that, we'll look at three things here-- the policy change that we mentioned briefly before, some new tools that are available to help you diagnose these issues a little bit better, and lastly, a bunch of best practices to help you make better JavaScript powered websites that also work well in Search.

So we've already talked briefly about client side rendering and server side rendering. Client side rendering is the traditional state, where JavaScript is processed on the client-- that would be the user's browser-- or on a search engine. With server side rendering, your server will process the JavaScript and serve mostly static HTML to search engines. Often, this also has speed advantages-- especially on lower end and mobile devices, JavaScript can take a bit of time to run-- so this is a good practice. For both of these, we index the state as ultimately seen in the browser.
So that's what we pick up, and we try to render pages when we need to do that. There's a third type of rendering that we've talked about in the past. It starts in the same way, in that pre-rendered HTML is sent to the client, so you have the same speed advantages there. However, on interaction or after the initial page load, JavaScript is added on top of that. And as with server side rendering, our job, as a search engine, is pretty easy here-- we just pick up the pre-rendered HTML content. We call this hybrid rendering. This is actually our long-term recommendation. We think this is probably where things will end up in the long run. However, in practice, implementing this can still be a bit tricky, and most frameworks don't make it easy. A quick call out to Angular, since we featured them in the beginning as an example of a page that was hard to pick up-- they have built a hybrid rendering mode with Angular Universal that helps you do this a little bit more easily. Over time, I imagine more frameworks will have something similar to make it easier for you to do this in practice. However, at least at the moment, if your server isn't written in JavaScript, you're going to be dealing with kind of double maintenance of controller and templating logic, as well.

So what's another option? What's another way that JavaScript sites could work well with Search? We have another option that we'd like to introduce. We call it dynamic rendering. In a nutshell, dynamic rendering is the principle of sending normal, client side rendered content to users, and sending fully server side rendered content to search engines and to other crawlers that need it. This is the policy change that we talked about before. So we call it dynamic because your site dynamically detects whether or not the requester is a search engine crawler, like Googlebot, and only then sends the server side rendered content directly to the client. You can include other web services here, as well, that can't deal with rendering-- for example, maybe social media services, or chat services, anything that tries to extract structured information from your pages. And for all other requesters, so your normal users, you would serve your normal hybrid or client side rendered code. This also gives you the best of both worlds, and makes it easy for you to migrate to hybrid rendering for your users over time, as well. One thing to note-- this is not a requirement for JavaScript sites to be indexed. As you'll see later, Googlebot can render most pages already.

For dynamic rendering, our recommendation is to add a new tool or step in your server infrastructure to act as a dynamic renderer. This reads your normal, client side content and sends a pre-rendered version to search engine crawlers. So how might you implement that? We have two options here that help you get started. The first is Puppeteer, which is a Node.js library that wraps a headless version of Google Chrome underneath. This allows you to render pages on your own. Another option is Rendertron, which you can run as software as a service that renders and caches your content on your side, as well. Both of these are open source, so you could make your own version, or use something from a third party that does something similar, as well. For more information on these, I'd recommend checking out the I/O session on Headless Chrome. I believe there's a recording about that already.
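As a rough sketch of the Puppeteer approach described here (assumptions: Node.js, and that you would add your own caching and error handling around it):

    const puppeteer = require('puppeteer');

    // Render a URL with headless Chrome and return the serialized DOM
    // after the page's JavaScript has run.
    async function renderPage(url) {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'networkidle0' }); // wait until the network goes quiet
      const html = await page.content();                   // fully rendered HTML
      await browser.close();
      return html;
    }

    // Example usage (hypothetical URL):
    // renderPage('https://example.com/products/shoes').then(console.log);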
Either way, keep in mind, rendering can be pretty resource intensive, so we recommend doing this out of band from your normal web server and implementing caching as you need it. So let's take a quick look at what your server infrastructure might look like with a dynamic renderer integrated. Requests from Googlebot come in on the side here. They're sent to your normal server, and then, perhaps through a reverse proxy, they're sent to the dynamic renderer. There, it requests and renders the complete, final page and sends that back to the search engines. So without needing to implement or maintain any new code, this setup could enable a website that's designed only for client side rendering to perform dynamic rendering of the content for Googlebot and other appropriate clients. If you think about it, this kind of solves the problems that Tom mentioned before, and now we can be kind of confident that the important content of our web pages is available to Googlebot when it performs its initial wave of indexing.

So how might you recognize Googlebot requests? This is actually pretty easy. The easiest way to do that is to find Googlebot in the user-agent string. You can do something similar for other services that you want to serve pre-rendered content to. And for Googlebot, as well as some others, you can also do a reverse DNS lookup if you want to be sure that you're serving it just to legitimate clients. One thing to watch out for here is that if you serve adapted content to smartphone users versus desktop users, or you redirect users to different URLs depending on the device that they use, you must make sure that dynamic rendering also returns device-focused content. In other words, mobile search engine crawlers, when they go to your web pages, should see the mobile version of the page, and the others should see the desktop version. If you're using responsive design-- so if you're using the same HTML and just using CSS to conditionally change the way that content is shown to users-- this is one thing you don't need to watch out for, because the HTML is exactly the same.

What's not immediately clear from the user agents is that Googlebot is currently using a somewhat older browser to render pages. It uses Chrome 41, which was released in 2015. The most visible implication for developers is that newer JavaScript versions and coding conventions, like arrow functions, aren't supported by Googlebot. And with that, also, any API that was added after Chrome 41 currently isn't supported. You can check these on a site like caniuse.com. And while you could theoretically install an older version of Chrome, we don't recommend doing that, for obvious security reasons. Additionally, there are some APIs that Googlebot doesn't support because they don't provide additional value for Search. We'll check these out, too.
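Circling back to recognizing Googlebot in the user-agent string, a minimal sketch of what dynamic rendering routing could look like (assumptions: a Node.js/Express server, Node 18+ for the global fetch, and a hypothetical Rendertron-style rendering service; the bot list is illustrative, not an official one):

    const express = require('express');
    const app = express();

    // Crawlers and services that should receive pre-rendered HTML.
    const BOT_AGENTS = ['googlebot', 'bingbot', 'twitterbot', 'facebookexternalhit'];
    const RENDERER = 'https://my-rendertron.example.com/render'; // hypothetical renderer URL

    function isBot(userAgent) {
      const ua = (userAgent || '').toLowerCase();
      return BOT_AGENTS.some((bot) => ua.includes(bot));
    }

    app.use(async (req, res, next) => {
      if (!isBot(req.headers['user-agent'])) return next(); // normal users: client side app

      // Crawlers: fetch a pre-rendered version of the requested URL.
      const pageUrl = `https://example.com${req.originalUrl}`;
      const rendered = await fetch(`${RENDERER}/${encodeURIComponent(pageUrl)}`);
      res.status(rendered.status).send(await rendered.text());
    });

    // ...the normal client side rendered app is served below this middleware...
    app.listen(8080);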
All right, so you might be thinking, this sounds like a lot of work, John. Do I really need to do this? A lot of times, Googlebot can render pages properly-- why do I really have to watch out for this? Well, there are a few reasons. First is if your site is large and rapidly changing-- for example, if you have a news website that has a lot of new content that keeps coming out regularly and requires quick indexing. As Tom showed, rendering is deferred from indexing, so if you have a large, dynamic website, the new content might otherwise take a while to be indexed. Secondly, if you rely on modern JavaScript functionality. For example, if you have any libraries that can't be transpiled back to ES5, then dynamic rendering can help you there. That said, we continue to recommend using proper graceful degradation techniques, so that even older clients have access to your content. And finally, there's a third reason to look into this: if your site relies on sharing through social media or through chat applications. If these services require access to your page's content, then dynamic rendering can help you there, too.

So when might you not use dynamic rendering? I think the main aspect here is balancing the time and effort needed to implement and run this against the gains that are received. Remember, implementation and maintenance of dynamic rendering can use a significant amount of server resources. And if you see that Googlebot is able to index your pages properly, and you're not making critical, high frequency changes to your site, maybe you don't need to implement anything special. Most sites should be able to let Googlebot render their pages just fine. Like I mentioned, if Googlebot can render your pages, then you probably don't need dynamic rendering for that site.

Let's take a look at a few tools to help you figure out what the situation is. When diagnosing rendering, we recommend doing so incrementally-- first checking the raw HTTP response, and then checking the rendered version, either on mobile, or on mobile and desktop if you serve different content, for example. Let's take a quick look at these. So, looking at the raw HTTP response, one way to do that is to use Google Search Console. To gain access to Search Console and a few other features there, you first need to verify ownership of your website. This is really easy to do-- there are a few ways to do it-- so I'd recommend doing that, regardless of what you're working on. Once you have your site verified, you can use a tool called Fetch as Google, which will show the HTTP response that was received by Googlebot, including the response code on top and the HTML that was provided before any rendering was done. This is a great way to double check what is happening on your server, especially if you're using dynamic rendering to serve different content to Googlebot.

Once you've checked the raw response, I recommend checking how the page is actually rendered. The tool I use for this is the Mobile-Friendly Test. It's a really fast way of checking Google's rendering of a page. As the name suggests, it's made for mobile devices. And as you might know, over time, our indexing will be primarily focused on the mobile version of a page. We call this mobile-first indexing. So it's good to already start focusing on the mobile version when you're testing rendering. We recommend testing a few pages of each kind of page within your website. So, for example, if you have an e-commerce site, check the home page, some of the category pages, and some of the detail pages. You don't need to check every page on your whole website, because a lot of times, the templates will be pretty similar. If your pages render well here, then chances are pretty high that Googlebot can render your pages for Search, as well. One thing that's kind of a downside here is that you just see the screenshot-- you don't see the rendered HTML. So what's one way to check the HTML? Well, new for I/O-- I think we launched this yesterday.
We've added a way to review the HTML after rendering. This is also in the Mobile-Friendly Test. It shows you what was created after rendering with the mobile Googlebot. It includes all of the markup for links, for images, for structured data-- any invisible elements that might be on the page after rendering. So what do you do if the page just doesn't render properly at all? We also just launched a way to get full information about loading issues from a page, as well. In this part of the Mobile-Friendly Test, you can see all of the resources that were blocked by Googlebot. This could be JavaScript files or API responses. A lot of times, not everything needs to be crawled, kind of like Tom mentioned. For example, if you have tracking pixels on a page, Googlebot doesn't really need to render those tracking pixels. But if you use an API to pull in content from somewhere else, and that API endpoint is blocked by robots.txt, then obviously, we can't pull in that content at all. An aggregate list of all of these issues is also available in Search Console.

When pages fail in a browser, usually I check the developer console for more information, to see more details on exceptions. And new for I/O, one of the most requested features from people who make JavaScript powered sites for Search is also showing the console log when Googlebot tries to render something. This allows you to check for all kinds of JavaScript issues-- for example, if you're using ES6, or if you just have other issues with the JavaScript when it tries to run. This makes my life so much easier, because I don't have to help people with all of these detailed rendering issues that much. Desktop is also a topic that still comes up. As you've seen in maybe some of the other sessions, desktop isn't quite dead. So you can run all of these diagnostics in the Rich Results Test, as well. This tool shows a desktop version of these pages.

So now that we've seen how to diagnose issues, what kind of issues have we run across with modern JavaScript powered sites? What patterns do you need to watch out for and handle well on your side? Remember Tom mentioned at the beginning of the talk something about lazy loading images and being unsure if they're indexable? Well, it turns out they're only sometimes indexable. So it was good to look at that. Depending on how lazy loading is implemented, Googlebot may be able to trigger it, and with that, may be able to pick up these images for indexing. For example, if the images are above the fold, and your lazy loading loads those images automatically, then Googlebot will probably see that. However, if you want to be sure that Googlebot is able to pick up lazy loaded images, one way to do that is to use a noscript tag. So you can add a noscript tag around a normal image element, and we'll be able to pick that up for Image Search directly. Another approach is to use structured data on a page. When we see structured data that refers to an image, we can also pick that up for Image Search. As a side note for images, we don't index images that are referenced only through CSS. We currently only index images that are embedded with structured data markup or with image tags.
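A hedged sketch of the noscript fallback described above for lazy loaded images (the data-src attribute, and the lazy loading script that would read it, are assumptions about how a particular lazy loader works):

    <!-- Lazy loaded image, filled in by JavaScript at scroll time -->
    <img data-src="https://example.com/images/product.jpg" alt="Product photo" class="lazy">

    <!-- noscript fallback with a normal image element, so crawlers can pick it up directly -->
    <noscript>
      <img src="https://example.com/images/product.jpg" alt="Product photo">
    </noscript>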
Apart from lazy loaded images, there are other types of content that require some kind of interaction to be loaded. What about tabs that load their content after you click on them, or infinite scroll patterns on a site? Googlebot generally won't interact with a page, so it wouldn't be able to see these. There are two ways that you can get this content to Googlebot, though. Either you can preload the content and just use CSS to toggle its visibility on and off-- that way, Googlebot can see the content from the preloaded version. Or, alternately, you can just use separate URLs and navigate the user and Googlebot to those pages individually.

Now, Googlebot is a patient bot, but there are a lot of pages that we have to crawl, so we have to be efficient and go through pages fairly quickly. When pages are slow to load or render, Googlebot might miss some of the rendered content. And since embedded resources are aggressively cached for Search, rendering timeouts are really hard to test for. So to limit these problems, we recommend making performant and efficient web pages, which you're hopefully already doing for your users, anyway, right? In particular, limit the number of embedded resources and avoid artificial delays like timed interstitials, like here. You can test pages with the usual set of tools and roughly test rendering with the Mobile-Friendly Test. And while timeouts here are a little bit different from indexing, in general, if the pages work in the Mobile-Friendly Test, they'll work for search indexing, too.

Additionally, Googlebot wants to see the page as a new user would see it, so we crawl and render pages in a stateless way. Any API that tries to store something locally would not be supported. So if you use any of these technologies, make sure to use graceful degradation techniques to allow anyone to view your pages, even if these APIs are not supported.
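As one illustration of graceful degradation for the stateless crawling point above (a sketch only; the in-memory fallback is just one possible approach):

    // Fall back to an in-memory store when localStorage isn't available,
    // e.g. for a stateless crawler or a locked-down browser.
    function createStorage() {
      try {
        window.localStorage.setItem('__test__', '1');
        window.localStorage.removeItem('__test__');
        return window.localStorage;
      } catch (err) {
        const memory = new Map();
        return {
          getItem: (key) => (memory.has(key) ? memory.get(key) : null),
          setItem: (key, value) => memory.set(key, String(value)),
          removeItem: (key) => memory.delete(key),
        };
      }
    }

    const storage = createStorage();
    storage.setItem('theme', 'dark'); // works for users, and the page still renders for crawlers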
And that was it with regards to critical best practices. Now, it's time to take a quick circle back and see what we've covered. So first, we recommend checking for proper implementation of the best practices that we talked about. In particular, lazy loaded images are really common. Second, test a sample of your pages with the Mobile-Friendly Test and use the other testing tools, as well. Remember, you don't need to test all of your pages-- just make sure that you have all of the templates covered. And then finally, if your site is large and quickly changing, or you can't reasonably fix rendering across the site, then maybe consider using dynamic rendering techniques to serve Googlebot and other crawlers a pre-rendered version of your pages. And if you do decide to use dynamic rendering, make sure to double check the results there, as well. One thing to keep in mind-- indexing isn't the same as ranking. But generally speaking, pages do need to be indexed before their content can appear in Search at all. I don't know. Tom, do you think that covers about everything?

TOM GREENAWAY: Well, it was a lot to take in, John. That was some amazing content. But I guess one question I have, and I think maybe other people in the audience have this on their mind, as well, is: is it always going to be this way, John?

JOHN MUELLER: That's a great question, Tom. I don't know. I think things will never stay the same. So as you mentioned in the beginning, this is a challenge for us that's important. Within Google Search, we want our search results to reflect the web as it is, regardless of the type of website that's used. So our long-term vision is that you, the developers, shouldn't need to worry as much about this for search crawlers. So circling back on the diagram that Tom showed in the beginning with deferred rendering, one change we want to make is to move rendering closer to crawling and indexing. Another change we want to make is to have Googlebot use a more modern version of Chrome over time. Both of these will take a bit of time. I don't like making long-term predictions, but I suspect it will be at least until the end of the year before this works a little better. And similarly, we trust that rendering will become more and more common across all kinds of web services, so at that point, dynamic rendering will probably be less critical for modern sites. However, the best practices that we talked about will continue to be important here, as well. How does that sound, Tom?

TOM GREENAWAY: That sounds really great. I think that covers everything, and I hope everyone in the room has learned some new approaches and tools that are useful for making your modern JavaScript powered websites work well in Google Search. If you have any questions, we'll be in the mobile web sandbox area together with the Search Console team. And alternatively, you can always reach out to us online, as well, be it through Twitter, our live Office Hours Hangouts, or the Webmaster Help forum. So thanks, everyone, for your time.

JOHN MUELLER: Thank you.

[APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google Search Central
Views: 54,612
Keywords: type: Conference Talk (Full production); pr_pr: Google I/O; purpose: Educate
Id: PFwUbgvpdaQ
Length: 39min 41sec (2381 seconds)
Published: Thu May 10 2018