English Google SEO office-hours from February 5, 2021

Captions
JOHN MUELLER: All right. Welcome, everyone, to today's Google SEO office hours. My name is John Mueller. I'm a Search Advocate at Google here in Switzerland. And part of what we do are these office-hours Hangouts where people can jump in and ask their questions around their website and web search, and we can try to find some answers. As always, a bunch of questions were submitted on YouTube, so we can go through some of those. But if any of you want to get started with our first question, you're welcome to jump in. SHAO CHIEH LO: Hey, John. I have a question. JOHN MUELLER: Sure. SHAO CHIEH LO: This question is about e-commerce. A lot of e-commerce sites have refinement pages, right? These are pages that show a certain number of products based on a predefined logic. In my client's case, those pages are very good for capturing [INAUDIBLE] keywords, because they are refined by, for example, computer, and at the same time AMD chips, things like that. So when people search for a [INAUDIBLE] keyword like computer AMD processor, those pages are a very good place to capture them. So here is my question: those pages are designed so that products are assigned automatically, and you can't put all of the products into the first rendering of the page. So every time a new product is added to the inventory, or a product is removed from the inventory, the products in the first rendering change. I'll send a sample here in the comments section. It's not my client, because I'm not allowed to show my client's website, but it's a similar situation. When you click that link, you can see there are a lot of products on the page, but when you scroll down to the bottom, there's a Load More button. So people only see the products that come before Load More, right? And whenever we add something to the inventory or remove something, the products in the first rendering change. So Google will constantly see different content on that page. Will that confuse Google? How do we solve this problem? JOHN MUELLER: That's essentially fine. That's totally normal. I think with e-commerce, with a very busy site, you have those kinds of shifts all the time. With news websites, it's similar in that you have new articles all the time, and when you look at the home page of a news site, there are always different articles linked there. And from our point of view, that's fine. The important part, I think, especially with e-commerce, is that we're able to find the individual product pages themselves. So somewhere along the line, we need to have kind of persistent links to those products. That could be on that page. It could be on page two or page three or page four of that listing, something like that. So that's kind of the important part there. I wouldn't worry that the pages change from load to load, because what will happen from a search point of view is we will recognize there's specific content for this topic on this page, and we'll try to bring queries to the page that match the general topic. And if computer model one or computer model two is shown there and they're essentially equivalent because they're in the same category of product, then that doesn't really change much for us.
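A minimal sketch of the "persistent links" point above, assuming a hypothetical /computers/amd?page=N URL pattern and made-up product slugs: even when the products shown before the Load More button rotate with inventory, every product URL stays linked from some numbered page that a crawler can reach.

```python
# Sketch only: hypothetical URL pattern and inventory, not any particular platform's API.
from typing import Dict, List

PAGE_SIZE = 24  # products shown per paginated category page


def paginate(product_slugs: List[str], page_size: int = PAGE_SIZE) -> Dict[str, List[str]]:
    """Map each paginated category URL to the product URLs linked from it."""
    pages: Dict[str, List[str]] = {}
    for start in range(0, len(product_slugs), page_size):
        page_number = start // page_size + 1
        page_url = f"/computers/amd?page={page_number}"  # hypothetical pattern
        pages[page_url] = [f"/products/{slug}" for slug in product_slugs[start:start + page_size]]
    return pages


if __name__ == "__main__":
    inventory = [f"laptop-{n}" for n in range(1, 101)]  # changes as products come and go
    pages = paginate(inventory)
    # Page 1 may show different products tomorrow, but every product URL is still
    # linked from *some* crawlable, paginated page -- the part search engines need.
    linked = {url for urls in pages.values() for url in urls}
    assert all(f"/products/{slug}" in linked for slug in inventory)
    print(f"{len(inventory)} products linked across {len(pages)} paginated pages")
```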
SHAO CHIEH LO: So what I'm hearing is that as long as the logic for assigning products is consistent, and the products showing up in the first rendering match, for example, the title [INAUDIBLE] on the page, then that is fine. But if the logic is not consistent and suddenly there are other products, then there will be a problem. JOHN MUELLER: Yeah. So for example, if you have a clothing store and you have a category that's just blue, and the category blue has everything from socks to jackets and everything in between, then it's really hard for us to say, this is a landing page for this type of product. So we will constantly be confused by a page like that, whereas if it's a landing page about, I don't know, blue jackets, for example-- like, category jackets and color blue-- then it doesn't really matter which jackets you show there. They all match that intent and it's pretty clear to the user. They fit into that category. SHAO CHIEH LO: So as long as the newly added products are still blue jackets, even if they're different blue jackets in the first rendering in the [INAUDIBLE] OK. JOHN MUELLER: Yeah. SHAO CHIEH LO: But if, one time, a red jacket is added to those products, there will be a problem? JOHN MUELLER: I think individual cases are absolutely no problem. If it's always something random in there, then it gets hard for us to understand the pattern. SHAO CHIEH LO: OK, thank you so much. Sorry for [INAUDIBLE]. SEAN BRATTIN: Hey, John. JOHN MUELLER: Hi. SEAN BRATTIN: Thank you. My question is regarding image search and why one image might be given preference over another. Specifically, it's on a product page that uses an image slider to display pictures of the product. And considering that pretty much everything is nearly identical-- like, alt text, file name, nearby text mentions weight, things of that nature-- why might a seemingly random image from within the slider sequence, maybe the third or fourth thumbnail, be given preference over a featured image, like the first image that you would usually see on a product page? JOHN MUELLER: I don't know. It's hard to say. So we have various things that go into image search. On the one hand, it is the aspects that you mentioned, like the titles of the page, the image filename, the captions, alt text, things like that. But we do also have some logic that tries to understand, is this a high quality image or not? And it's possible-- I don't know those images-- that our systems are either getting confused by the contents of the image, or that they clearly see one image is significantly higher quality than the other. And then maybe we would give it a little bit more visibility like that. But it's something where I think there are always a number of different factors that play into that. And even for multiple images on the same page, which are kind of in the same category of things, it's possible that we show them in one order once and show them in a different order another time. SEAN BRATTIN: OK. So are you able to comment on this? I imagine that Cloud Vision has something to do with that, trying to match similarities with machine learning to the entities. Am I on the right track here? JOHN MUELLER: I don't know how far we would use something like that. I do think that, at least as far as I understand, we've talked about doing that in the past, specifically for image search.
But it's something where, just purely based on the contents of the image alone, it's sometimes really hard to determine how the relevance should be for a specific query. So for example, you might have, I don't know, a picture of a beach. And we could recognize, oh, it's a beach. There's water here. Things like that. But if someone is searching for a hotel, is a picture of the beach the relevant thing to show? Or is that, I don't know, a couple of miles away from the hotel? It's really hard to judge just based on the content of the image alone. So I imagine if or when we do use kind of machine learning to understand the contents of an image, it's something auxillary to the other factors that we have. So it's not that it would completely override everything else. SEAN BRATTIN: Gotcha. Thank you. JOHN MUELLER: Sure. NEERAJ PANDEY: John, just one follow up on this. Does Google have any plan of machine learning auto detection of what is happening in-- what is there in picture? Because I am seeing that different devices also have this kind of feature. Does Google also have any plan of implementing this kind of feature? JOHN MUELLER: What is happening within the image? I don't know. Kind of like with the previous question, it's something where it's certainly possible, to some extent, to pull out some additional information from an image, which could be like objects in the image or what is happening in the image. But I don't know if that would override any of the other factors that we have there. So my understanding is this is probably something that would be more of on the side, if we have multiple images that we think are kind of equivalent and we can clearly tell somehow that this one is more relevant because it has, I don't know, the objects or the actions that someone is searching for, then maybe we would use that. But I honestly don't know what we've announced in that area or what we're actually using for search there, because the thing to keep in mind is that there are a lot of different elements that are theoretically possible that might be done kind of in consumer devices. There are lots of things that are patented that are out there that are kind of like theoretically possible. But just because it's possible in some instances doesn't mean that it makes sense for Search. And we see that a lot with patents when it comes to Search, where someone will patent a really cool algorithm or set up that could have an implication for Search. But just because it's patented from Google and maybe even from someone who works on Search doesn't mean that we actually use it in Search. NEERAJ PANDEY: Yeah. OK. Thank you. JOHN MUELLER: Sure. OK. Let me run through some of the submitted questions. And if you have questions along the way, feel free to jump in and we'll almost certainly have time towards the end for more questions from all of you. All right. The first question is about Google Discover. One of the sites I'm running is about anime, fan art, cosplay, fan fiction. Was performing fairly well in Discover. But one day to another the traffic dropped to 0 without any significant change on the site. In Google Search, it's growing before and after that. What kind of problems could bring that situation? I don't know. It's really hard to say without looking at the site. 
But in general, when it comes to Google Discover, one of the things that I've noticed from the feedback from folks like you all is that the traffic tends to be very kind of on or off in that our systems might think, well, it makes sense to show this more in Discover. And then suddenly you get a lot of traffic from Discover. And then our algorithms might at some point say, well, it doesn't make sense to show it that much in Discover anymore and the traffic goes away. And especially with Discover, it's something which is not tied to a specific query. So it's really hard to say what you should be expecting, because you don't know how many people are kind of interested in this topic or where we would potentially be able to show that. So that's something where if you do see a lot of visibility from Google Discover, I think that's fantastic. I just would be careful and kind of realize that this is something that can change fairly quickly. Additionally, we also-- for Discover, we have a Help Center article that goes into pretty much detail what kind of things we watch out for, and in particular, what kind of things we don't want to show in Discover. So that's something that you might want to double check. Depending on, I guess, the site that you have, that's something that might be more relevant or less relevant there. But I would definitely check that out. What are the levels of site quality demotions? Is there a first level where everything sitewide looks fine, no demotion. Second level, you demote some pages that are not relevant. Or a third level site wide is not good at all. So my understanding is, we don't have these different levels of site-wide demotion where we would say, we need to demote everything on the website or we don't need to demote anything on the website. I think depending on the website, you might see aspects like this or might feel like aspects like this. But for the most part, we do try to look at it as granular as possible. And in some cases, we can't look at it as granular as possible. So we'll look at kind of like different chunks of a website. So that's something where, from our side, it's not so much that we have different categories and we say like, in this category or in that category. It's just that there is almost like a fluid transition across the web. And also when it comes to things where our algorithms might say, oh, we don't really know how to trust this, for the most part, it's not a matter of trust is there or trust is not there. It's like, yes or no. But rather, we have this really kind of fluid transition where we think, well, we're not completely sure about this. But it makes sense for these kind of queries, for example. Or it makes sense for these kind of queries. So that's something where there is a lot of room. Let's see. I have a question about omitted results. We publish two large dotcoms, horoscope and astrology, with each own URL and content teams. After ranking on the first page for astrology queries for multiple years, in February last year, only one of the sites began to show up for normal search results at a time. Whichever site has the highest ranking for a given query will show up, with the other website being classified as an omitted result. There's no duplicate content or cross links between the sites, so I'm curious why this is happening. It's really hard to say without looking at the specific sites and looking at the specific situation. 
So usually, with two websites, if they're not completely the same, then we would rank them individually, even if there is kind of like an ownership relationship there. So from that point of view, it might also just be something that is kind of not related to what you're suspecting, in that our algorithms think that it's like the same site and we should only show one of these at the same time. I have seen situations where, if there are a large number of sites that are involved, a large number of domains, that our algorithms might say, well, all of these domains are essentially the same content, and we should just pick one of these to show rather than all of these. But usually, if there are two websites and they're kind of unique in their own ways, then that's something where we would try to show them individually. So I think from a practical point of view, what I would do here is go to the Webmaster Help forums and post the details what you're seeing here-- maybe some screenshots, specific URLs and queries where you're seeing this happening. And the folks there can take a look at that and maybe guide you into, I don't know, if there's something specific that you could be doing differently there, maybe they can point you to that. Or maybe they can point you in the direction of saying, well, it is how it is. That's nothing kind of unnatural that's happening there. But also, the folks active in the health forums have the ability to escalate things to Google teams. So if they think this is really weird and maybe something weird is happening on Google's side, then they can escalate that to someone at Google. Let's see. Does Google Search consider each URL of a website individually? For example, does a low score on a domain homepage have any effect on the other pages which have a high score? So yeah, like I mentioned before, we try to be as granular as possible, as fine grained as possible, in the sense that we try to focus on individual pages. But especially within a website, you're kind of always linking to the other pages of your website. So there is kind of a connection between all of these pages. And if one page is really bad and we think that's the most important page for your website, then obviously that will have an effect on the other pages within your website, because they're all kind of in context of that one main page, for example, whereas if one page on your website is something that we would consider not so good and it's some random part of your website, then that's not going to be the central point where everything evolves around. Then from our point of view, that's like, well, this one page is not so great. But that's fine. It doesn't really affect the rest. The mobile section of Core Web Vitals in Search Console shows a bad URL on original link while the AMP version of the same URL is a good URL. Why are these two considered separately? So essentially, what happens there is that we don't focus so much on the theoretical aspect of, this is an AMP page and there's a canonical here. But rather, we focus on the data that we see from actual users that go to these pages that navigate to them. So that's something where you might see an effect of lots of users are going to your website directly and they're going to the non-AMP URLs maybe, depending on how you have your website set up. And in Search, you have your AMP URLs. Then we probably will get signals, or enough signals that we track them individually for both of those versions. 
So on the one hand, people going to Search, going to the AMP versions, and people may be going to your website directly, going to the non-AMP versions. And in a case like that, we might see information separately from those two versions. And then kind of like we have those two versions and the data there. So we'll show that in Search Console like that, whereas if you set up your website in a way that you're consistently always working with the AMP version, that maybe all mobile users go to the AMP version of your website, then that's something where we can clearly say, well, this is the primary version. We'll focus all of our signals on that version. The next question there is, since AMP is enabled, will Google Mobile Search consider only the AMP version, which passes the Core Web Vitals test when ranking the website, or will the original link also be considered? So I mean, on the one hand, there is the aspect of, is it a valid AMP or not? If it's not a valid AMP, then we wouldn't show it. So that's one aspect that goes into play there. But I think in the theoretical situation that we have data for the non-AMP version and data for the AMP version, and we would show the AMP version in the search results, then we would use the data for the version that we show in Search as the basis for the rating. So in a case like that where we clearly have data for both of these versions and we would pick one of those versions to show, then we would use the data for that version. That's similar, I think, also with international websites where you have different kind of URLs for individual countries. And if we have data for one version, we show that. Or if we have one version that we would show in the search results and we have data for that version, that we'll use the data for that version, even if we have kind of other data for the other language or other country versions. Yeah. The only case where I know where we would fold things together is with regards to the AMP cache, because theoretically the AMP cache is also located in yet another place, another set of URLs. But with the AMP cache, we know kind of how we should fold that back to the AMP version and track that data there. So that's a little bit of an exception. But if you have kind of separate AMP versions and separate mobile versions on your site, then it's very possible that we could track those individually. Does Google weigh the exact match title tag more in comparison to the title tag focused more on users? So let's say the phrase I want to rank for is Audi A3. And one version of the title tag is this exact match. The other version, is this car for sale? 152 great models. Would this title will be scored less relevant for the query Audi A3, just because it is longer and not exact match? I don't think we have any exact definition on how that would pan out in practice. So there is certainly an aspect of, does this title kind of match the query? But we also try to understand the relevance of the query. We try to understand things like synonyms or kind of more context around the query, around the titles, as well. So I don't think there's a simple kind of like exact match to the query is better or not exact match to query is better there. So my recommendation there would be to test this out and just try it out. And not so much in terms of SEO, which one will rank better, but think about which one of these would work better in the search results? 
And that's something you could try out on one page, you could try out on multiple pages that are kind of set up in a similar way. And then based on that, you can determine, well, this one kind of attracts more clicks from users. It matches the intent that the user has better somehow. So I'll stick to that model and use that across the rest of my website. So that's kind of my recommendation there. I think with kind of the general information retrieval point of view, of which one of these would be the better fit, I imagine you could get into really long arguments with the people that are working on information retrieval on which one is better or not. So yeah, I don't think there is like this one clear answer. Comments below the blog posts-- are comments still a ranking factor? Migrating to another CMS and would like to get rid of all comments. About one to three not really relevant comments below many blog posts. Can I delete them safely without losing any ranking? I think it's ultimately up to you. From our point of view, we do see comments as a part of the content. We do also, in many cases, recognize that this is actually the comment section, so we need to treat it slightly differently. But ultimately, if people are finding your pages based on the comments there, then if you delete those comments, then obviously we wouldn't be able to find your pages based on that. So that's something where depending on the type of comments that you have there, the amount of comments that you have, it can be the case that they provide significant value to your pages. And it can be a source of additional kind of information about your pages. But it's not always the case. So that's something where I think you kind of need to look at the contents of your pages overall, the queries that are leading to your pages, and think about which of these queries might go away if my comments were not on those pages? And based on that, you can try to figure out what you need to do there. It's certainly not the case that we completely ignore all of the comments on a site, so just blindly going off and deleting all of your comments in the hope that nothing will change, I don't think that will happen. When using interstitials as product pages, does Google index the content on those interstitials or does it only index the content on the static pages? So I wasn't quite sure how you use interstitials as product pages. It seems like a kind of unique setup. But anyway, I think it's less a matter of interstitials or not but more a matter of what content is actually shown when we load those pages. So if we load this HTML page and by default it always doesn't show any product information, then we wouldn't have that product information to actually index, whereas if you kind of load that page and it takes a second and then it pops up the full content, the full product information, then essentially by loading the page, we have that information. And we can use that to index and to rank those pages. So it's kind of a simple way to double check. What we would be able to pick up with regards to indexing is to take that URL and use something like the mobile friendly test or the URL inspection tool in Search Console and copy it in there and to see if Google is able to bring up the full product information or not. If Google can bring up the product information, then probably that's OK, whereas if Google only shows you kind of the static page behind that, then probably it's the case that we wouldn't be able to pick up the product information. 
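As a rough first pass at the check described above (the mobile-friendly test and the URL Inspection tool remain the authoritative way to see what Google actually renders), here is a sketch that fetches the raw HTML and looks for the product details; if they only appear after JavaScript runs, they won't be in this response. The URL and strings are placeholders.

```python
# Sketch: is the product information in the initial HTML response,
# or only added client-side after rendering? (Placeholder URL and strings.)
import requests

PRODUCT_URL = "https://www.example.com/products/blue-jacket"  # placeholder
EXPECTED_SNIPPETS = ["Blue Jacket", "Add to cart"]            # placeholder text to look for

resp = requests.get(PRODUCT_URL, timeout=10, headers={"User-Agent": "content-check-sketch"})
html = resp.text

missing = [snippet for snippet in EXPECTED_SNIPPETS if snippet not in html]
if missing:
    print("Not found in the initial HTML (likely injected by JavaScript):", missing)
    print("Confirm with the URL Inspection tool or mobile-friendly test, which show the rendered page.")
else:
    print("Product details are present in the initial HTML.")
```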
So that's one thing to watch out for. I think what threw me off with this question initially is also the word interstitials, in the sense that usually interstitials are something that sit between the content that you're looking for and what is actually loaded in the browser. So if you go to a page and, instead of the product page, it shows a big interstitial showing something else, that's the usual setup for interstitials. And from our point of view, those kinds of interstitials, if they're intrusive interstitials in the sense that they get in the way of the user actually interacting with the page, then that would be something we would consider a negative ranking factor. So if it's really the case that when you go to your pages it just takes a bit and then your product page pops up, then I wouldn't call those interstitials. Maybe use some other word for that, because if you ask around in the help forums or elsewhere and you say, oh, my interstitials, and I want to rank for my interstitials, then probably a lot of people will be confused. The next question: it seems like Google Images crawls some SVG files as SVG and then renders them into PNG when serving them in the search results. What is the reason for that? Is there a way that we can dictate this behavior for the Google Images crawler? I don't know. I wasn't aware of how this is happening, so I'm not 100% sure what exactly you're referring to. My understanding is that when it comes to images, especially the vector formats like SVG, which don't always have a well-defined size, what we do internally is convert that into a normal pixel image so that we can treat it the same way as we can treat other kinds of images. That applies to all of the normal processing internally, and also specifically to the thumbnail images that we show, so that we can scale it down using the normal pixel scaling functions and get it to the right size and into an equal resolution to the other thumbnails that we show. So probably that is what is happening there. And that's not something that you can easily change, because we kind of have our systems set up to deal with pixel-based images. And that's what we would do there. With regards to the next step from there, the expanding of the image when you click on it in the Image Search results, I don't know how that would be handled with regards to SVGs, or if we do some kind of pixel-based bigger preview or SVG-based bigger preview. So I don't quite know how we would handle that there. If you have any examples where this is causing problems, I would love to see them. So feel free to send me anything that you run across in that regard, especially when you see that it is causing weird problems that could be avoided by doing it slightly differently. Is there anything that we can do in terms of SEO to improve user journey? I think those are kind of separate topics. So it's not something that you would do SEO to improve user journey. But rather, you have your user journeys that you use to analyze your product and try to find the best approaches that you can take there. And then based on that, you would also try to do some SEO to improve things in the search results. So one is kind of improving things for the user, and the other is kind of improving things for search engines. Sometimes, if things align well, then there is enough overlap that they work together. But essentially, they're separate topics.
What is the best way to treat syndicated content on my site if the content is already in other sites, too? Do I have to no-index my page or do a canonicalize to the original source? Do I no-follow all internal links of that page? Yeah, good question. I don't think we have exact guidelines on syndicated content. Generally, we do recommend using something like a rel canonical to the original source. I know that's not always possible in all cases. So sometimes what can happen is we just recognize that there's syndicated content on a website and then we essentially try to rank that appropriately. So if you're syndicating your content to other sites, then it's theoretically possible that those other sites also show up in the search results. It's possible that maybe they even show up above you in the search results, depending on the situation. So that's something to kind of keep in mind if you're syndicating content. If you're hosting syndicated content on your website, then that's kind of similar to keep in mind in the sense that most of the time we would try to show the original source. And just because you have a syndicated version of that content on your site, as well, doesn't mean we will also show your website in the search results. So usually what I recommend there is to make sure that you have significant, unique, and compelling content of you own on your website. So if you're using syndicated content to kind of fill out additional facets of information for your users, then that's perfectly fine. I wouldn't expect to rank for those additional kind of facets or filler content that you have there. It can happen, but it's not something I would count on. And instead, for the SEO side of things, for the ranking side of things, I would really make sure that you have significant kind of unique content of your own so that when our systems look at your website, they don't just see all of this content that everyone else has, but rather, they see a lot of additional value that you provide that is not on the other sites, as well. And when we want to rank for a specific topic on Google, is it a good practice to also cover related topics, for example, if we sell laptops and we want to rank for that, is it useful to create posts like reviewing laptops, introducing the best new laptops, those kind of things? And if it's useful, then does it have to be done in any special way? So I think this is always useful, because what you're essentially doing is, on the one hand, for search engines, you're kind of building out your reputation of knowledge on that specific topic area. And for users, as well, it provides a little bit more context on kind of like why they should trust you. If they see that you have all of this knowledge on this general topic area and you show that and you kind of present that regularly, then it makes it a lot easier for them to trust you on something very specific that you're also providing on your website. So that's something where I think that that always kind of makes sense. And for search engines, as well, it's something where, if we can recognize that this website is really good for this broader topic area, then if someone is searching for that broader topic area, we can try to show that website, as well. We don't have to purely focus on individual pages, but we'll say, oh, it looks like you're looking for a new laptop. This website has a lot of information on various facets around laptops. How long should we wait for a Search Console manual action response? It's been months in. 
I avoided resubmitting because that's not nice. But do these ever get lost? If we don't get any replies, what should we be doing as a next step? Depending on the type of manual action, it can take quite a bit of time. So in particular, I think the link-based manual actions are things that can take quite a bit of time to be reviewed properly. And it can happen, in some cases, that it takes a few months. So usually what happens is, if you resubmit the reconsideration request, then we will drop the second reconsideration request, because we think it's a duplicate. The team internally will still be able to look at it. And if you have additional information there, that's perfectly fine. If it's essentially just copy and paste of the same thing, then I don't think that changes anything. It's also not the case that you would have a negative effect from resubmitting a reconsideration request. So in particular, if you're not sure that you actually sent the last one, then you're like, oh, someone on my team sent it and now I'm not sure if they actually sent it or not, then resubmitting it is perfectly fine. It's not that there will be kind of an additional penalty for resubmitting the reconsideration request. It's just when the team sees that one is still pending, they'll focus on that pending one rather than the additional ones. If you don't see any response with regards to manual actions, specifically around the link manual actions, I would recommend maybe also checking in with the help forums or checking in with other people who have worked on kind of link-based manual actions, because when it takes so long to kind of be processed like this, it's something where you really want to make sure that you have everything covered really well. So that's something where, if you're seeing it taking a long time, you're like, I don't know if I needed to do more or needed to do something different, than going to the help forums is a really good way to get additional feedback from people. And it's very likely that you'll go to the help forums and they'll be like, oh, you should have submitted these 500 other things. And it's not the case that you have to do whatever feedback comes back from the help forum. But rather, it's additional input to take in. And you can review that and say, OK, I will take into account maybe a part of this feedback and maybe skip another part of this feedback, because the folks in the help forum are very experienced with tons of topics, but they don't have the absolute answers. I don't think anyone really has that. So I think it's great to get all of this feedback, but you still have to kind of judge it and weigh it out yourself, as with anything on the internet. Does Page Speed Insights use the Googlebot? I wonder because when I'm looking at the rendered screenshots in PageSpeed Insights based on our site behavior, it looks like those weren't rendered by Googlebot. You're probably right. So in particular, PageSpeed Insights is something which is based on the Chrome setup. So that's something where, as far as I know, the server-based system that does the PageSpeed Insights screenshots and calculations and kind of metrics, all of that, is just purely based on Chrome. And Googlebot also uses Chrome to render pages. But there is some kind of unique aspects with regards to Googlebot that don't apply to PageSpeed Insights-- for example, robots.txt. So when Googlebot renders a page, it has to comply with the robots.txt of all of the embedded content there. 
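To illustrate this robots.txt requirement, here is a small sketch using Python's standard urllib.robotparser to check whether a stylesheet or script URL can be fetched with a Googlebot user agent; if it can't, Googlebot's rendering can differ from what a plain Chrome run sees. The URLs are placeholders, and robotparser's rule matching is only an approximation of how Googlebot actually evaluates robots.txt.

```python
# Sketch: check whether embedded resources (CSS/JS) are disallowed for Googlebot
# by the hosting origin's robots.txt. Placeholder URLs.
from urllib import robotparser
from urllib.parse import urlparse

RESOURCES = [
    "https://www.example.com/assets/site.css",   # placeholder
    "https://cdn.example.com/js/app.js",         # placeholder
]

for resource in RESOURCES:
    parts = urlparse(resource)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # fetch and parse that host's robots.txt
    allowed = rp.can_fetch("Googlebot", resource)
    print(f"{resource}: {'allowed' if allowed else 'blocked'} for Googlebot")
```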
And if you have maybe a CSS file or JavaScript file blocked by robots.txt, we wouldn't be able to process that from the Googlebot point of view. But PageSpeed Insights would still be able to review that and show that. So that's probably where you're seeing those differences. I think the difference is more and more blurred because Googlebot does use Chrome, as well. So it's very similar. But you can certainly find situations where there are differences. And you can certainly construct situations where there are differences. Like I mentioned, with robots.txt, it's a really simple way to kind of see those differences. With regards to the way that we calculate speed for Search, though, we-- so kind of, I guess, looking forward at the Core Web Vitals, at the moment, I don't really know offhand how we do that. But with regard to Core Web Vitals, we use what users actually see. So it's not the case that Googlebot renders a page very quickly and then it gets a good score, or Chrome, in PageSpeed Insights, renders a page very quickly, therefore it gets a good score. But rather, we look at what users actually saw. And I think that's really important, because that's kind of a measure of what the real world performance is. And all of these tools that kind of render it in more of a lab environment, like Googlebot when it renders a page, or PageSpeed Insights when it renders a page, is something that is almost more of a prediction than an actual kind of measurement, because there are lots of assumptions in play there. And whenever you run something like a rendering of a page within a data center, then you have a very different setup than the average user has with regards to network connectivity, with regards to caches and all of that. It's just very different. So these tools, when you run them and when you look at the measurements that they show, you need to keep in mind, this is more of a prediction rather than an actual value that the users will see. And-- [INTERPOSING VOICES] MICHAEL LEWITTES: I'm sorry. No, I was going to-- no, I was going to ask a follow up, but I'm sorry. You continue. JOHN MUELLER: And it's something that these tools also try to build in, the sense that they will say, well, I run in a data center, but I will act like I have a 3G phone and kind of a slow connection. And they'll try to emulate that, but it's still very different than actual users. Go ahead. MICHAEL LEWITTES: Yeah, I'm sorry about that. So in terms of assessing it, I was reading a site, seemed like it had a lot of ads. So I decided, OK, let me see how this is scored on PageSpeed Insights. And it rendered a score with the circle of 21, which was in the red, not very good. But below that was a visual, and below that numerical and visual representation was a sentence that read that based on field data, the page passed in green the assessment. And then below, there were these measurement bars for cumulative layout shift, first input delay, et cetera, and those were all mostly in the green. So where is the disconnect and what should one be paying attention to, that first visual circle, or the fact that it says it passed the Core Web Vitals assessment? JOHN MUELLER: I need to keep in mind how the PageSpeed Insights looked. I think it has that one overview score on top, right? MICHAEL LEWITTES: Yeah, I actually-- is there a way to present, because I did a screenshot and I redacted the name of the website, if that makes it easier. 
JOHN MUELLER: So I think what happens in PageSpeed Insights is we take the various metrics there and we try to calculate one single number out of that. And sometimes that's useful to work on or to give you a rough overview of what the overall score would be. But it all depends on how strongly you weigh the individual factors. So it can certainly be the case that overall, when the users see a page, it's pretty fast and sleek. But when our systems test it, they're like, oh, these are some theoretical problems that could be causing issues. And they'll kind of calculate that into the score. So I think the overall score is a really good way to get a rough estimate. And the actual field data is a really good way to see what people actually see. And usually, what I recommend is using those as a basis to determine, like, should I be focusing on improving the speed of the page or not? And then use kind of the lab testing tools out there for kind of determining the individual values and for tweaking them with the work that you're doing. So kind of using the overall score and using the field data as a way to determine, like, should I be doing something on this or not? And then using the lab data with the individual tools to improve things and check that you're going the right direction, because the issue is also the field data is delayed, I think, by about 30 days. So any changes that you make-- and if you're waiting for the field data to update, it's always 30 days behind. And if you're unsure that you're going in the right direction or that you've improved things enough, then waiting 30 days is kind of annoying. MICHAEL LEWITTES: Thank you. JOY F: Hey, John. Can I add a follow up on that, as well? JOHN MUELLER: Sure. JOY F: With regards to Core Vitals, a field data is going to be the one to pay attention to, correct, in terms of ranking signals? Or is it going to be-- [INTERPOSING VOICES] JOHN MUELLER: Yes. It's the field data. THIAGO POJDA: [INAUDIBLE]. While we are in this Core Web Vital topic, I have a small question in this regard is that when this becomes a ranking [INAUDIBLE]-- CLS and all the other [INAUDIBLE] is it going to be page level or domain level? JOHN MUELLER: Good question. So essentially what happens with the field data is we don't have data points for every page. So for the most part, we need to have kind of groupings of individual pages. And depending on the amount of data that we have, that can be a grouping of the whole website, kind of the domain, or I think in the Chrome User Experience Report they use the origin, which would be the subdomain and the protocol there. So that would be kind of the overarching kind of grouping. And if we have more data for individual parts of the website, then we'll try to use that. And I believe that's something you also see in Search Console where we'll show one URL and say there's so many other pages that are associated with that. And that's kind of the grouping that we would use there. THIAGO POJDA: Just why I ask this-- we have this set of pages that they are slow. They exist for a different purpose than our other pages on the site. And these we have a noindex on them. But they are very slow. And that's why we don't want it to be accounted for. JOHN MUELLER: Yeah. I don't think-- or I don't know for sure how we would do things with a noindex there. But it's not something you can easily determine ahead of time. Like, will we see this as one website or will we see it as different groupings there? 
Sometimes with the Chrome User Experience Report data, you can see, does Google have data points for those noindex pages? Does Google have data points for the other pages there? And then you can kind of figure out, like, OK, it can recognize that there is separate kinds of pages and can treat them individually. And if that's the case, then I don't see a problem with that. If it's a smaller website where we just don't have a lot of signals for the website, then those noindex pages could be playing a role there, as well. So I'm not 100% sure, but my understanding is that in the Chrome User Experience Report data, we do include all kinds of pages that users access. So there's no specific kind of, will this page be indexed like this or not check that happens there, because the indexability is sometimes quite complex with regards to canonicals and all of that. So it's not trivial to determine on the Chrome side if this page will be indexed or not. It might be the case that if a page has a clear noindex, then even in Chrome we would be able to recognize that. But I'm not 100% sure if we actually do that. THIAGO POJDA: All right, thank you. I'll follow up on Twitter. JOHN MUELLER: Yeah. I would also check the Chrome User Experience Report data. I think you can download data into BigQuery and you can play with that a little bit and figure out how is that happening for other sites, for similar sites that kind of fall in the same category as the site that you're working on. Cool. More questions from any of you? CHRISTIAN FEICHTNER: Yes, John. Hi. I suddenly see-- well, it started all at the middle of January. I suddenly saw in Search Console that there are a lot of old URLs popping up, especially in the 404 subcategory under Excluded and in the URL Inspection tool. These old URLs are, for example, old HTTP versions of URLs. And it's even old domains because the websites were moved to a new domain, like, three years ago. So my question is, why is that? Should I be worried? And if yes, how can I fix it? JOHN MUELLER: So these are showing up as 404 errors or-- CHRISTIAN FEICHTNER: These are showing up as a 404 errors. And for some URLs, if I use the URL inspection tool, they also show up as referrers in the URL Inspection tool. JOHN MUELLER: OK. I think if they're just shown as 404s, I would completely ignore that. What happens in our systems is that pages which are 404 are essentially still tracked on our side. And from time to time, we will double check to see that they still have a 404. And that can happen, like that a site is-- has changed significantly, doesn't have these pages for years now. And still from time to time, our systems say, well, we will double check those old URLs and see if they still return 404. And that's not a sign that anything is stuck with those pages. It's just kind of our systems trying to make sure that we're not missing anything from your website. CHRISTIAN FEICHTNER: And if they show up as referring URLs in the URL Inspection tool? JOHN MUELLER: So how do you mean as referring URLs? Like, that they link to another page, or-- CHRISTIAN FEICHTNER: Yes. For example, I used the URL Inspection tool on a URL that's still present. And then in the URL inspection tool you see where Google loads this page from. And there it says, for example, it knows it from the sitemap. And then there are, like, four URLs listed below that. And in that list, this list contains, for example, an old HTTP version. 
It contains the same file name, but from the old URL of the website-- all URLs that don't exist anymore. So this is also something that makes me worry, or shouldn't it? JOHN MUELLER: That's completely normal. Yeah. That's something where I am not 100% sure which data we show there in Search Console. But we have a concept of the first-seen location of a link to a specific page. And we might have seen that URL from that page at some point way in the past. And if that page doesn't exist anymore, it's still like, this is where we first saw it. CHRISTIAN FEICHTNER: OK. So basically, just make sure that if the original page doesn't exist anymore, it returns a proper 404. If it's redirected, then make sure it's a proper redirect. And in other cases, just ignore it [INAUDIBLE] JOHN MUELLER: Exactly. Yeah. So usually, if you have an older website, then over the years you will collect more and more of these 404 pages. And our systems-- even when they only rarely check a 404 page, the number of URLs that could be returning 404 just grows. So if you look at your server statistics and you look at what Googlebot is requesting, then it can look like, oh, Google is spending so much time on 404s. But for us, it's just checking each of them maybe once a year or so. And because we have so many that we check once a year, overall it looks like a lot. CHRISTIAN FEICHTNER: Sorry, this website is, like, 10 years old. And one last question with that, because the websites I'm talking about also moved to new domains. We used the address change tool. So basically, we just make sure that the old domain still redirects to the new website, that would be the proper, good setup, and we shouldn't worry about anything further? JOHN MUELLER: Yeah. That sounds great. The one place where people also get confused with that, which is kind of similar, I guess, with old URLs, is that when we recognize that pages have moved, we still have some association with the old location. So we will know that this page on the new website used to be located as a page on the old website, in some sense. So if, in the search results, you do a site: query for the old domain, then even after a couple of years you'll still see a lot of URLs shown there. And it's not the case that we have them indexed there, but rather, we know they used to be there, and it looks like a user is explicitly looking for the old location, so we'll show them. So if you look at the cached version of the page in a case like that, then you'll see that it actually shows the new domain. So it's a little bit confusing if you look at it like that. But essentially, it should be working properly. CHRISTIAN FEICHTNER: OK, thank you. JOHN MUELLER: Sure. BILAL AHMED: So I have a related question, for example, if we set proper 301 redirects. I was just trying to understand the relation with backlinks-- if Google has the history of the old links, is it possible that it passes some sort of PageRank to the new URLs that we set 301 redirects for? For example, we have a site with backlinks, and we decided to change the URLs with proper 301s. And the backlinks [INAUDIBLE] are still there-- we normally change a few of those, but they are still there. So if, as you say, Google has some sort of history there, would it be possible that it passes some sort of rank [INAUDIBLE] or PageRank [INAUDIBLE] to [INAUDIBLE]? JOHN MUELLER: Yes.
So essentially what happens there is we will have the old URL on your website, which has some signals from the links that go to it, and we have the new URL on your website. And with a redirect, you're basically telling us, these are equivalent and you probably prefer the new URL to be shown. So what we will do is put both of those URLs from your website into a group and say, this is a group of URLs that has collected signals. And then, with the redirect, we will usually pick the destination URL and say, this is the canonical for that group. And the canonical page will then inherit all of the signals that go to that group. So if there are links to the old version of a page, or if there are links to a copy of that page, then all of that will be combined together in the canonical version. So that's something that gets passed on there. Specifically when you're talking about site moves, we still recommend making sure that you, as much as possible, update the old links anyway, because what happens there is, we will put those URLs in the same group, like I mentioned. But we use various factors to determine which of these URLs is the right one to show, which one is the canonical one. The redirect is one factor, but links are another factor. So if all of the links-- internal and external links-- go to the old version of your URL and you redirect to a new version, we might pick the old version of the URL to show in Search. So that's something to keep in mind: if you want to move everything to a new URL, then make sure that everything is aligned with the new URL-- the redirect, the sitemap files, the internal linking, and, as much as possible, also the external linking-- so that everything fits together with that new URL that you want. BILAL AHMED: Thanks for that. I have another question. Most of the SEO check-up tools pop up a warning that says low text-to-HTML ratio, which means there is more code than written text. Is that something that we need to worry about? Or is it OK, and Google will pick up the right text? JOHN MUELLER: We don't have a notion of text-to-HTML ratio for Search. So that's something where I think a lot of these tools are able to calculate this, and they think, oh, it's worthwhile showing. But it's not an SEO ranking factor kind of thing. There are two places where it could play a role. On the one hand, with regards to speed: if you have a lot of HTML and you have very little text, then obviously we have to load a lot of content to display the page. So that's one small factor. The other one is with regards to extreme situations where you have lots and lots of HTML and very little text. We have limits on the maximum page size that we would download for an HTML page, and I think that's on the order of, I don't know, hundreds of megabytes, something like that. So if you have an HTML page that has hundreds of megabytes of HTML and very little text in it, then yes, that could be playing a role. But that's something that I suspect is extremely rare. And if you have that problem, then that's a bigger problem than just, oh, the text-to-HTML ratio isn't perfect. BILAL AHMED: Perfect. I got that. Thank you. JOHN MUELLER: Sure. Let me just pause the recording here. You're welcome to stick around a little bit longer if you like. But it is always good to keep the recording limited, to avoid it becoming super long.
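Picking up the site-move discussion above, here is a quick sketch of how you might spot-check that old URLs 301-redirect to the intended new URLs, so that the redirects line up with the sitemap files and internal links John mentions. The URL pairs are placeholders; this is a sanity check, not a complete migration audit.

```python
# Sketch: verify that old URLs 301-redirect to the expected new URLs. Placeholder URLs.
import requests

REDIRECT_MAP = {
    "https://old.example.com/page-a": "https://www.example.com/page-a",
    "https://old.example.com/page-b": "https://www.example.com/page-b",
}

for old_url, expected_url in REDIRECT_MAP.items():
    resp = requests.get(old_url, allow_redirects=True, timeout=10)
    hops = [(r.status_code, r.url) for r in resp.history]  # each redirect hop
    first_status = hops[0][0] if hops else resp.status_code
    ok = first_status == 301 and resp.url == expected_url
    print(f"{old_url} -> {resp.url} (first hop: {first_status}) {'OK' if ok else 'CHECK'}")
    for status, url in hops:
        print(f"    {status} {url}")
```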
Thank you all for joining. Thanks for all of the questions that were submitted and that were asked from you along the way. And I'll set up the next office hours probably later today, which will also be next Friday, but evening European time, more for the American folks so Michael doesn't have to get up in the middle of the night. I don't know how he does it, but thank you. Cool. All right. Let me just pause here. And we can continue after that.
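As a footnote to the PageSpeed Insights discussion above: the public PageSpeed Insights API returns both the lab-based Lighthouse score and the field data from the Chrome UX Report in a single response, which makes the lab-versus-field distinction easy to see side by side. A sketch, assuming the v5 endpoint and the response fields as documented around this time; the URL is a placeholder, and the field names should be verified against the current API reference.

```python
# Sketch: compare the PageSpeed Insights lab score with the field (CrUX) assessment.
# Placeholder URL; response fields assumed from the PSI v5 API documentation.
import requests

API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {"url": "https://www.example.com/", "strategy": "mobile"}  # add "key": <API key> for regular use

data = requests.get(API, params=params, timeout=60).json()

lab_score = data["lighthouseResult"]["categories"]["performance"]["score"]  # 0..1, lab ("prediction")
field = data.get("loadingExperience", {})  # field data; may be missing for low-traffic URLs

print(f"Lab performance score: {lab_score * 100:.0f}/100")
print(f"Field assessment: {field.get('overall_category', 'no field data')}")
for metric, values in field.get("metrics", {}).items():
    print(f"  {metric}: {values.get('category')} (p75 = {values.get('percentile')})")
```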
Info
Channel: Google Search Central
Views: 6,325
Keywords: Webmaster Central, Google, SEO, Search Engines, Websites, Search Console, Webmaster Tools, crawling, indexing, ranking, mobile sites, internationalization, duplicate content, sitemaps, pagination, structured data, rich results, English Webmaster Office Hours, Office Hours English, Search, office-hours, office-hour, SEO office hours, English SEO office hours, Search Central, Google Search Central, GSC, SC
Id: cUT84ZIcLtA
Length: 62min 9sec (3729 seconds)
Published: Fri Feb 05 2021