JOHN MUELLER: All right. Welcome, everyone, to today's
Google SEO office hours. My name is John Mueller. I'm a Search Advocate at
Google here in Switzerland. And part of what we do are
these Office Hour Hangouts where people can jump in
and ask their questions around their website
and web search. And we can try to
find some answers. As always, a bunch of stuff
was submitted on YouTube. So we can go through
some of that. But if any of you
want to get started with our first question,
you're welcome to jump in. SHAO CHIEH LO: Hey, John. I have a question. JOHN MUELLER: Sure. SHAO CHIEH LO: So this question
is about e-commerce. So there are a lot of e-commerce sites that have refinement pages, right? These are pages where a certain number of products are placed on the page based on a predefined logic for which products to show there. And in my case, my client's pages like that are very good pages for capturing [INAUDIBLE] keywords, because they are refining by, for example, computer, and at the same time they have AMD chips, things like that. And when people are searching for a [INAUDIBLE] keyword like computer with AMD processor, those pages are a very good place to capture them. However-- so here is my question. Those pages are designed in a way that products are assigned automatically, and you can't put all the products in the first rendering of those kinds of pages. So every time a new product is added to the inventory, or some product is removed from the inventory, the products in the first rendering of the page will change. And I will send a sample here in the comments section. For example, this is not my client, because I'm not allowed to show my client's website, but it's a similar situation. So when you click into that link, you can see that there are a lot of products on the page. But when you scroll down to the bottom, they have a Load More button. So people only see the products that come before Load More, right? And whenever we add inventory or remove something, the products in the first rendering will change. So Google will constantly be seeing different content on that page. Will that confuse Google? How do we solve this problem? JOHN MUELLER: That's
essentially fine. That's totally normal. I think with e-commerce,
with a very busy site, you have those kind of
shifts all the time. With news websites, you have
it similar that you have news articles all the time. And when you look at the
home page of a news site, there are always different
articles that are linked there. And from our point
of view, that's fine. The important part, I think,
especially with e-commerce is that we're able to find
the individual product pages themselves. So somewhere along
the line, we need to have kind of persistent
links to those products. So that could be on that page. It could be on page
two or page three or page four of that
listing, something like that. So that's kind of the important part there.
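As a rough illustration of that idea (not something shown in the discussion itself, and all URLs are made up), a category page could keep plain, crawlable pagination links next to its Load More button so every product stays reachable even as the first batch of products rotates:

```html
<!-- Hypothetical category listing: the visible products rotate,
     but each product keeps a stable, crawlable URL. -->
<ul class="products">
  <li><a href="/products/laptop-model-1">Laptop Model 1</a></li>
  <li><a href="/products/laptop-model-2">Laptop Model 2</a></li>
  <!-- ...first batch of products rendered in the initial HTML... -->
</ul>

<button id="load-more">Load More</button>

<!-- Plain links to further pages of the listing, so crawlers can
     still reach the remaining products without clicking the button. -->
<nav class="pagination">
  <a href="/computers/amd?page=2">Page 2</a>
  <a href="/computers/amd?page=3">Page 3</a>
</nav>
```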
I wouldn't worry that the pages change from load to load, because what will happen
from a search point of view is we will recognize there's
specific content for this topic on this page. And we'll try to bring queries
to the page that kind of match the general topic. And if a computer model
one or computer model two is shown there and
they're essentially equivalent because they're in
the same category of product, then that doesn't really
change much for us. SHAO CHIEH LO: So
from what I heard, as long as the logic of assigning products is consistent, and the products showing up in the first rendering match the, for example, title [INAUDIBLE] on the page, then that is fine. But if the logic is not consistent and there are suddenly other products, then there will be a problem. JOHN MUELLER: Yeah. So for example, if you
have a clothing store and you have a category
that's just blue and the category
blue has everything from socks to jackets and
everything in between, then it's really
hard for us to say, this is a landing page
for this type of product. So we will constantly be
confused by a page like that, whereas if it's a landing
page about, I don't know, blue jackets, for
example-- like, category jackets
and color blue-- then it doesn't really matter
which jackets you show there. They all match that intent and
it's pretty clear to the user. They fit into that category. SHAO CHIEH LO: So as
long as these products, the newly added products, are still blue jackets, even if there are different blue jackets in the first rendering in the [INAUDIBLE] OK. JOHN MUELLER: Yeah. SHAO CHIEH LO: But if there is, like, one time a red jacket is added to that page, there will be a problem? JOHN MUELLER: I think individual
cases are absolutely no problem. If it's always something
random in there, then it gets hard for us
to understand the pattern. SHAO CHIEH LO: OK,
thank you so much. Sorry for [INAUDIBLE]. SEAN BRATTIN: Hey, John. JOHN MUELLER: Hi. SEAN BRATTIN: Thank you. My question is
regarding image search and why one image might be given preference over another. Specifically, it's on a product page that
uses an image slider to display pictures of the product. And considering that
pretty much everything is nearly identical-- like, alt text, file
name, nearby text mentions weight,
things of that nature, why might a kind of
seemingly random image from within the
slider sequence, maybe third or fourth thumbnail,
be given preference over a featured image, like
usually the first image that you would see
on a product page? JOHN MUELLER: I don't know. It's hard to say. I don't think-- so we have
various things that kind of go into image search. And on the one hand, it
is kind of the aspects that you mentioned, like the
titles of the page, the image filename, the captions,
alt text, things like that. But we do also have some logic
that tries to understand, is this a high
quality image or not? And it's possible-- I
don't know those images-- that our systems
are either getting confused by the
contents of the image, or that they clearly
see one image is significantly higher
quality than the other. And for us, then maybe we
would give it a little bit more visibility like that. But it's something where
I think there are always a number of different
factors that play into that. And even for multiple images
on the same page, which are kind of in the same
category of things, it's possible that we kind
of show them in one order once and show them in a
different order another time. SEAN BRATTIN: OK. So are you able to
comment on this? I imagine that Cloud
Vision has something to do with that, trying to
match similarities with machine learning to the entities. Am I on the right track here? JOHN MUELLER: I don't know
how far we would use something like that. I do think that, at least
as far as I understand, we've talked about doing that
in the past, specifically for image search. But it's something
where, just purely based on the contents of
the image alone, it's sometimes really hard to
determine how the relevance should be for a specific query. So for example, you
might have, I don't know, a picture of a beach. And we could recognize,
oh, it's a beach. There's water here. Things like that. But if someone is
searching for a hotel, is a picture of the beach
the relevant thing to show? Or is that, I don't
know, a couple of miles away from the hotel? It's really hard
to judge just based on the content of
the image alone. So I imagine if or when we do
use kind of machine learning to understand the
contents of an image, it's something auxiliary to
the other factors that we have. So it's not that
it would completely override everything else. SEAN BRATTIN: Gotcha. Thank you. JOHN MUELLER: Sure. NEERAJ PANDEY: John, just
one follow up on this. Does Google have any plan for machine learning auto-detection of what is happening in-- what is in the picture? Because I am seeing that different devices also have this kind of feature. Does Google also have any plan of implementing this kind of feature? JOHN MUELLER: What is
happening within the image? I don't know. Kind of like with the
previous question, it's something where
it's certainly possible, to some extent, to pull out
some additional information from an image, which could
be like objects in the image or what is happening
in the image. But I don't know if
that would override any of the other factors
that we have there. So my understanding
is this is probably something that would
be more of on the side, if we have multiple images that
we think are kind of equivalent and we can clearly tell
somehow that this one is more relevant because it
has, I don't know, the objects or the actions
that someone is searching for, then maybe we would use that. But I honestly don't know what
we've announced in that area or what we're actually
using for search there, because the thing
to keep in mind is that there are a lot
of different elements that are theoretically
possible that might be done kind of in consumer devices. There are lots of
things that are patented that are out there that are kind
of like theoretically possible. But just because it's
possible in some instances doesn't mean that it
makes sense for Search. And we see that a
lot with patents when it comes to Search, where
someone will patent a really cool algorithm or
set up that could have an implication for Search. But just because it's
patented by Google and maybe even by
someone who works on Search doesn't mean that we
actually use it in Search. NEERAJ PANDEY: Yeah. OK. Thank you. JOHN MUELLER: Sure. OK. Let me run through some of
the submitted questions. And if you have questions along
the way, feel free to jump in and we'll almost certainly have
time towards the end for more questions from all of you. All right. The first question is
about Google Discover. One of the sites I'm running
is about anime, fan art, cosplay, fan fiction. It was performing fairly well in Discover. But from one day to another, the traffic dropped to 0 without any significant change on the site. In Google Search, it has been growing before and after that. What kind of problems could cause that situation? I don't know. It's really hard to say
without looking at the site. But in general, when it
comes to Google Discover, one of the things
that I've noticed from the feedback from
folks like you all is that the traffic tends
to be very kind of on or off in that our systems
might think, well, it makes sense to show
this more in Discover. And then suddenly you get a
lot of traffic from Discover. And then our algorithms
might at some point say, well, it doesn't make sense to
show it that much in Discover anymore and the
traffic goes away. And especially
with Discover, it's something which is not
tied to a specific query. So it's really hard to say
what you should be expecting, because you don't know how many
people are kind of interested in this topic or where
we would potentially be able to show that. So that's something where if
you do see a lot of visibility from Google Discover, I
think that's fantastic. I just would be careful
and kind of realize that this is something that
can change fairly quickly. Additionally, we
also-- for Discover, we have a Help
Center article that goes into pretty much
detail about what kind of things we watch out for, and in
particular, what kind of things we don't want to
show in Discover. So that's something that you
might want to double check. Depending on, I guess,
the site that you have, that's something that might be
more relevant or less relevant there. But I would definitely
check that out. What are the levels of
site quality demotions? Is there a first level
where everything sitewide looks fine, no demotion; a second level, where you demote some pages that are not relevant; or a third level, where the site as a whole is not good at all? So my understanding is, we don't
have these different levels of site-wide demotion
where we would say, we need to demote
everything on the website or we don't need to demote
anything on the website. I think depending
on the website, you might see aspects
like this or might feel like aspects like this. But for the most part,
we do try to look at it as granular as possible. And in some cases, we can't look
at it that granularly. So we'll look at kind of like
different chunks of a website. So that's something
where, from our side, it's not so much that we
have different categories and we say like, in this
category or in that category. It's just that there is
almost like a fluid transition across the web. And also when it comes to things
where our algorithms might say, oh, we don't really know how to
trust this, for the most part, it's not a matter of trust is
there or trust is not there. It's like, yes or no. But rather, we have this really
kind of fluid transition where we think, well, we're not
completely sure about this. But it makes sense for these
kind of queries, for example. Or it makes sense for
these kind of queries. So that's something where
there is a lot of room. Let's see. I have a question
about omitted results. We publish two large dotcoms,
horoscope and astrology, each with its own URL and content team. After ranking on the first
page for astrology queries for multiple years,
in February last year, only one of the sites began
to show up in the normal search results at a time. Whichever site has the highest
ranking for a given query will show up, with the other
website being classified as an omitted result. There's no duplicate content or
cross links between the sites, so I'm curious why
this is happening. It's really hard to say without
looking at the specific sites and looking at the
specific situation. So usually, with two websites,
if they're not completely the same, then we would
rank them individually, even if there is kind of like
an ownership relationship there. So from that point of
view, it might also just be something that is
kind of not related to what you're suspecting,
in that our algorithms think that it's like the same
site and we should only show one of these at the same time. I have seen situations
where, if there are a large number of
sites that are involved, a large number of domains, that
our algorithms might say, well, all of these domains are
essentially the same content, and we should just
pick one of these to show rather
than all of these. But usually, if there
are two websites and they're kind of
unique in their own ways, then that's something
where we would try to show them individually. So I think from
a practical point of view, what I would do here is
go to the Webmaster Help forums and post the details of what
you're seeing here-- maybe some screenshots, specific
URLs and queries where you're seeing this happening. And the folks there
can take a look at that and maybe guide you
into, I don't know, if there's something
specific that you could be doing
differently there, maybe they can
point you to that. Or maybe they can point you in
the direction of saying, well, it is how it is. There's nothing kind of unnatural
that's happening there. But also, the folks active
in the help forums have the ability to escalate
things to Google teams. So if they think
this is really weird and maybe something weird is
happening on Google's side, then they can escalate
that to someone at Google. Let's see. Does Google Search consider each
URL of a website individually? For example, does a low
score on a domain homepage have any effect on the other
pages which have a high score? So yeah, like I
mentioned before, we try to be as granular as
possible, as fine grained as possible, in the
sense that we try to focus on individual pages. But especially within a
website, you're kind of always linking to the other
pages of your website. So there is kind of a connection
between all of these pages. And if one page is
really bad and we think that's the most important
page for your website, then obviously that will have
an effect on the other pages within your website, because
they're all kind of in context of that one main
page, for example, whereas if one page
on your website is something that we
would consider not so good and it's some random
part of your website, then that's not going to be the
central point that everything revolves around. Then from our point of
view, that's like, well, this one page is not so great. But that's fine. It doesn't really
affect the rest. The mobile section of Core
Web Vitals in Search Console shows a bad URL on original
link while the AMP version of the same URL is a good URL. Why are these two
considered separately? So essentially,
what happens there is that we don't focus so much
on the theoretical aspect of, this is an AMP page and
there's a canonical here. But rather, we focus
on the data that we see from actual users that
go to these pages that navigate to them. So that's something
where you might see an effect of lots of users
are going to your website directly and they're going
to the non-AMP URLs maybe, depending on how you
have your website set up. And in Search, you
have your AMP URLs. Then we probably
will get signals, or enough signals that we
track them individually for both of those versions. So on the one hand,
people going to Search, going to the AMP
versions, and people may be going to your
website directly, going to the non-AMP versions. And in a case like that,
we might see information separately from
those two versions. And then kind of like we
have those two versions and the data there. So we'll show that
in Search Console like that, whereas if you
set up your website in a way that you're consistently always
working with the AMP version, that maybe all mobile
users go to the AMP version of your
website, then that's something where we
can clearly say, well, this is the primary version. We'll focus all of our signals on that version.
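For reference, a typical AMP setup pairs the two versions roughly like the hypothetical sketch below (URLs are made up); which version's field data then gets used depends on which version users actually reach, as described above:

```html
<!-- On the canonical page, e.g. https://www.example.com/product/ -->
<link rel="amphtml" href="https://www.example.com/product/amp/">

<!-- On the AMP page, e.g. https://www.example.com/product/amp/ -->
<link rel="canonical" href="https://www.example.com/product/">
```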
The next question there is, since AMP is enabled, will Google Mobile Search
consider only the AMP version, which passes the Core Web Vitals
test when ranking the website, or will the original
link also be considered? So I mean, on the one hand,
there is the aspect of, is it a valid AMP or not? If it's not a valid AMP,
then we wouldn't show it. So that's one aspect that
goes into play there. But I think in the
theoretical situation that we have data for
the non-AMP version and data for the AMP version,
and we would show the AMP version in the
search results, then we would use the
data for the version that we show in Search as
the basis for the rating. So in a case like
that where we clearly have data for both
of these versions and we would pick one of
those versions to show, then we would use the
data for that version. That's similar, I think, also
with international websites where you have different kind of
URLs for individual countries. And if we have data for
one version, we show that. Or if we have one version that
we would show in the search results and we have
data for that version, that we'll use the
data for that version, even if we have kind of other
data for the other language or other country versions. Yeah. The only case where I know
where we would fold things together is with regards
to the AMP cache, because theoretically
the AMP cache is also located in yet another
place, another set of URLs. But with the AMP
cache, we know kind of how we should fold that
back to the AMP version and track that data there. So that's a little
bit of an exception. But if you have kind of
separate AMP versions and separate mobile
versions on your site, then it's very possible that we
could track those individually. Does Google weigh the
exact match title tag more in comparison to the title
tag focused more on users? So let's say the phrase I
want to rank for is Audi A3. And one version of the title
tag is this exact match. The other version is, "This car for sale? 152 great models." Would this title be scored less relevant for the query Audi A3, just because it is longer and not an exact match? I don't think we have any
exact definition on how that would pan out in practice. So there is certainly
an aspect of, does this title kind
of match the query? But we also try to understand
the relevance of the query. We try to understand things
like synonyms or kind of more context around the
query, around the titles, as well. So I don't think
there's a simple kind of like exact match to the query
is better or not exact match to query is better there. So my recommendation there would
be to test this out and just try it out. And not so much in terms of
SEO, which one will rank better, but think about
which one of these would work better in
the search results? And that's something you
could try out on one page, you could try out on multiple
pages that are kind of set up in a similar way. And then based on that,
you can determine, well, this one kind of attracts
more clicks from users. It matches the intent that
the user has better somehow. So I'll stick to that
model and use that across the rest of my website. So that's kind of my
recommendation there.
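As a small, hypothetical illustration of the kind of test being suggested (the exact wording here is made up), the two variants might simply be:

```html
<!-- Variant A: exact-match title -->
<title>Audi A3</title>

<!-- Variant B: more descriptive, user-focused title -->
<title>Audi A3 for sale - 152 great models</title>
```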
I think with kind of the general information retrieval point of view, of which one of
these would be the better fit, I imagine you could get
into really long arguments with the people that are
working on information retrieval on which one is better or not. So yeah, I don't think there
is like this one clear answer. Comments below the blog posts-- are comments still
a ranking factor? I'm migrating to another CMS and would like to get rid of all comments. There are about one to three not really relevant comments below many blog posts. Can I delete them safely
without losing any ranking? I think it's
ultimately up to you. From our point of view,
we do see comments as a part of the content. We do also, in many cases,
recognize that this is actually the comment section,
so we need to treat it slightly differently. But ultimately, if people
are finding your pages based on the
comments there, then if you delete those
comments, then obviously we wouldn't be able to
find your pages based on that. So that's something
where depending on the type of comments
that you have there, the amount of comments
that you have, it can be the case that they
provide significant value to your pages. And it can be a source of
additional kind of information about your pages. But it's not always the case. So that's something
where I think you kind of need to look at
the contents of your pages overall, the queries that
are leading to your pages, and think about which
of these queries might go away if my comments
were not on those pages? And based on that, you
can try to figure out what you need to do there. It's certainly not the
case that we completely ignore all of the
comments on a site, so just blindly going
off and deleting all of your comments in the
hope that nothing will change, I don't think that will happen. When using interstitials
as product pages, does Google index the content
on those interstitials or does it only index the
content on the static pages? So I wasn't quite sure
how you use interstitials as product pages. It seems like a kind
of unique setup. But anyway, I think it's less
a matter of interstitials or not but more a matter
of what content is actually shown when we load those pages. So if we load this HTML page
and by default it never shows any
product information, then we wouldn't have that
product information to actually index, whereas if you
kind of load that page and it takes a
second and then it pops up the full content, the
full product information, then essentially by loading the
page, we have that information. And we can use that to index
and to rank those pages. So kind of a simple way to double-check what we would be able to pick
up with regards to indexing is to take that URL
and use something like the mobile friendly test
or the URL inspection tool in Search Console
and copy it in there and to see if Google is able
to bring up the full product information or not. If Google can bring up
the product information, then probably that's OK,
whereas if Google only shows you kind of the
static page behind that, then probably it's
the case that we wouldn't be able to pick
up the product information. So that's one thing to watch out for.
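As a hedged sketch of the situation described above (all names and endpoints are made up), product details injected by JavaScript shortly after the initial load can generally be indexed as long as they show up in the rendered HTML that the testing tools display:

```html
<!-- The initial HTML has only a placeholder... -->
<div id="product"></div>

<script>
  // ...and the full product information "pops up" shortly after load.
  // If this rendered content appears in the URL Inspection or
  // Mobile-Friendly test, Google can generally use it for indexing.
  fetch('/api/products/123') // hypothetical endpoint
    .then(function (response) { return response.json(); })
    .then(function (product) {
      document.getElementById('product').innerHTML =
        '<h1>' + product.name + '</h1>' +
        '<p>' + product.description + '</p>';
    });
</script>
```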
I think what threw me off with this question initially is also the word interstitials
there in the sense that usually interstitials
are something that are kind of between the
content that you're looking for and maybe kind of like what is
actually loaded on the browser. So if you go to a
page and instead of the product page it shows
a big interstitial showing something else, that's
kind of the usual setup for interstitials. And from our point
of view, those kind of interstitials,
if they're intrusive interstitials in the sense that
they get in the way of the user actually interacting
with the page, then that would be something
we would consider a negative ranking factor. So if it's really a case that
when you go to your pages it just takes a bit and then
your product page pops up, then I wouldn't call
those interstitials. Maybe use some
other word for that, because if you ask around in
the help forums or elsewhere and you say, oh, my
interstitials and I want to rank for my interstitials,
then probably a lot of people will be confused. And it seems like Google Images
crawls some SVG files as SVG and then [? Summit ?] renders them into PNG when serving them in the search results. What is the reason for that? Is there a way that we can dictate this behavior for the Google Image crawler? I don't know. I wasn't aware of how
this is happening. So I'm not 100% sure what
exactly you're referring to. My understanding is that
when it comes to images, especially the vector formats
like SVG, which don't always have a well-defined size,
what we do internally is we convert that into
a normal pixel image so that we can treat it
the same way as we can treat other kinds of images. That means for all of the
normal processing internally and all of that, and
also specifically with regards to the
thumbnail images that we can show so
that we can scale it down using the normal
pixel scaling functions and get it to the
right size and get it into an equal resolution to the
other thumbnails that we show. So probably that is something
that is happening there. And that's not something
that you can easily change because we kind of
have our system set up to deal with pixel-based images. And that's what
we would do there. With regards to kind
of the next step from there, the
expanding of the image when you click on it in
the Image Search results, I don't know how that would be
handled with regards to SVGs or if we do some kind of
pixel-based bigger preview or SVG based bigger preview. So I don't quite know how
we would handle that there. If you have any examples where
this is causing problems, I would love to see them. So feel free to send me
anything that you run across in that regard, especially
when you see that it is causing weird problems that could be
avoided by doing it slightly differently. Is there anything that
we can do in terms of SEO to improve user journey? I think those are kind
of separate topics. So it's not something
that you would do SEO to improve user journey. But rather, you have
your user journeys that you use to kind
of analyze your product and try to find the
best approaches that you can do there. And then based on
that, you would also try to do some SEO to improve
things in the search results. So one is kind of improving
things for the user. And the other is kind
of improving things for search engines. Sometimes if things
align well then there is enough overlap
that they work together. But essentially,
they're separate topics. What is the best way to treat
syndicated content on my site if the content is already
in other sites, too? Do I have to no-index my
page or do a canonicalize to the original source? Do I no-follow all internal
links of that page? Yeah, good question. I don't think we have exact
guidelines on syndicated content. Generally, we do recommend using
something like a rel canonical to the original source. I know that's not always
possible in all cases.
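Where it is possible, that would just be a link element on the syndicated copy pointing back at the original, along the lines of this hypothetical sketch (URLs are made up):

```html
<!-- On the syndicated copy of the article -->
<link rel="canonical"
      href="https://original-publisher.example/original-article">
```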
So sometimes what can happen is we just recognize that there's
syndicated content on a website and then we essentially try
to rank that appropriately. So if you're syndicating
your content to other sites, then it's theoretically possible
that those other sites also show up in the search results. It's possible that
maybe they even show up above you in
the search results, depending on the situation. So that's something to
kind of keep in mind if you're syndicating content. If you're hosting syndicated
content on your website, then that's kind of similar
to keep in mind in the sense that most of the
time we would try to show the original source. And just because you
have a syndicated version of that content on
your site, as well, doesn't mean we will also show
your website in the search results. So usually what
I recommend there is to make sure that
you have significant, unique, and compelling content
of your own on your website. So if you're using syndicated
content to kind of fill out additional facets of
information for your users, then that's perfectly fine. I wouldn't expect to rank for
those additional kind of facets or filler content
that you have there. It can happen, but it's not
something I would count on. And instead, for the
SEO side of things, for the ranking
side of things, I would really make
sure that you have significant kind of
unique content of your own so that when our systems
look at your website, they don't just see all of this
content that everyone else has, but rather, they see a
lot of additional value that you provide that is not
on the other sites, as well. And when we want to rank for
a specific topic on Google, is it a good practice
to also cover related topics, for
example, if we sell laptops and we want to rank for that,
is it useful to create posts like reviewing laptops,
introducing the best new laptops, those
kind of things? And if it's useful,
then does it have to be done in any special way? So I think this
is always useful, because what you're essentially
doing is, on the one hand, for search engines, you're
kind of building out your reputation of knowledge
on that specific topic area. And for users, as
well, it provides a little bit more context
on kind of like why they should trust you. If they see that you have
all of this knowledge on this general topic
area and you show that and you kind of
present that regularly, then it makes it a lot
easier for them to trust you on something very
specific that you're also providing on your website. So that's something where
I think that that always kind of makes sense. And for search
engines, as well, it's something where, if we can
recognize that this website is really good for this
broader topic area, then if someone is searching for
that broader topic area, we can try to show
that website, as well. We don't have to purely
focus on individual pages, but we'll say, oh,
it looks like you're looking for a new laptop. This website has a lot of
information on various facets around laptops. How long should we wait for a
Search Console manual action response? It's been months in. I avoided resubmitting
because that's not nice. But do these ever get lost? If we don't get any
replies, what should we be doing as a next step? Depending on the type
of manual action, it can take quite a bit of time. So in particular, I think
the link-based manual actions are things that can
take quite a bit of time to be reviewed properly. And it can happen,
in some cases, that it takes a few months. So usually what happens
is, if you resubmit the reconsideration
request, then we will drop the second
reconsideration request, because we think
it's a duplicate. The team internally will
still be able to look at it. And if you have additional
information there, that's perfectly fine. If it's essentially just copy
and paste of the same thing, then I don't think
that changes anything. It's also not the case that you
would have a negative effect from resubmitting a
reconsideration request. So in particular,
if you're not sure that you actually sent the
last one, then you're like, oh, someone on my
team sent it and now I'm not sure if they
actually sent it or not, then resubmitting it
is perfectly fine. It's not that there will be
kind of an additional penalty for resubmitting the
reconsideration request. It's just when the team sees
that one is still pending, they'll focus on that
pending one rather than the additional ones. If you don't see any response
with regards to manual actions, specifically around the
link manual actions, I would recommend
maybe also checking in with the help
forums or checking in with other people
who have worked on kind of link-based
manual actions, because when it takes so long to
kind of be processed like this, it's something where you really
want to make sure that you have everything covered really well. So that's something
where, if you're seeing it taking a
long time, you're like, I don't know if I needed
to do more or needed to do something different,
then going to the help forums is a really good way to
get additional feedback from people. And it's very likely that
you'll go to the help forums and they'll be like, oh, you
should have submitted these 500 other things. And it's not the
case that you have to do whatever feedback comes
back from the help forum. But rather, it's additional
input to take in. And you can review
that and say, OK, I will take into account maybe
a part of this feedback and maybe skip another
part of this feedback, because the folks
in the help forum are very experienced
with tons of topics, but they don't have
the absolute answers. I don't think anyone
really has that. So I think it's great to
get all of this feedback, but you still have to kind
of judge it and weigh it out yourself, as with
anything on the internet. Does Page Speed Insights
use the Googlebot? I wonder because
when I'm looking at the rendered screenshots
in PageSpeed Insights based on our site behavior,
it looks like those weren't rendered by Googlebot. You're probably right. So in particular,
PageSpeed Insights is something which is
based on the Chrome setup. So that's something
where, as far as I know, the server-based system
that does the PageSpeed Insights screenshots
and calculations and kind of metrics,
all of that, is just purely based on Chrome. And Googlebot also uses
Chrome to render pages. But there is some
kind of unique aspects with regards to
Googlebot that don't apply to PageSpeed Insights-- for example, robots.txt. So when Googlebot
renders a page, it has to comply with
the robots.txt of all of the embedded content there. And if you have maybe a CSS
file or JavaScript file blocked by robots.txt, we wouldn't
be able to process that from the Googlebot
point of view. But PageSpeed
Insights would still be able to review
that and show that. So that's probably where you're seeing those differences.
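A simple hypothetical example of that difference (paths are made up): if robots.txt disallows a directory that holds the page's CSS and JavaScript, Googlebot can't fetch those files while rendering, but PageSpeed Insights still loads them:

```html
<!-- Assume robots.txt contains:
       User-agent: *
       Disallow: /assets/
     Googlebot would skip these resources when rendering the page,
     while PageSpeed Insights is not bound by robots.txt. -->
<link rel="stylesheet" href="/assets/site.css">
<script src="/assets/app.js"></script>
```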
I think the difference is more and more blurred because Googlebot does
use Chrome, as well. So it's very similar. But you can certainly
find situations where there are differences. And you can certainly
construct situations where there are differences. Like I mentioned,
with robots.txt, it's a really simple way to
kind of see those differences. With regards to the way
that we calculate speed for Search, though, we-- so kind of, I guess, looking
forward at the Core Web Vitals, at the moment, I don't really
know offhand how we do that. But with regard to
Core Web Vitals, we use what users actually see. So it's not the case that
Googlebot renders a page very quickly and then it
gets a good score, or Chrome, in
PageSpeed Insights, renders a page very quickly,
therefore it gets a good score. But rather, we look at
what users actually saw. And I think that's
really important, because that's kind of a
measure of what the real world performance is. And all of these tools
that kind of render it in more of a lab environment,
like Googlebot when it renders a page, or PageSpeed Insights
when it renders a page, is something that is
almost more of a prediction than an actual kind
of measurement, because there are lots of
assumptions in play there. And whenever you run
something like a rendering of a page within a
data center, then you have a very different setup
than the average user has with regards to network connectivity,
with regards to caches and all of that. It's just very different. So these tools,
when you run them and when you look at the
measurements that they show, you need to keep
in mind, this is more of a prediction
rather than an actual value that the users will see. And-- [INTERPOSING VOICES] MICHAEL LEWITTES: I'm sorry. No, I was going to-- no, I
was going to ask a follow up, but I'm sorry. You continue. JOHN MUELLER: And it's
something that these tools also try to build in, the sense
that they will say, well, I run in a data center, but I
will act like I have a 3G phone and kind of a slow connection. And they'll try to emulate
that, but it's still very different
than actual users. Go ahead. MICHAEL LEWITTES: Yeah,
I'm sorry about that. So in terms of assessing
it, I was reading a site, seemed like it had a lot of ads. So I decided, OK, let
me see how this is scored on PageSpeed Insights. And it rendered a score
with the circle of 21, which was in the red, not very good. But below that was
a visual, and below that numerical and
visual representation was a sentence that read
that based on field data, the page passed in
green the assessment. And then below, there were
these measurement bars for cumulative layout shift,
first input delay, et cetera, and those were all
mostly in the green. So where is the disconnect
and what should one be paying attention to,
that first visual circle, or the fact that it says it
passed the Core Web Vitals assessment? JOHN MUELLER: I need
to keep in mind how the PageSpeed Insights looked. I think it has that one
overview score on top, right? MICHAEL LEWITTES:
Yeah, I actually-- is there a way to present,
because I did a screenshot and I redacted the
name of the website, if that makes it easier. JOHN MUELLER: So I
think what happens in PageSpeed Insights is we
take the various metrics there and we try to calculate one
single number out of that. And sometimes that's
useful to work on or to give you a rough overview
of what the overall score would be. But it all depends
on how strongly you weigh the individual factors. So it can certainly be
the case that overall, when the users see a page,
it's pretty fast and sleek. But when our systems
test it, they're like, oh, these are some
theoretical problems that could be causing issues. And they'll kind of calculate
that into the score. So I think the overall
score is a really good way to get a rough estimate. And the actual field
data is a really good way to see what people actually see. And usually, what I
recommend is using those as a basis to
determine, like, should I be focusing on improving
the speed of the page or not? And then use kind of
the lab testing tools out there for kind
of determining the individual values and for
tweaking them with the work that you're doing. So kind of using
the overall score and using the field data
as a way to determine, like, should I be doing
something on this or not? And then using the lab data
with the individual tools to improve things
and check that you're going the right direction,
because the issue is also the field data is delayed,
I think, by about 30 days. So any changes that you make-- and if you're waiting for
the field data to update, it's always 30 days behind. And if you're unsure that you're
going in the right direction or that you've improved things
enough, then waiting 30 days is kind of annoying. MICHAEL LEWITTES: Thank you. JOY F: Hey, John. Can I add a follow
up on that, as well? JOHN MUELLER: Sure. JOY F: With regards
to Core Web Vitals, the field data is
going to be the one to pay attention to, correct,
in terms of ranking signals? Or is it going to be-- [INTERPOSING VOICES] JOHN MUELLER: Yes. It's the field data. THIAGO POJDA: [INAUDIBLE]. While we are in this
Core Web Vitals topic, I have a small question in this regard. When this becomes a ranking [INAUDIBLE]-- CLS and all the other [INAUDIBLE]-- is it going to be page
level or domain level? JOHN MUELLER: Good question. So essentially what
happens with the field data is we don't have data
points for every page. So for the most part, we need
to have kind of groupings of individual pages. And depending on the amount
of data that we have, that can be a grouping of
the whole website, kind of the domain, or I think in the
Chrome User Experience Report they use the origin, which
would be the subdomain and the protocol there. So that would be kind of the
overarching kind of grouping. And if we have more data
for individual parts of the website, then
we'll try to use that. And I believe that's
something you also see in Search Console
where we'll show one URL and say there's so
many other pages that are associated with that. And that's kind of the grouping
that we would use there. THIAGO POJDA: Just
why I ask this-- we have this set of
pages that are slow. They exist for a different purpose than our other pages on the site. And we have a noindex on them. But they are very slow. And that's why we don't want them to be accounted for. JOHN MUELLER: Yeah. I don't think-- or I don't know
for sure how we would do things with a noindex there. But it's not something you can
easily determine ahead of time. Like, will we see this
as one website or will we see it as different
groupings there? Sometimes with the Chrome
User Experience Report data, you can see, does Google have
data points for those noindex pages? Does Google have data points
for the other pages there? And then you can kind
of figure out, like, OK, it can recognize that there
are separate kinds of pages and can treat them individually.
don't see a problem with that. If it's a smaller
website where we just don't have a lot of
signals for the website, then those noindex pages
could be playing a role there, as well. So I'm not 100% sure,
but my understanding is that in the Chrome User
Experience Report data, we do include all kinds of
pages that users access. So there's no specific
kind of, will this page be indexed like this or not
check that happens there, because the indexability
is sometimes quite complex with regards to
canonicals and all of that. So it's not trivial to
determine on the Chrome side if this page will
be indexed or not. It might be the case that if a
page has a clear noindex, then even in Chrome we would
be able to recognize that. But I'm not 100% sure
if we actually do that. THIAGO POJDA: All
right, thank you. I'll follow up on Twitter. JOHN MUELLER: Yeah. I would also check the Chrome
User Experience Report data. I think you can download
data into BigQuery and you can play with
that a little bit and figure out how is that
happening for other sites, for similar sites
that kind of fall in the same category as the
site that you're working on. Cool. More questions from any of you? CHRISTIAN FEICHTNER: Yes, John. Hi. I suddenly see--
well, it all started in the middle of January. I suddenly saw in Search
Console that there are a lot of old URLs
popping up, especially in the 404 subcategory under
Excluded and in the URL Inspection tool. These old URLs are, for example,
old HTTP versions of URLs. And it's even old domains
because the websites were moved to a new domain,
like, three years ago. So my question is, why is that? Should I be worried? And if yes, how can I fix it? JOHN MUELLER: So these are
showing up as 404 errors or-- CHRISTIAN FEICHTNER: These are
showing up as 404 errors. And for some URLs, if I use
the URL inspection tool, they also show up as referrers
in the URL Inspection tool. JOHN MUELLER: OK. I think if they're
just shown as 404s, I would completely ignore that. What happens in our systems
is that pages which are 404 are essentially still
tracked on our side. And from time to
time, we will double check to see that
they still have a 404. And that can happen,
like that a site is-- has changed
significantly, doesn't have these pages for years now. And still from time
to time, our systems say, well, we will double
check those old URLs and see if they
still return 404. And that's not a
sign that anything is stuck with those pages. It's just kind of
our systems trying to make sure that we're
not missing anything from your website. CHRISTIAN FEICHTNER:
And if they show up as referring URLs in
the URL Inspection tool? JOHN MUELLER: So how do
you mean as referring URLs? Like, that they link
to another page, or-- CHRISTIAN FEICHTNER: Yes. For example, I used
the URL Inspection tool on a URL that's still present. And then in the
URL inspection tool you see where Google
loads this page from. And there it says, for example,
it knows it from the sitemap. And then there are, like,
four URLs listed below that. And that list contains, for example, an old HTTP version. It contains the same file
name but from the old URL of the website, which are all
URLs that don't exist anymore. So this is also something
that makes me worry, or shouldn't it? JOHN MUELLER: That's
completely normal. Yeah. That's something where I am not
100% sure which data we show there in Search Console. But we have a concept of the
first seen location of a link to a specific page. And we might have seen that URL
from that page at some point way in the past. And if that page
doesn't exist anymore, it's still like, this is
where we first saw it. CHRISTIAN FEICHTNER: OK. So it's basically just make
sure that if the original page doesn't exist anymore,
it returns a proper 404. If it's redirected, then make
sure it's a proper redirect. And in other cases, just
ignore it [INAUDIBLE] JOHN MUELLER: Exactly. Yeah. So usually if you have an older
website, then over the years, you will collect more and
more of these 404 pages. And our system is-- even when
they rarely check a 404 page, it's just like the
amount of URLs that could be returning 404 grows. So if you look at your
server statistics and you look at what
Googlebot is requesting, then it can look
like, oh, Google is spending so
much time on 404s. But for us it's just checking
it maybe once a year or so. But because we have so many
that we check once a year, overall it looks like a lot.
website is, like, 10 years old. And last question with
that, because the websites I'm talking about will
also move to new domains. We used the address change tool. And so basically, just make
sure that the old domain still redirects to the new website. This would be the
perfect, good setup, and we shouldn't worry
about anything further. JOHN MUELLER: Yeah. That sounds great. The one place where
people also get confused with that, which is kind
of similar, I guess, with old URLs, is
that when we recognize that pages have moved, we
still have some association of the old location. So we will know that this
page on the new website used to be located as a page on
the old website, in some sense. So if in the search
results you do a site query for the
old domain, then even after a couple of
years, you'll still see a lot of URLs
that are shown there. And it's not the case that
we have them indexed there, but rather, we know
they used to be there. And it looks like a user is explicitly looking for the old location. So we'll show them. So if you look at the cached version of the page in a case like that, then you'll see that it actually shows the new domain. So it's a little bit confusing
if you look at it like that. But essentially, it should
be working properly. CHRISTIAN FEICHTNER:
OK, Thank you. JOHN MUELLER: Sure. BILAL AHMED: So I have
a related question. For example, say we set proper 301 redirects. And I was just trying to understand the relation to backlinks, only the ones-- if Google has the history of the old links, is it possible that it passes some sort of PageRank to the new URLs that we set 301 redirects for? For example, we have a site with backlinks. And we decided to change the URL with proper 301s. And the backlinks [INAUDIBLE] are always there; we normally change a few of those, but they are still there. So if you say Google has some sort of history there, would it be possible that it passes some sort of rank [INAUDIBLE] or PageRank [INAUDIBLE] to [INAUDIBLE]? JOHN MUELLER: Yes.
happens there is we will have kind of the
old URL on your website that has some signals from the
links that go to the old URL. And we have the new
URL on your website. And with a redirect, you're
basically telling us, these are equivalent
and you probably prefer the new URL to be shown. So what we will do is we
will put both of those URLs from your website
into a group and say, this is kind of a
group of URLs that have kind of collected signals. And then with the
redirect we will pick usually the
destination URL and say, this is the canonical
for that group. And the canonical
page will then kind of inherit all of the signals
that go to that group. So if there are links to
the old version of a page, if there are links to a copy
of that page, then all of that will be kind of
combined together in the canonical version. So that's something that
kind of gets passed on there. Specifically when you're
talking about site moves, we still recommend making sure
that you, as much as possible, can update the old links anyway,
because what happens there is, we will put those URLs
in the same group, kind of like I mentioned. But we use various factors that
determine which of these URLs is the right one to show,
which one is the canonical one. And redirect is one factor, but
also links are another factor. So if all of the links, like
internal and external links go to the old
version of your URL and you redirect
to a new version, we might pick the old version
of the URL to show in Search. So that's something
kind of to keep in mind that if you want to move
everything to a new URL, then make sure that everything
is aligned with the new URL-- so the redirect,
the sitemap files, the internal linking,
as much as possible also the external linking
so that everything just fits together with that new
kind of URL that you want.
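As a hedged sketch of that alignment (URLs are made up, and the 301 itself is configured on the server rather than in the HTML), the on-page side of a move from /old-page to /new-page might look like:

```html
<!-- On https://www.example.com/new-page: a self-referencing canonical -->
<link rel="canonical" href="https://www.example.com/new-page">

<!-- Internal links updated to point straight at the new URL,
     rather than relying on the redirect from /old-page. -->
<a href="/new-page">New page</a>

<!-- The sitemap should likewise list only the new URL. -->
```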
BILAL AHMED: Thanks for this question. I have another question. Most of the tools, the
SEO check-up tools, they pop up with
a warning, which says low text to HTML
ratio, which means there is more code than written text. Would it be something that
we need to worry about? Or is it OK for Google
to pick the right text? JOHN MUELLER: We don't
have a notion of text to HTML ratio for Search. So that's something where I
think a lot of these tools are able to calculate this. And they think, oh,
it's worthwhile showing. But it's not an SEO ranking
factor kind of thing. There are two places where it
could play a role, on the one hand, with regards to speed. So if you have a lot of HTML and
you have very little text, then obviously we have to load a lot
of content to display the page. So that's one small factor. The other one is with regards
to extreme situations where you have lots and lots of
HTML and very little text. We have limits with regards
to what the maximum page size is that we would
download for an HTML page. And I think that's in the
order of, I don't know, hundreds of megabytes,
something like that. So if you have an HTML
page that has hundreds of megabytes of HTML and
very little text in it, then yes, that could
be playing a role. But that's something that I
suspect is extremely rare. And if you have
that problem, then that's a bigger
problem than just like, oh, it's not perfect
HTML to text ratio. BILAL AHMED: Perfect. I got that. Thank you. JOHN MUELLER: Sure. Let me just pause
the recording here. You're welcome to stick around
a little bit longer if you like. But it is always
good to kind of keep the recording limited to
avoid it becoming super long. Thank you all for joining. Thanks for all of the
questions that were submitted and that were asked
from you along the way. And I'll set up the next office
hours probably later today, which will also be next Friday,
but evening European time, more for the American
folks so Michael doesn't have to get up in
the middle of the night. I don't know how he
does it, but thank you. Cool. All right. Let me just pause here. And we can continue after that.