Martin Splitt — Technical SEO 101 for web developers

Captions
So, I guess you've been warmed up. [He greets the audience with a few words of Russian.] And that's it with Russian — I can say a few more words, but I'm basically at the level of a small, whiny kid when it comes to my Russian vocabulary.

Today I'd like to talk a little bit about SEO, and more specifically technical SEO. Before we dive into that, I'd like to give you my definition of SEO: what it is, how it works, and how it concerns us web developers. Once we've figured out what SEO should or could be doing, we'll go technical: how do my web pages actually get into search engines? That differs from search engine to search engine, but a few broad concepts are pretty much the same across all of them. I can't speak for other search engines — I don't know how they all do it — but I know how Googlebot does it, so we'll use Googlebot as an example of a generic search engine. Once we've figured out how our pages get into search, we'll look at things that can go wrong, how we identify them, and how we deal with them. And once we have a foundation to start from, I want you to take away a bunch of pointers for learning more and keeping up to date. Right, enough blah blah — let's start with the meat of this talk.

What is SEO? Search engine optimization. Okay, cool — but what does that mean? Who here has worked with an SEO? What do you think an SEO does? (Sorry for making you uncomfortable — you get some chocolate, there you go. Oh, I'm so sorry — oh god, I actually hit you, you get some chocolate too. Can I just... it worked earlier, it worked five minutes ago. It worked on my machine.) Anyway: it's not necessarily clear what SEOs actually do, and they actually do a lot — but we often see bad examples of SEO. Who here has a blog or a personal website? Keep your hands up if you get these emails: "Hi, would you be interested in a guest post doing thingamajiggy?" or "Would you be interested in us sharing your link with other bloggers so you get more SEO?" That's what we perceive as SEO, and that's unfortunate. You might also know link farms — pages that are full of links and no content. That's another bad example: it's not good SEO, it isn't really SEO at all, and it doesn't even work, because we filter that kind of stuff out.

So what is SEO, then? It depends on who you ask, but my personal definition starts with content. An SEO should work with the marketing and sales people to figure out: who are our customers, who should be coming to our website, what are they looking for, what are they trying to accomplish, and how does that fit with what we want to do — selling a product, promoting an event, whatever it is. Then we need to figure out what we want in terms of content: what is it that people want from us that we can deliver on our website? They should also talk to product owners and business stakeholders and figure out the broader strategy: are we looking for subscriptions, for recurring readers, for people buying products — what is it we're trying to do? It's a very strategic thing.
Cool, that's great, Martin — but why do I care as a web developer? This is marketing and sales and business stakeholders. Wait, there's a third important pillar, and that's technology. A good SEO should also be able to understand what happens on your pages and which technology decisions are being made, or have been made, that led to the situation you're in. They should be testing and monitoring your site in terms of how it performs in search engines, giving you solid, useful feedback and helping you make the right decisions and the right investments. Who here is interested in web performance? Who here is happy with how their business does web performance? Who here gets priority to optimize web performance? See — that's the problem. Because SEOs work with and understand the strategy and the content side, how nice would it be for us web developers if we say "our website is really slow, we need to fix that" and the SEO goes "yes, and here is the financial reason why, here is the business reason why, and this is why we're not reaching our business goals"? They can be our allies, and we can be theirs. We can and should work together with SEOs, just as they should work with us. They don't have to be developers — I trust all of you to be fantastic web developers — but they can help by monitoring, testing and giving you feedback on things they see. How about this: you deploy something — how nice would it be to know you're not going to see your search results drop, that you're not going to fall out of the search engine and stop getting people onto your website? That can be prevented by an SEO helping you test the thing before it goes live, so you can fix it. And if they're in the planning session, they might even say: "Wait a minute, there's this thing I heard about — let me find you the article or the video or the blog post that helps you work around this problem." So good SEOs support us, and we can support them.

This talk is pretty much about the technology side. But let me say one thing before we go into it: technical SEO without looking at content and strategy is basically the chocolate-ice-cream emoji — you're polishing a turd. If you have terrible content, none of the technical optimizations are going to help you. If your strategy doesn't make sense for your business, none of the technical optimizations will help you. Let me give you an example. Let's say my toaster broke. I'm German, I have a proper breakfast, and like a proper German I'm very much into toast — we have good bread, but we also like to toast stuff. My toaster broke, so I need a new one, so I go to Google and search for toasters. Now imagine I come across this website. What is this? Well, it's about "hot bread", whatever that is, and the most prominent thing on the page is a header that says "smart, simple, beautiful" — which means nothing. Cool, so I still don't know what this is. The image looks more or less like a toaster — maybe it's a toaster, I don't know. So I read the rest of the copy.
"It will disrupt your breakfast." I don't want my breakfast disrupted — I want it peaceful and quiet and tasty and toasted, so no. Then it says "thermochemical food processing," and I'm like: what is that? Is it cooking, is it frying, is it toasting, what? Then it's "the best invention since sliced bread" — haha, yeah, and I still have no idea what this is. So maybe I look at the other things on this website, and I see "our philosophy" — I don't want a philosophy lesson, I want a toaster. There's "Hot Bread X10" — I don't even know what the hot bread thing is, so why would I click on that? And then I can "join the movement" — I don't want to join a cult, I want toast. What the heck? Who thought this was reasonable content?

Now look at the same website with a slight change. I need a toaster, and the most prominent thing says "the fastest toaster." Yes — this is probably a good page, I'm on to something here. "Never burn your toast again" — yes, hell yes. How often did my old toaster burn my toast when I wasn't in the kitchen, because it didn't pop up, and then it was a mess, and it made me angry, and it disrupted my breakfast — the thing I didn't want. And it gets me my toast faster, which is brilliant, because I usually get up a little later than I should, so it's convenient if this thing actually toasts quicker. So this toaster is probably the thing that does all of that. Cool. But maybe I'm not that committed yet — maybe I'm not ready to commit to this specific toaster. How do I even know which toaster is the right one for me, the one that will be a good partner in my breakfasts from now on? Conveniently, they have a "how to choose a toaster" section, so I can figure out which criteria to look at — and they don't only have this one toaster, they have others too, so I can compare. And once I'm ready to make the commitment, I can just buy a toaster. A similar website, probably the same technical foundation, very different content — and very, very different results. This website helps me buy a new toaster; the other one wants me to "join the movement," and I don't want any of that. Germans joining movements is not a good thing, I think.

Anyway — let's talk about technology. Conveniently, we need neither a helmet nor safety glasses nor safety gloves, so we can just figure out how to do things right. To do that, let's start with how a page actually gets into a search engine. As I said, I can't speak for all search engines, but I know how Google Search works, so we'll look at that. We use this thing called Googlebot, and it starts with a list of URLs — we call that the crawl queue. We take a URL off that list and basically make an HTTP request to get the HTML. That's what we call crawling; it's the very first step of the process. Then we process that HTML, and a few things happen in processing. The first: we look through the HTML for any links with URLs, because we can put those right back into the queue, so that the next free crawler can repeat the same process on them. That's an optimization we do right away.
The next thing is we look at the content: what is this page about? Is it about toasting, about this particular toaster, "the fastest toaster"? Cool — so we can put it in the index under "toasters," and maybe "fastest toaster" and "how to choose a toaster," something like that. We put these things into the index so that someone searching for toasters can get, from our index, a list of URLs that contain things about toasters. That's basically the index: a fast lookup mechanism.

Very nice — but there's a catch. Remember I said we make an HTTP request and get the HTML? That's literally what you would do with curl. We don't use curl, but that's essentially what happens: we get the HTML. In a single-page application — React, Angular, Vue, whatever — that initial HTML doesn't really have much content to put in the index; it just loads our JavaScript. We know that, and that's why we also put the HTML into a queue — the render queue — because we have to render your page. Now, the web is big: Google Search knows more than 130 trillion pages, and obviously we can't open 130 trillion browser tabs in one go, so we have to stagger that out a little with this queuing mechanism. Once the resources are available, a service starts that's called the Web Rendering Service. The Web Rendering Service is basically a headless Chromium that actually opens your page, renders it, and executes all the JavaScript in the process. Once it has done that, we get new HTML that we feed back into processing — we can look for links again, and we can see the content that was generated by JavaScript and put that into the index.

Once things are in the index, if I search for toasters, ranking happens. Ranking means we look at everything in the index for this topic and sort it, and we use so many factors to do that — there's so much stuff in there — that it's really hard, as a user, to figure out why something ranked higher than something else. Basically, we try to figure out what the best result is for this specific user, this particular situation, this particular query. It makes a difference whether I ask for "cheapest toaster," "fastest toaster," "toaster comparison" or "buying a toaster" — different sites might be better for these different queries, and we have to rank them somehow. Ranking is a completely different can of worms and I'm not going to open it: the rest of this talk focuses on the first three stages — crawling, rendering and indexing. We're not going to talk about ranking factors or any of the magic that happens there; that's a very different topic.

All right. We also announced at Google I/O that Googlebot is now evergreen. What does that mean? We recently updated Googlebot to run Chrome 74, the current stable Chrome, and we've made sure that from now on we will always be up to date within a couple of weeks of a stable Chrome release. So no more worrying about what version Googlebot runs. We continuously test new versions on small sets of URLs before rolling them out, to make sure our indexing infrastructure works properly. If you want to learn more, check out the blog post I wrote on that; there's also the video from the I/O session where we go into much more detail. So: Googlebot is evergreen.
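As a rough illustration of the crawl → process → index flow described above — this is a toy mental model, not Google's implementation; the regex-based link extraction, the naive tokenizer and the bare render queue are deliberate simplifications (it assumes Node 18+ for the global fetch, run as an ES module):

```js
// Toy model of crawling, processing and indexing. The render queue is only a
// placeholder here — in reality a headless browser would execute the JavaScript
// and the resulting HTML would go through processHtml() a second time.
const crawlQueue = ['https://example.com/'];
const renderQueue = [];
const seen = new Set(crawlQueue);
const index = new Map(); // term -> Set of URLs

async function crawl() {
  const url = crawlQueue.shift();
  const html = await (await fetch(url)).text(); // "crawling": a plain HTTP request, like curl
  processHtml(url, html);
  renderQueue.push({ url, html }); // rendering happens later, when resources allow
}

function processHtml(url, html) {
  // 1) Extract links and feed new URLs back into the crawl queue.
  for (const [, href] of html.matchAll(/<a[^>]+href="([^"#]+)"/g)) {
    const absolute = new URL(href, url).toString();
    if (!seen.has(absolute)) {
      seen.add(absolute);
      crawlQueue.push(absolute);
    }
  }
  // 2) Index the visible text content under each term it contains.
  const text = html.replace(/<[^>]+>/g, ' ').toLowerCase();
  for (const term of text.match(/[a-z]{3,}/g) ?? []) {
    if (!index.has(term)) index.set(term, new Set());
    index.get(term).add(url);
  }
}

// Drain the queue (top-level await works in an ES module).
while (crawlQueue.length) await crawl();
console.log(index.get('toaster')); // which URLs mention "toaster"?
```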
That means we can assume that if something works in our browser, it probably also works in Googlebot — which is good news. Your HTML plus JavaScript will be rendered, the content will appear, and we can index it. Great. But as developers, we make mistakes, and sometimes tiny mistakes have big consequences — there have been cases where someone made a typo somewhere and an entire financial trading company went bankrupt. These things happen, and they can happen with our websites too: you make a small mistake and get an impact you really did not want. If your SEOs, or you, are monitoring your pages with the tools we provide — like Search Console or the mobile-friendly test — you have a good chance of catching these. But mistakes still happen; we can't protect you from everything. Here's an example: this is me flying a glider. I learned how to do it, but here I make one tiny mistake — and boom. I was holding my arms like this when I should have held them like that; I would have flown that way, but instead I flew onto my face. Good job, Martin. The same can happen with Googlebot.

So how do you spot these things, and how do you deal with them? Let's say you look at your server logs and discover that Googlebot — which tells you in its user-agent string that it's Googlebot — is not actually visiting some of your pages. Some products aren't there, some news stories aren't there, some images aren't there; some stuff is missing. Or you search on Google and see that bits and pieces of your site are just not present. The first question I usually ask when that happens is: how do you do links? I can't believe that in 2019 I still have to talk about this, but here we are — people still make this mistake — so let's talk about links.

This is a link: a good, solid link element. It has an href, and that href has a URL I can go to. Cool. If this is a single-page application, it probably intercepts the navigation, stops the page from refreshing, and uses Ajax — XHR or fetch — to get the data from the backend. That's fine, because it still has a URL I can go to. This, on the other hand, is not so good: Googlebot and other crawlers don't click on your stuff; we look for a URL, and this has no URL. And then you might think, "Oh, I'm smart, I'll play you — I'll give you a URL to go to": href="#". No — that doesn't actually go anywhere. Not good. "But this is a URL, right?" — javascript:void(0) — yes, but no: it still doesn't take us anywhere, so it's still not a good link. The rule of thumb: if it takes the user to different content, it's a link; if it does something else on click, it's a button — well spotted. But if it takes me somewhere else: again, Googlebot and other crawlers don't click on your stuff. And then there are the other creative variations — no, no, no, don't do those things. Just use the first two and you'll be fine. That's the point I want you to take away. I can't believe I have to say it, but there you have it: Google says don't do that.
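Roughly what those examples look like in markup — `router.navigate()` and `goTo()` are made-up placeholders for whatever client-side routing code a site might have:

```html
<!-- Good: a real link — the href contains a URL a crawler can follow -->
<a href="/toasters/fastest">The fastest toaster</a>

<!-- Also fine: a single-page app may intercept the click, as long as the href is a real URL -->
<a href="/toasters/compare" onclick="return router.navigate(event)">Compare toasters</a>

<!-- Not crawlable: no URL at all — Googlebot and other crawlers don't click -->
<span onclick="goTo('/toasters')">Toasters</span>
<a onclick="goTo('/toasters')">Toasters</a>

<!-- Still not crawlable: these "URLs" don't lead anywhere -->
<a href="#">Toasters</a>
<a href="javascript:void(0)" onclick="goTo('/toasters')">Toasters</a>
```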
Also, when I say these links go to URLs: look at your URLs quite carefully. Here are two examples — one has a host and a path, and the other has a host, a path, and a fragment identifier. If you have a very long page — say, the Wikipedia article about jam; I don't even know if that exists, or if it's long — but let's say you have a very long article about jam: the history of jam, very popular flavors, the different companies manufacturing it, the jam wars of 2015, something like that. You can use fragments to point at the section you're interested in. But crawlers don't care, because if I search for "history of jam" I want to go to this page, and if I search for "popular jam flavors" I want to go to the same page — so we ignore fragments. That was fine until single-page applications were introduced, because back then the only way to know that content needed to change — without anything triggering a navigation and giving us a JavaScript event — was the hashchange event. So what we did was link to different hashes, which caused an event we could listen to, and then we changed the content using Ajax. Great — but that's a hack. It was never intended to work that way; we exploited an implementation detail rather than a standard. Since then we have the History API, which gives us clean URLs that just work — and when I say "just work," they work all the way down to old Internet Explorer. And if someone is using Opera Mini on a very low-end device, or whatever kind of computer still runs IE 10 or IE 8, it's not going to be fast anyway — a plain page refresh that actually refreshes the page is probably better for them than waiting for a slow machine to execute JavaScript just to swap the content. So just use the History API. Most frameworks do that by default; others let you switch to it — do switch to it. Do not use or rely on fragment URLs.
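A minimal sketch of what History-API navigation looks like in a hand-rolled single-page app — `loadContent()` is a placeholder for whatever fetches and renders the view for a given path; framework routers do essentially this for you:

```js
// Intercept clicks on same-origin links, update the URL with pushState
// (a clean path, no #fragment hack) and swap the content in place.
document.addEventListener('click', (event) => {
  const link = event.target.closest('a[href]');
  if (!link || link.origin !== location.origin) return; // let external links work normally
  event.preventDefault();                // no full page reload
  history.pushState({}, '', link.href);  // the address bar now shows a real URL
  loadContent(location.pathname);
});

// Back/forward buttons fire popstate — handle them too.
window.addEventListener('popstate', () => loadContent(location.pathname));
```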
Then you might use something like Search Console and look at your coverage report, which tells you which pages are in the Google Search index, and you might see: ah, it was crawled — sorry, it was *discovered* — so the links and the fragment URLs are not my problem; it was discovered, but it's not indexed at this point. What does that mean? It means the URL is in the crawl queue and we haven't crawled the page yet. That state might go away quickly, or it might persist for a while. Why? Because of crawl budget. "Well, what is crawl budget, Martin?" Crawl budget consists of two things: the first is crawl rate — how quickly we can crawl your pages; the second is how often we should crawl and recrawl your pages — that's called crawl demand.

Let's look at crawl rate for a second. Crawl rate comes out of a challenge we have to master: we want to crawl your pages as quickly as possible without hurting your users or your servers. Imagine you have a million products and we discover a million URLs — "yes, this is cool, let's grab them all!" That might actually be fine, but what if your web server is a Raspberry Pi on someone's desk? We would probably bring it down. So instead we crawl at a certain rate and watch things like response time and response codes. Your server says everything's good — 299 milliseconds, fast response — and we're doing, say, 300 requests per second, so maybe we can go up a little. Ah, the server is getting slower — maybe that's not a good idea, we might step back. And if we gamble and try an even higher crawl rate and your server goes "no, screw this, what the hell," we go "oh, we're sorry, we'll crawl a little less, sorry about that, whoopsie." So we look at a bunch of signals and try to figure out how much we can crawl. You can also use Search Console to tell us not to crawl too much — say it's the Black Friday sale and everyone is on your website, please only crawl a hundred pages a second — and we won't go over that. But none of this has anything to do with the quality of your website: a low crawl rate does not hurt your ranking and does not hurt your indexing; it just means we won't update your content in our index as quickly.

Crawl demand, on the other hand, looks at things like: is this page actually popular? Is it some intranet login page, or is it something lots of people link to organically? (If you buy links, that doesn't count — sorry. We look at organic links.) We look at how often it shows up in search and how well it usually ranks, and we figure: this is a pretty good page, we should probably come back. But is that always true? Well, no — a Wikipedia article might not change that often; maybe every couple of days, maybe every couple of weeks. So we also look at how fresh the page is: when was it last updated, how often does it usually update. We keep track of that — oh, this page wasn't updated in five years, but now it updated, and then it updated again a week later, so maybe we should check back every couple of weeks — and if that doesn't continue, we back off again. Once more, this says nothing about quality or how well your page ranks. What really matters is that we're trying not to kill your server while making sensible decisions about how often to recrawl your stuff. A news page where the content changes all the time we might crawl more often than a page about the history of jam — the history of jam is not that exciting, I believe; maybe I'm wrong, change my mind.

What's also important to understand about crawl budget: it's not just the pages we crawl, it's also the resources attached to them. If you have a page with ten images and a stylesheet, we fetch the page, we fetch the stylesheet, and we fetch the ten images. The same goes for Ajax requests — they count towards your crawl budget too, because we're still calling a server somewhere. However, we do cache GET requests. Say you have two pages with the same CSS but different images, one image each: on the first page we fetch the stylesheet and the image; on the second page we use the cached stylesheet, because we already have it and it hasn't expired, and we only fetch the other image. So the counting of resources matters, but we can only cache GET requests — if you make POST requests to your APIs with XHR, we're not going to cache those.
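To illustrate the caching point: prefer plain GET requests for read-only data, because those responses can be reused from the cache (by Googlebot, browsers and CDNs alike), while a POST is fetched every single time — `/api/toasters` is a made-up endpoint for this sketch:

```js
async function loadToasters() {
  // Cacheable: a plain GET for data that doesn't change server state.
  const toasters = await fetch('/api/toasters?sort=speed').then((r) => r.json());
  return toasters;
}

async function loadToastersTheExpensiveWay() {
  // Not cacheable: a POST used for what is really just a read — avoid this pattern,
  // it costs crawl budget (and user bandwidth) on every single page load.
  const response = await fetch('/api/toasters', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sort: 'speed' }),
  });
  return response.json();
}
```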
We're also going to crawl content that you might not want in search, and we might crawl things that are actually duplicate content — the same page reachable via multiple URLs; I'll talk about that in a second. Another thing to keep in mind is URL parameters. Using URL parameters instead of paths is perfectly fine, but if you have URL parameters that show up in links and don't change the content of the page, you want to tell us that — I'll explain why in a moment. Also, soft errors: if you have an error condition, you should say so in the HTTP status code, because we take that into account. If you want to learn more about crawl budget in general, or how to optimize it, we have a fantastic blog post — with another blog post linked from it — where you can learn more.

But let's talk about duplicate content for a second. (Sorry — did you want to take a picture of this? I saw some phones going up. One, two, three, four... gone.) Let's say you have a dog-rating website: pictures of dogs that people can rate or comment on, a "top dog" of the day or week, and individual dog pages. Today Laika is the top dog, so /top-dog actually shows Laika's page — but so does /dogs/laika, on any day. What could you do to help us? Right now we're crawling both pages and throwing one away, so you could tell us: this is actually the same as that other page. Then, if we crawl /top-dog first and see the canonical, we go "all right, we don't need to crawl the other one, we know it's the same thing." Cool, thank you very much — you might save a request, and if you have lots of pages, that adds up. So you can tell us which URL you think is the canonical — the URL we should be using for this content. We might not always use it: thanks for the hint, but sometimes we figure out that it's not exactly what we want and pick a different canonical. That's not a problem, unless we pick a completely wrong canonical — then you can absolutely tell us through our public support channels. Normally it's things like the German website and the Swiss website of a retailer, both written in German, with the same content — one on .ch, one on .de — and the business tells us the .ch is the canonical *and* the .de is the canonical. It's the same content, so to us they're the same thing with different entry points: for people searching in Switzerland we might still show the .ch domain, but we might say the canonical is the German one, because the market is larger and there are probably more links and signals coming from it. Don't worry about that — but it's good to give us the hint.

Also, if you have content that you don't want to end up in search: let's say I have a really stupid high school photo on my website at /private/martin-splitt-high-school-photo.jpg. I don't want it to be crawled, so I can use a robots.txt file on my server to tell crawlers not to crawl it. That's great — I can say "do not crawl this." But if someone else links to it, and we crawl and index *their* page, we see "Martin Splitt's high school photo" as link text pointing to that URL. We put the URL into the crawl queue, and before crawling it we notice that we're not allowed to — but we still have the information that this is Martin Splitt's high school photo, so we might still put the URL in the index, because it might still be useful.
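Concretely, the canonical hint and the robots.txt rule from those two examples could look like this (domain and paths made up):

```html
<!-- In the <head> of https://example.com/top-dog, which today shows
     the same content as the permanent page https://example.com/dogs/laika -->
<link rel="canonical" href="https://example.com/dogs/laika">
```

```
# robots.txt — blocks crawling of /private/, but a blocked URL can still end up
# in the index if other pages link to it; that's what noindex (see below) is for.
User-agent: *
Disallow: /private/
```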
But we give you a way to keep that from happening. You can use an HTTP header — not a status code, a header: the X-Robots-Tag header — to tell us not to index something, and for an HTML page you can say the same thing with a robots meta tag. There is a tricky part, though: if you block these URLs in robots.txt, we're never going to see the noindex, because we never fetch them. So be careful with your robots.txt — just generally, be careful with your robots.txt. Some people think they can use it to optimize their crawl budget and then get a bit too excited. This page should have cats — but if you look at it, there are no cats, it's just an empty, blank page. What happened? If I check it out, I see that it's a client-side rendered app that makes an API request, and the robots.txt goes "no API requests for you, good sir" — so we can't actually fetch the data from the API. Be careful with your robots.txt. This tool, by the way, is called the mobile-friendly test, and it's fantastic: it doesn't just tell you whether your page is mobile-friendly, it also gives you page-loading information and JavaScript errors, which is kind of cool.
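The two noindex mechanisms look like this — and, as just said, they only work if the URL is not blocked by robots.txt, because a blocked URL is never fetched and the noindex is never seen:

```
HTTP/1.1 200 OK
X-Robots-Tag: noindex
```

```html
<!-- the equivalent inside an HTML page -->
<meta name="robots" content="noindex">
```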
Now let's talk about URL parameters. This is a specific cat URL, and we can crawl it — perfectly fine, not a problem. But what about these two? Same cat, but something adds a timestamp, and something else adds a session identifier — maybe someone shared the URL while they were logged in to the site, something like that. Now we find these URLs and we're going to crawl them. What you can do to help us not crawl them unnecessarily is redirect to the page without the parameter; then we'll eventually figure out that this URL parameter doesn't matter, and you'll be fine.

Now let's talk about soft errors. Soft errors are tricky: they can happen in a bunch of different ways, and they can lead to what we call infinite crawl spaces. I go to a URL that doesn't exist, but it's a single-page application, so the server just serves the index.html and the JavaScript and goes "yeah, whatever, here you go" — and then I see an error message. That's fine, right? Well, no, because the server said "this page is fine, 200, all good, bye-bye" — and that does not make us happy; Googlebot is not exactly excited about this. Now imagine you have a bug in your pagination: the link to the next page just increments a counter forever, and your server keeps answering "yeah, this is OK" with an empty page. We start with page 1 — good — see the link to page 2 — good — page 3 — good — and now there are no more cats, but we see the link to page 4, and your server says "good," and page 5, "good"... Not good. That's not a good idea.

So how do you solve this in a single-page application — are you screwed? No, you're not. We do have error detection for this and we try to catch it before it happens, but sometimes it gets through, and then this is what you end up with. One way to fix it is to redirect: the app tries to fetch the dog, the dog doesn't exist — OK, that's fine, not a problem — so we redirect to a page that we know the server answers with a 404. /not-found gives us a 404, and we go: OK, so this URL is a redirect to that other page, so we'll crawl it less — and if, after a while, all the links to it disappear, we stop crawling it entirely. You can also use a robots meta tag: the page starts with a robots meta tag that says "all" — go ahead, have fun — and once the app figures out that the content doesn't exist, it changes it to say "noindex." Now you might wonder: that's an interesting way of doing it — can't I do it the other way around, start with noindex and set it to index once I know the content exists? That might sound like a nice solution. It's not. Why not? Because of our pipeline. Remember: we take the URL, we get the HTML — which now has noindex in it — and processing goes "oh, this is noindex, I can move on to the next URL, this one doesn't want to be in the index." That means our JavaScript never ran, so it never had the chance to remove the noindex — and congratulations, I have now removed all my pages from the index. Good job.
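A sketch of those two fixes in a client-rendered route — `renderDog()` and the API path are placeholders, and option B assumes the page ships with an indexable robots meta tag that the JavaScript then flips to noindex (never the other way around, for the pipeline reason just described):

```js
async function showDog(id) {
  const response = await fetch(`/api/dogs/${id}`); // made-up endpoint
  if (response.status === 404) {
    // Option A: send everyone (crawlers included) to a URL the server
    // actually answers with a real 404 status code.
    window.location.href = '/not-found';
    return;

    // Option B (alternative): keep the URL, but flip the existing
    // <meta name="robots" content="all"> to noindex so it drops out of the index.
    // document.querySelector('meta[name="robots"]')
    //   .setAttribute('content', 'noindex');
  }
  renderDog(await response.json()); // placeholder for your own rendering code
}
```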
Other things can go wrong when you do something like this: say your home page has the GDPR warning, the cookie policy, the privacy policy and whatnot, you have the user click "agree," and then you set a cookie recording that they agreed. When the user then goes to a news story or blog post on your site, you check whether that agreement happened, and if not, you redirect them to the home page so they can agree, set the cookie, and click through again. The problem is that Googlebot gets stuck right there: it lands in that redirect and concludes "oh, so this page doesn't exist — or doesn't exist anymore — and it's now just this main page. That's weird." And that's because Googlebot doesn't keep any state besides the cache. We don't use cookies — well, you can write cookies, but we erase them before the next crawl happens. No cookies, no localStorage, no sessionStorage, no IndexedDB: you have all these interfaces, but you cannot rely on them across page loads. If you write something into IndexedDB at the beginning of the page load and read from it later in the same load, that works; it won't work if you navigate to the next URL, because the crawler picks that up separately and everything will have been cleared in between. And if you think about it, that's not a bug — it makes sense. Imagine a user searches for something, finds your news story as a search result, clicks on it — and lands on your home page, says yes to the cookies, and then wonders "where is that article now?" Instead, show the cookie pop-up on every page where the cookie hasn't been set yet. We're fine dealing with the pop-up if you do it right, and the user gets a better experience: they come to your page, see the pop-up, go "yes, of course," and then actually read the article without having to navigate back from your home page.

Also: use feature detection, because even when a feature is present, that doesn't mean it always works, works properly, or is really available. Just because I have a nail doesn't mean I have a hammer to drive it into the wall — I need to make sure the hammer is actually in my other hand before I use it. And that can go subtly wrong, so here's an example. We check whether the browser supports geolocation — it does, fantastic — so we load content that's local to me: if this is a news site, it would load Russian news stories right now, and back in Switzerland it would load Swiss ones. If the browser doesn't support geolocation, that's fine, we just load the global news. However, this API triggers a permission prompt that I can decline, and on one of my older phones the GPS stopped working, so it never got a location and timed out — and then I got no content at all, because the browser said "yes, geolocation is supported," but the error condition happened and nothing handled it. Googlebot declines these permission requests — what sense would it make to give Googlebot microphone or webcam access? Do you want to see our data center? That's not going to happen. So always look out for the error conditions and handle them properly: in this case, if geolocation exists but fails, load the fallback content as well. Much, much better — everyone gets content, all the time. If you want to learn more about the specifics of Googlebot — which features have limited or no support — we've put together a guide that walks you through figuring out what's happening, so you can fix your problems more easily.
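A minimal version of that pattern — feature-detect geolocation and treat the error path (declined permission, timeout, broken GPS, or Googlebot, which declines permission prompts) as a first-class case; `loadLocalNews()` and `loadGlobalNews()` are placeholders:

```js
if ('geolocation' in navigator) {
  navigator.geolocation.getCurrentPosition(
    (position) => loadLocalNews(position.coords), // happy path: we got a location
    () => loadGlobalNews(),                       // denied, timed out or unavailable: fallback content
    { timeout: 5000 }
  );
} else {
  loadGlobalNews(); // the API isn't there at all: same fallback
}
```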
You also want to make sure users see your pages in the best possible light in the results. If I'm looking for an apple pie recipe and all I get is "Barbara's Baking Blog" twenty times, I'm left wondering which one is actually the recipe. That's what the snippet descriptions are for — meta descriptions, or snippets, as we sometimes call them — and you can provide them per page, so I can tell: cupcakes, don't care; brownies, don't care; apple pie — that's the one I want. You don't have to write these yourself; that's what content writers, copywriters and often SEOs do, and they're good at optimizing them. But you do need to provide a sound technical foundation for them. The way to do that in React, for instance, is React Helmet: you install that additional package and then use the properties of your page component to populate the title, the meta description and other meta tags. In this case we give it a helpful title and a helpful description — say, "Barbara's apple pie recipe: the recipe my grandma used, really easy to make, quick, and everyone loves it." Great — I want to click on that result; I'm hungry. In Angular you have the built-in Title and Meta services, which do the same thing: you give them the properties you need to populate the title and meta snippet. In Vue you use the vue-meta package for the same purpose. If you want to learn more about these things, we have a video series — the JavaScript SEO series on our YouTube channel — that covers this and more: testing, essentials, the different frameworks, all that kind of stuff. We also cover a concept there that I'm going to discuss in a second.
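A small react-helmet example along those lines — `RecipePage` and its props are made up for illustration:

```jsx
import React from 'react';
import { Helmet } from 'react-helmet';

function RecipePage({ recipe }) {
  return (
    <article>
      {/* Per-page title and meta description, populated from the page's own data */}
      <Helmet>
        <title>{`${recipe.name} – Barbara's Baking Blog`}</title>
        <meta name="description" content={recipe.summary} />
      </Helmet>
      <h1>{recipe.name}</h1>
      {/* ...the rest of the page... */}
    </article>
  );
}

export default RecipePage;
```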
Now, we've talked a lot about JavaScript and how Google Search renders it, but the reality is that not every crawler does this. There are other search engines, and there are social media networks and other applications that crawl your page when someone shares a link — and they don't run the JavaScript on your page. So how do you deal with that? Maybe consider something like server-side rendering. Server-side rendering isn't really a workaround — it's usually fantastically fast, because the HTML is generated up front and the browser can parse it as it arrives. With client-side rendering, the browser has to get the HTML, then download the JavaScript, parse it, execute it, and only then generate more HTML — that's going to take longer; that's just the reality. If you don't want to lose your JavaScript features, there's a thing called hydration, which most frameworks offer in one way or another: your JavaScript becomes optional — you get the HTML really quickly, and once the JavaScript has executed, the page upgrades into the regular client-side single-page app. If you don't want to render on every request — because doing that on every request does cost you something — you might want to do pre-rendering. That makes sense for something like a blog or a marketing page, where you know exactly when the content changes: on my blog it only changes when I write a new post or edit an existing one, so I know precisely when I need to re-render. You can use a headless browser for that — Puppeteer, for instance, or any service that does it for you — or you rewrite your application slightly into universal JavaScript and execute it on the server side; that works just as well. And if you don't want to touch your front end at all, there's a workaround called dynamic rendering — it's in the video series as well, and we'll get to it in a second.

Let's look at how server-side rendering works in React. Here I'm using Next.js, a higher-level framework that uses React components but server-side renders them for me on the fly. That's nice — it works with things like Now and that kind of hosting, it's really fun to work with, and it's quite successful. There was actually another talk at the same time about Next.js — sorry for keeping you here; definitely check out the recording once it's uploaded. If you want to do React pre-rendering, there's a tool called react-snap that uses headless Chrome to crawl my pages and generate the HTML, which I can then deploy to any static hosting. That's nice as well — and react-snap also supports hydration; you change your app component a little to either hydrate or render the application, depending on whether you're on the server or in the browser. Dynamic rendering, finally, is the workaround where your server looks at the user agent: if it looks like a crawler, you route the request through a renderer — a rendering service you become a customer of, your own Rendertron instance, something you build yourself with Puppeteer or PhantomJS, whatever works for you — and you send that generated static HTML to the crawlers. Your users get the regular client-side application, untouched, because you want everything to work exactly the way you built it.
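A rough sketch of what such a dynamic-rendering setup might look like in an Express server — `renderWithHeadlessChrome()` is a hypothetical stand-in for whatever does the rendering (a Rendertron instance, your own Puppeteer code, or a third-party service), and the bot user-agent list is illustrative, not complete:

```js
const express = require('express');
const app = express();

const BOT_UA = /googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit/i;

app.get('*', async (req, res) => {
  if (BOT_UA.test(req.headers['user-agent'] || '')) {
    // Crawlers get pre-rendered, static HTML.
    const html = await renderWithHeadlessChrome(req.originalUrl); // hypothetical helper
    res.send(html);
  } else {
    // Real users get the untouched client-side application.
    res.sendFile('index.html', { root: 'dist' });
  }
});

app.listen(3000);
```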
We consider that a workaround, because it takes a bit of effort and a bit of maintenance, and it comes with complications around caching and so on — but it gets you started, it gets you working with crawlers that don't run JavaScript, and it can help you quickly with problems between your JavaScript and Googlebot. It's still a workaround, though, because it doesn't give your users any benefit. Server-side rendering or pre-rendering are a longer-term solution — a higher investment, yes, but one that also delivers your content to your users faster, which makes them happier and more engaged. If you want to learn more about dynamic rendering, we wrote a blog post, a codelab and a documentation page on it, so give that a go if you want to try it.

That's all nice and fine, but I think it's time to wrap up and enjoy the coffee break — or the Q&A session. So, wrapping up: use tools like the mobile-friendly test and Search Console to test how your pages are doing in search. They're our tools, so they give you the actual data — they're not making stuff up — they're free, and they're fantastic. Search Console even emails you if there's a problem: if we find an issue and we know you're an owner of the domain, we let you know. Server-side rendering and pre-rendering are, as I said, fantastic longer-term investments; definitely look into them if you aren't using them yet — again, I recommend Natalia's Next.js talk as a resource. If you're using Angular, look into Angular Universal; if you're using Vue.js, check out Nuxt.js. Generally speaking, if you help crawlers understand your pages, you'll have a good time — in most cases it'll just work. Write semantic HTML with proper links, add structured data to tell us more about what things really are, and use robots.txt and meta tags to help us figure out what a page is about — then you're pretty much safe. If you want to learn more, we wrote a bunch of guides to help you out; there are also starter guides on the developers site — a developer's guide to search, or something like that — my coworker Lizzi wrote most of it and it's really, really good. If you prefer videos, the JavaScript SEO videos are on our YouTube channel, youtube.com/GoogleWebmasters. You can also ask us questions: every couple of weeks we run online office hours, so you can jump on a Hangout with us and we're happy to answer. Those Hangouts tend to be pretty generic — not just technical SEO, also regular SEO — but you might learn a thing or two, and you might find someone who has the same question you have and already got an answer. If you have JavaScript-specific questions, we have a mailing list you can join that's focused on JavaScript sites. You can stay up to date on our blog — the Webmasters blog, where we post announcements — or just follow us on Twitter. With all that said: that's all, folks — spasibo, poka ("thank you, bye") — and have a good day.

[Q&A] Spasibo ("thank you"), Martin. Would you like to stay for a conversation — maybe in Russian? Oh, you don't want to join me? Well, it's not like you have a choice. This was very exciting and insightful, thank you so much for all the tips — I really didn't know that Googlebot doesn't click and doesn't keep state; that's quite useful to know. At this point, though, it would be useful to know: is every piece of content equally important to Google when you index it? For example, if some things are generated by JavaScript and some are already static in the HTML — I think there was an article, maybe even in the video series you mentioned, saying it might take more time for content generated with JavaScript to be indexed. So if you have a breaking news story that you want to show up in search results immediately, is it better to use server-side rendering?

That's true. We're working on fixing that latency issue, but it's still a thing: as I said, in the process we parse the HTML, and if the content is already there, we put it in the index right away — so you land in the index quicker. Dynamic rendering would also solve that, because then the content is in the initial HTML. It doesn't mean we treat JavaScript-generated content as less important — it says nothing about quality or how we treat it in the index — it just means it takes longer to get into the index.

How much delay are we talking about — minutes? It can be seconds, it can be minutes, it can be hours, theoretically it can be days — some people even claim weeks or months — but the problem is that you don't see where the delay comes from. It can be a crawling delay: as I said, we have crawl rate and crawl demand, so if a page only gets crawled once a month and you just missed the crawl, and then you update some JavaScript-generated content and wait for it to show up — well, it would have taken a month for plain HTML content to show up too, because you just missed the window.

Interesting. We also have a couple of questions coming in; a popular one: can you name a few things that really have a dramatic influence on search ranking — security, HTTPS, performance? The things that are really important: first, your content should be good — that's the very first thing. On the technical side, if your site got hacked and got removed, we let you know through Search Console if you're using it. Also, if there's a virus or a Bitcoin miner on your page, we're not going to show it in search — you can't use our infrastructure to mine bitcoins anymore... "yet"? Who knows. So basically, security issues are a big thing. HTTPS is more of a tiebreaker: if we have two otherwise equal pages and one is HTTPS and the other isn't, we might switch them. HTTPS is important for your users and it's a trust signal, but it's not one of the most important ranking factors.
Mobile-friendliness is pretty important as well, because most people come from mobile now. So I would say: security issues are a big thing, and web performance is a big thing, because in the end you want your users to be engaged and to get the content quickly. And content — I know we usually can't influence it directly as developers, but if you see silly content like the "Hot Bread X10" thing, talk to your SEO and your marketing department, because SEOs often have the same problem we have as developers: they say "we need to change this content" and are told "no, I think this is fine." If the developers also come along and ask "what even is this?", that's more people pushing to fix the content, and that's the important thing.

OK, we also have a question from Denis and Dmitri: does Googlebot look inside the shadow root when the page gets indexed — what about web components? With the new evergreen Googlebot we have native shadow DOM support, and if I remember correctly, shadow DOM content will be indexed by Googlebot. But remember, there are crawlers that don't run JavaScript and might not see your shadow DOM content — and for composability it's a good idea to put critical content into the light DOM anyway, so that it can be overridden and reused by wrapping the component rather than modifying it. So I'd stand by my statement from a couple of months back: put content into the light DOM and use the shadow DOM to encapsulate implementation details rather than content.
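A small web-component sketch of that advice — `<toaster-card>` is a made-up element; the content sits in the light DOM, where any crawler can see it, while the shadow DOM only carries presentation:

```html
<toaster-card>
  <!-- Critical content stays in the light DOM -->
  <h2 slot="name">The fastest toaster</h2>
  <p slot="pitch">Never burn your toast again.</p>
</toaster-card>

<script>
  customElements.define('toaster-card', class extends HTMLElement {
    constructor() {
      super();
      // Implementation details (layout, styling) live in the shadow DOM;
      // the slotted content above remains in the light DOM.
      this.attachShadow({ mode: 'open' }).innerHTML = `
        <style>:host { display: block; border: 1px solid #ccc; padding: 1rem; }</style>
        <slot name="name"></slot>
        <slot name="pitch"></slot>
      `;
    }
  });
</script>
```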
It's also interesting to see things like static site generators becoming a real thing, where you pre-render and just put the result on a server — it feels like we've come full circle at this point. And does the existence of CDNs help — if you put the content on a CDN? It does help, because things are usually faster when they're closer to the user, which is what CDNs are mostly about, and that's a fantastic way of making things faster.

All right, I think I'm running out of questions — but I'm actually very happy to see that we've moved away from Chrome 41. What was the reason we got stuck on it? Was it technical — because it didn't support grid layout? It didn't support many features, yes. Maybe I should explain the situation first: before I/O we were running Chrome 41 in the rendering service; we have since updated to Chrome 74, so don't worry. You will still hear SEOs — and sometimes developers — going "what?! I just found out Google is running Chrome 41 for rendering," and the answer is: yes, but that has changed since I/O. So when you hear "but it's using Chrome 41," you can say: actually, no — there's a blog post from Google saying that's not true anymore. The reason we were stuck was that Chrome 41, back in the day, did not have any APIs to let instrumentation happen — we couldn't hook into it — so we had to write a lot of custom code to get data out of the renderer. The team then said, OK, we can update to, say, Chrome 50-something — but we would have run into the exact same problem, because it takes us something like half a year or a year to port that code from one Chrome version to the next, and Chrome keeps running ahead of us. So we needed to figure something out, and the team decided to work with another team — the team that brought us Puppeteer and, basically, the DevTools protocol. That integration is something our rendering team helped with, and the strategy was to land it upstream so it could be used instead of the custom code. Once it landed in Chrome — I think the DevTools API landed about a year ago — they could start rebuilding the code to use those APIs to extract the data. There are still some special APIs on top, but most of it is now in the public source code, so it's much easier for us to update — it just took a while.

One more question, from Alex: when it comes to single-page applications — React, Vue and so on — what are the common things to look out for, the mistakes people make all the time? The most common mistake — and I still need to talk to the framework teams about this — is that the documentation neglects the meta description and the title tag. It's "you have this HTML, now forget about it, we do all this cool stuff over here," and nobody mentions that now all your pages have the same title and meta description, which is not fantastic. That's a really low-hanging fruit you want to fix first. After that it's performance optimization and making sure your rendering actually works properly — test it and see where the problems are; it depends a lot on the app.

Google is putting quite a bit of effort into search engine optimization these days — all the resources and tools and techniques. We're trying to get as much documentation into developers' hands as possible. It feels like over the last year or two there has been a real push towards evangelizing and explaining how all of this works — please keep doing that. We will. Thank you, Martin, for being with us today. [Applause]
Info
Channel: HolyJS
Views: 3,043
Rating: 4.9058824 out of 5
Keywords: javascript, holyjs, seo, google, web
Id: XF08jiOKaiQ
Length: 55min 7sec (3307 seconds)
Published: Wed Aug 28 2019