How Google's PageRank Algorithm Works

Captions
When you perform a Google search, how does Google determine what order to show you your results? One of the most important algorithms Google uses to do this is called PageRank, an algorithm that attempts to estimate the importance of a website.

What does it mean for a website to be important? Well, the web consists of pages, and those pages can be connected to one another via links. The PageRank algorithm generally assumes that if many other pages are linking to a particular page, then that page is probably important. So a web page that is linked to more would be considered more important than a web page that is linked to less.

But there's a problem with that approach: it's easy for someone to artificially inflate their own web page's importance. If I wanted to make my web page seem more important in the eyes of PageRank, for example, I could just create lots of other pages that all link to my website. Using that strategy, I could make my website seem as important as I wanted it to be.

So to really define what it means for a page to be important, we need to refine our definition a little: a page is more important the more it is linked to by other important pages. But this definition seems a bit circular. How can we calculate a page's importance if doing so requires knowing the importance of other pages?

One way to calculate this is using what's known as the random surfer model. The idea is this: imagine someone browsing the web. They begin on some page chosen at random, and then they randomly pick a link from that page to another page, which they visit. The random surfer keeps repeating this process: pick a link randomly, visit a new page, pick another link, visit another page.

Now we keep score. We maintain a count of how many times our random surfer visits each page, and each time they land on a new page, we update that page's score. Pages that have more links to them are more likely to be visited, so they'll eventually have higher scores. And because those pages are more likely to be visited, the pages they link to are also more likely to be visited, so a link from a more important page will matter more than a link from a less important page.

After we continue this process for a while, we can look at the resulting scores and calculate what percentage of the total score each page's score is. This gives us a measure of the relative importance of these pages, expressed as the percentage of time a random surfer on the Internet can be expected to be on that page.

There's still one problem with this approach, though: pages on the Internet might not all be connected to each other. Imagine a network of pages like this, for example. If we randomly start on this page and keep following links, we'll only ever visit one set of pages on the web, completely ignoring the rest of the Internet, since none of the other pages are reachable via any of the links from the pages we're currently visiting.

To solve this problem, we need to occasionally reset our random web surfing. We do this by introducing what's called a damping factor. If the damping factor is 0.85, for example, that means that 85% of the time our random surfer will follow a link from the page they're currently on, as they were doing before. The other 15% of the time, though, our random surfer will instead switch to a page on the Internet chosen completely at random. Given enough time, this ensures that we'll eventually explore all parts of this network of web pages and not get stuck in one particular set.

This model now lets us take any network of web pages and calculate the relative importance of those pages. In the first few steps the random surfer takes, the numbers aren't particularly accurate; a lot depends on random chance. But with enough time, the random surfer will continue to explore more and more, and the numbers will eventually converge to a stable PageRank value for each page. Those values can then be used to determine what order search results should appear in, with the more important pages appearing first.

PageRank isn't the only way to calculate the importance of web pages, but it's a pretty effective way, and it ensures that, for the most part, when you search for something, the results you get are hopefully the results you actually want.
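The random surfer model with a damping factor can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not Google's implementation. The function name `random_surfer`, its parameters (`steps`, `damping`, `seed`), and the five-page example graph are hypothetical. The rule that a page with no outbound links always triggers a random jump is an added assumption that the video does not cover.

```python
import random
from collections import Counter


def random_surfer(links, steps=100_000, damping=0.85, seed=0):
    """Estimate PageRank-style scores by simulating a random surfer.

    `links` maps every page to the list of pages it links to; every link
    target must itself be a key. With probability `damping` the surfer
    follows a randomly chosen outbound link; otherwise it jumps to a page
    chosen uniformly at random. Assumption (not from the video): a page with
    no outbound links always triggers a random jump.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    pages = list(links)
    page = rng.choice(pages)  # begin on a page chosen at random
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1  # update the score of the page we landed on
        outbound = links[page]
        if outbound and rng.random() < damping:
            page = rng.choice(outbound)  # follow a random link
        else:
            page = rng.choice(pages)  # reset: jump to any page
    # Each page's share of the total score (fraction of time spent there)
    return {p: visits[p] / steps for p in pages}


# Hypothetical web: {A, B, C} and {D, E} have no links between them.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["E"],
    "E": ["D"],
}

if __name__ == "__main__":
    for damping in (1.0, 0.85):
        scores = random_surfer(web, damping=damping)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        print(damping, [(p, round(s, 3)) for p, s in ranked])
```

In this made-up graph, no page in {A, B, C} links to D or E, and vice versa, so it has the disconnected structure described above. With `damping=1.0` the surfer never resets (this graph has no dead ends), so it stays in whichever group it started in and the other group scores zero. With `damping=0.85`, every page receives a nonzero share. Page B, which receives only half of A's outbound links, should end up with the smallest share among A, B, and C. Raising `steps` lets the estimates settle closer to stable values, which mirrors the convergence described in the video.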
Info
Channel: Spanning Tree
Views: 117,357
Keywords: google, pagerank, algorithm, computer science, technology, search, search engine, random surfer, model, damping, markov, chain, outbound, link, importance, url, larry page
Id: meonLcN7LD4
Length: 5min 16sec (316 seconds)
Published: Wed Jun 17 2020