The algorithm that started google

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
this video was sponsored by brilliant [Music] imagine a new kind of dating show where one Bachelorette will pick among four guys who let's just say all our friends and know each other well now instead of going on dates with the Bachelorette these four friends are simply going to pick who they think the best person is for the Bachelorette they're basically going to endorse each other however no one can pick themself but we will allow anyone to pick more than one person if it's a tie and we'll assume everyone's being totally honest so like let's say Alex endorses Bob meaning Alex thinks the Bachelorette should definitely choose Bob among the other two and on top of these arrows are at the names of who endorsed that person above them to keep it all very visual then let's say Bob chooses both Carlos and David as a tie for the best definitely better than Alex so those each get one endorsement from Bob Carlo says that Alex and David are tied for the best so we'll write those endorsements there and then David says that Alex and Bob are tied for the best now the question is who should the Bachelorette pick there isn't necessarily a right answer but we're going to see one way to obtain a numerical rank for each person where there's no ties something to notice that Alex Bob and David all have two endorsements each however Carlos only has one so it looks like Carlos would be the worst pick which we will find is mathematically true but what about the rest well let's look at Bob real quick he was endorsed by Alex however that's the only person Alex endorsed meaning Bob clearly stood out as the best to Alex and that endorsement should mean more than others I mean Bob was also endorsed by David but so is Alex so either one of those endorsements wasn't as meaningful we can say we'll soon see how this is the reason Bob is going to be ranked number one now let me just pause real quick to say that what you're seeing is the very basics of how the Google algorithm works and how they rank web pages across the internet it's a complex algorithm but a big part of it is seeing which sites link to which others if we just consider the bachelors from before as web sites and an endorsement as a link from one site to another which can kind of be thought of as an endorsement then we have the basics of how Google ranks web pages the ranks would just be like which sites show up first based on what you put into Google but now let's get to the actual math so we can see some numbers associated with these ranks I'll consider the nodes as web sites from here on and we'll just assume the entire Internet is right now just these four sites okay so first to mathematically represent one endorsement being more meaningful let's say that each website can give out one point of importance so like when website a link to be only it used all its importance or one point for that other site but when website be linked to C and E it gave out half an importance point to each so yes it must be split evenly see we're assigning weights to all the links as in the more links a site sends out the lastly meaningful each of them is then Site C gave half an importance point to website a and D and the same thing with D linking to a and B now we're going to organize all these numbers and a matrix each row and column will refer to one of the website so I'm just putting the letters here to make things easier to follow the first column would be the points that site a gives out so it gave zero points to itself one point to site B with that one link no points to side C and no points the site D the next column will be who site be linked to they gave zero points to site a the zero points to them self half a point to C and half a point to D the next column will say that site C gave half a point to site a no points to site B or itself and then half a point to site D and the final column says a site D gave half a point to a and B then nothing to the others notice that the diagonal will always have zeros because you can't link to yourself and you'll also notice that each column adds to one since each site can only give out one point in total the rows on the other hand tell us who got the endorsements like site a got two endorsements half a point from C and same from D but nothing from the rest say B received one point from a and half a point from D but none from the rest and here you see the lowest ranked sites I see got only that one endorsement from site B by the way no the algorithm is not just adding the numbers in these rows yes B got the most points in total but that's not the only reason it will be ranked number one I mean you'll see a and D tied yet one of them will be ranked higher as we'll see soon now the reason we picked one important point to be given out in total is because these numbers now represent probabilities like if I go to a website a and randomly click a link there's a 100% chance or a decimal value of one that I will land on website B because of that one link well on some at website B if I randomly click link I have a 50% chance of going to C and a 50% chance of going to D and this would just continue from there so finally here's how we'll determine ranks if we spent a long time randomly clicking links assuming equal time intervals between each click what percent of that time would be spent on each site those percentages which will add to a hundred or a decimal value of 1 would be the rank of each site and we'll put those values in a ranking matrix called R now to begin this internet search you have a 25% chance of just picking one of these web sites kind of like our initial ranks but now let's find the probability of landing at any given site after just one click to do this we're going to take our 4 by 4 matrix and multiply it by the initial ranks or probabilities as many of you know to multiply 2 matrices you multiply rows by columns one at a time remember this first row is all the points given to site a or the chances of getting to a from any given site so this dot product tells you the probability of being at a after one click think about it this says you have a 25% chance of starring at a time 0% chance of going to a from there since it doesn't link to itself when you start at B same initial probability there's also no chance of going to a because there's no link but if you start at C pick a link there's a 50% chance you get to a and the same thing with site D these are all the probabilities of just going to a from any site so when you add those up you have the total probability of being at a after that first click it just so happened to stay the same at 25% we multiply the next row of all the probabilities of going to site B by our ranking matrix we get 37.5% since everything was the same except you had that higher probability here of going from A to B the final probability was of course higher for the next row we multiplied and get a value of 0.125 and then the last leads to 0.25 these tell us the probabilities of being at each site after the first round of clicks but I said we have to find the probability of being at each site after doing this for a long time meaning we have to keep going and take our original matrix of probabilities then multiply by this new ranking matrix to get the probabilities after a second round one thing you'll notice is that the probability of being at C is actually higher now that's because in the last round you had a higher chance of landing at site B and that itself links to C so that probability went up then we just repeat taking these new probabilities or ranks and find what they'll be after a third click where the numbers finally start to separate a bit after many many multiplications aka a lot of clicks the probabilities will converge to these values here and these are the ranks of each site we see that site B is ranked highest as expected and C is the lowest but also note that site D ranked higher than a even though they both had two incoming links by the way another way to look at how we got these is we took our four by four matrix that'll just call a and multiplied by those initial ranks we then took that value and multiplied by matrix a again then we did that again and just kept repeating so really this was just a to the end times our as n approaches infinity the product reaching equilibrium of our final ranks so now we can answer our question from before of why did site D rank higher than site a even though they each got two endorsements it's because site D has a link or endorsement from site B the highest ranked site while site a does not say B might be something like Forbes very high ranked and getting them to link to you is like a big stamp of approval in the eyes of the algorithm which is why D beat out a so those were the basics and now adding web sites and links to the web just leads to changing the numbers or dimensions on matrix if site B now links to a all we have to do is take our matrix and change the B column where B will give out one third of its importance to three sites now then if we did all those matrix multiplications these would be the final ranks so as expected a has surpassed D in rank since I got a new endorsement but here's what's weird say a now has three endorsements more than any other site yet site B is still ranked higher that may not seem right you might think a should be number one but remember that site only endorses one other site B which is HUGE on ranking that importance of a is transferred to be essentially making it number one instead I mean B endorses a back but it's one of three lengths that B gives out so not as meaningful see it's not about links it's about quality links you need endorsements but from credible places and the algorithm accounts for all of this or think of the from a probability point of view if let's say twenty nine percent of your clicks lead you to site a then I know for sure at least another twenty nine percent lead to B since going to a must be followed by going to B but you can also get to be from D which will increase the probability beyond the 29% surpassing a's rank now let's say you start a blog introducing a new site to the graphic then exchange links with some other low rank sites such as c well now all we got to do is expand our matrix to a five dimensional one for five sites most entries will be 0 in this case but sites see links to e so we got to change that column to have values of one third for each link and then site e gives out one link all it's important to cite see this is good for site C because it's a meaningful link but it won't help too much with rank because the link comes from e a low rank site thus not a quality link after doing the calculations these would be the ranks of each site as expected C's rank went up and is now tied with D but it still hasn't come close to a or B yet now after all this there's a lot of the stories still missing I mean if one site is completely isolated all sites would be ranked 0 which makes no sense and I didn't talk about how that's accounted for there's also something called the damping factor which I didn't mention since it complicates things and what we've seen is the basics of what Google first implemented but of course they've expanded a lot so is this exactly how the current Google algorithm works of course not I have no idea how it works and I don't think really anyone does besides the people at Google who work on it but overall this is still the general idea of how Google ranks websites there's of course a lot I couldn't get to but if you want to learn more about this algorithm and you just enjoy this kind of applied math I definitely recommend checking out brilliance linear algebra course or they'll take you through the basics all the way to applications like the PageRank algorithm and this section includes things I didn't have time to get to in this video so you'll have more to look forward to learning the course starts with all the basics of your beginner so you'll learn vector spaces linear transformations determinants and more but they're also going to show you some of the more advanced topics and unique applications that you very likely didn't learn in an introductory course brilli is a platform that contains dozens of math and science courses ranging from high school math all the way to more advanced courses like differential equations or abstract algebra but what I like about this site is that they incorporate those unique applications and interactive exercises as you go through any course so you know you truly understand the concepts before moving on so if you want to get started right now and support the channel you can click the link below or go to brilliant org slash major prep plus the first 200 people to sign up and get 20% off their annual premium subscription so again links are below and with that we're gonna end that video there if you guys enjoyed be sure to LIKE and subscribe don't forget to follow me on Twitter and join the major Facebook group updates on everything hit the bell if you're not being notified and I'll see you all in the next video
Info
Channel: Zach Star
Views: 174,532
Rating: undefined out of 5
Keywords: majorprep, major prep, pagerank, pagerank algorithm, google algorithm, mathematics of google search, math of pagerank algorithm, applications of matrices, ranking algorithm, how does google rank websites, google, algorithm, computer science, graph theory, linear algebra, markov matrix, probability, math, applied math, math applications, how does google work
Id: qxEkY8OScYY
Channel Id: undefined
Length: 13min 40sec (820 seconds)
Published: Mon Sep 30 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.