CS224W: Machine Learning with Graphs | 2021 | Lecture 4.2 - PageRank: How to Solve?

Video Statistics and Information

Captions
So what I want to talk about next is how we solve the PageRank equation — how we actually compute the vector r. The idea is to use a method called power iteration. Given a graph on n nodes, we use an iterative procedure that updates our rank vector r over time. We start by assigning each node some initial PageRank score, and then we repeat the iterative process until the vector r stabilizes. To measure whether it has stabilized, we compare our previous estimate r^(t), where t runs over the iterations of the algorithm, with our new estimate r^(t+1): if the entries of the vector change by less than epsilon, we are done.

The equation we iterate is the following: node j says, my new estimate of my importance is simply the sum of the previous-step estimates of the nodes i that point to me, each divided by the out-degree d_i of node i:

r_j^(t+1) = Σ_{i→j} r_i^(t) / d_i

We iterate this a number of times, and it is guaranteed to converge to the solution — essentially, it is guaranteed to find the leading eigenvector of the underlying matrix M.

The method I just explained is called power iteration. To write it again in a very simple form: we initialize the vector r at time zero, say by giving every node the same importance (or you can assign random importances), and then we simply iterate r^(t+1) = M · r^(t) for long enough, until the sum of the entry-wise differences between the estimates in the previous round and the next round is less than epsilon. Notice that this matrix equation and the flow equation above are exactly the same thing, just written in two different ways. One last thing to say: here we use the L1 norm, the sum of absolute differences, but you could also use, say, the Euclidean norm, the sum of squared differences, if you like. In general it takes about 50 iterations — you have to compute this matrix-vector product about 50 times — before you reach the stationary distribution, the limiting solution. So basically you can compute PageRank with a couple of matrix-vector multiplies, and you are done. This is important because Google computes PageRank every day over the entire web graph, with tens of billions of nodes and hundreds of billions of edges. So this is really, really scalable, and you can compute it on the graph that captures the entire web.
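To make the procedure concrete, here is a minimal sketch of power iteration in Python/NumPy. The function name power_iterate and the exact setup are mine, not the lecture's, and the example matrix assumes the lecture's three-node y/a/m graph (y links to y and a, a links to y and m, m links to a), reconstructed from the stated result of 6/15, 6/15, 3/15:

```python
import numpy as np

def power_iterate(M, eps=1e-6, max_iters=100):
    """Iterate r <- M r until the L1 change drops below eps.

    M is assumed column stochastic: M[j, i] = 1/d_i if i links to j, else 0.
    """
    n = M.shape[0]
    r = np.full(n, 1.0 / n)                  # initialize: every node gets 1/n
    for _ in range(max_iters):
        r_next = M @ r                       # r^(t+1) = M r^(t)
        if np.abs(r_next - r).sum() < eps:   # L1 norm of the change
            return r_next
        r = r_next
    return r

# Three-node example (columns y, a, m), assuming y->{y,a}, a->{y,m}, m->{a}:
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
print(power_iterate(M))                      # -> [0.4, 0.4, 0.2] = [6/15, 6/15, 3/15]
```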
To give an example of the power iteration method, written again in a different way: here is our matrix M and our set of flow equations. We set all node importances to 1/3 at the beginning, and then we multiply with matrix M; after we multiply once we get new values, we multiply a second time and get new values, then a third time, and as we keep multiplying, the values of the vector r converge to a stationary vector, so that r really equals M times r. The final importances are 6/15, 6/15, and 3/15: y and a have importance 6/15, and m has a lower importance of 3/15. So this is what PageRank is going to give us.

Now that we have seen these equations and everything seems beautiful, we need to ask a few questions. First: does this converge? Second: does it converge to what we want? And third: are the results reasonable? Basically, what I said so far is: create the graph, represent it as the matrix M, run this iterative power iteration procedure, it will converge in about 50 steps, and you get your vector r out of it. So let's look at this a bit more carefully. It turns out that with what I explained so far there are two problems. The first problem is that some pages are what are called dead ends: they have no out-links, and for such web pages the importance — the votes — kind of leaks out; I will tell you what I mean by that. The second problem is called spider traps, where all out-links are within the same group of nodes; spider traps eventually absorb all the importance.

Let me give you an example so you see what's happening. First, spider traps: in this case, a links to b, and b has a self-loop. If you run power iteration on the matrix describing this graph, in the end a will have importance 0 and b will have importance 1. Why is this happening? Wherever the random walker starts, it will eventually traverse the edge into b, and then it is stuck in b forever: after some number of steps, the random walker is at node b with probability 1 and can never go back to a. This is called a spider trap because the random walker gets trapped, and in the end all the importance is kept in b. You can imagine that even in a super huge graph, eventually the random walker traverses that edge and is then stuck forever in the self-loop.

And here is the problem of dead ends: node b has no out-link. If you simply create the matrix for this graph and run power iteration, it converges to all zeros. Intuitively, as soon as the random walker gets to node b, it has nowhere to go; it kind of falls off the cliff and gets lost. This way the importance no longer sums to one — it leaks out of the graph. You can see both failure modes numerically in the sketch below.
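Here is a small numerical illustration of both failure modes — a sketch using hypothetical two-node matrices matching the examples just described, not code from the lecture:

```python
import numpy as np

# Spider trap: a -> b, b -> b (self-loop). Columns are (a, b).
M_trap = np.array([[0.0, 0.0],
                   [1.0, 1.0]])

# Dead end: a -> b, b has no out-links, so column b is all zeros.
M_dead = np.array([[0.0, 0.0],
                   [1.0, 0.0]])

for M in (M_trap, M_dead):
    r = np.array([0.5, 0.5])
    for _ in range(50):
        r = M @ r
    print(r)
# M_trap -> [0. 1.]: b absorbs all the importance
# M_dead -> [0. 0.]: the importance leaks out of the graph
```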
These are the two problems we are going to address. And what is the solution? The solution is the notion of random jumps, or teleports. For spider traps, we change the random walk process a bit: at every step, the random walker will not only choose a link at random, but can also decide to teleport itself. We introduce one parameter, beta, that gives the random walker two choices: with probability beta, the walker follows a link at random, the same way as we discussed so far; but with probability 1 − beta, the walker jumps — teleports — to a random page. Common values of beta are between 0.8 and 0.9. This means that if a random walker gets stuck in a spider trap, it will stay there for a few steps, but eventually it will be able to teleport out, because with some smaller probability the walker says, let me just randomly teleport to a random page. So out of every web page, out of every node, there is a way to teleport yourself somewhere else, to randomly jump somewhere else, and spider traps are no longer a problem: you don't get trapped, you can always jump out — Scotty can always beam you up, that's the idea.

How about dead ends? We handle them with teleports as well: essentially, if you come to a dead end — if you come to node m and there is nowhere for you to go — you simply teleport with probability 1. Why were dead ends a problem? Dead ends were a problem because node m has no out-links, so the column of the column-stochastic matrix M for node m does not sum to one; column stochasticity is violated. We fix this by saying: when you arrive at node m, you can randomly teleport wherever you want, you can jump to any node. In some sense, m is now connected to all other nodes in the network, including itself, and the random walker can choose any of these links with equal probability. This solves the problem of dead ends; it essentially eliminates them.

So why do teleports solve the problem — why are dead ends and spider traps a problem, and why do teleports solve both of them? Spider traps are not a mathematical problem: the eigenvector is still well defined, power iteration converges, everything is fine mathematically. The problem is that the PageRank score is not what we want: we don't want to say there is one page on the web that has all the importance and every other page has zero importance. The solution is to add teleports, so the random walker never stays trapped in a spider trap; it teleports out in a finite number of steps, which means all the nodes on the web now have some importance. Dead ends, on the other hand, are a problem mathematically: the matrix M is not column stochastic anymore, our initial assumptions are not met, and power iteration as a method does not converge. The solution here is to make the matrix column stochastic by always teleporting when there is nowhere to go: whenever you come to a node without any out-links, you always randomly teleport. So the same solution — teleports — both gives us PageRank the way we wanted to define it intuitively and fixes the underlying mathematical problems, so that all the concepts I discussed are well defined.
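One simple way to implement that dead-end fix is to replace every all-zero column of M with the uniform distribution — a minimal sketch, with fix_dead_ends as a hypothetical helper name:

```python
import numpy as np

def fix_dead_ends(M):
    """Make M column stochastic: a dead-end column (all zeros) becomes
    uniform 1/n, i.e. the walker teleports anywhere with probability 1."""
    M = M.copy()
    n = M.shape[0]
    dead = M.sum(axis=0) == 0     # columns that violate column stochasticity
    M[:, dead] = 1.0 / n
    return M
```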
So what is the final solution — the Google solution — to this problem? At each step, the random walker has two options: it flips a coin, and with probability beta it follows an out-link at random, while with the remaining probability 1 − beta it jumps to a random page. The PageRank equation, as defined by Page and Brin back in 1998, is the following: the importance of node j equals beta times the sum of the importances of the nodes i that point to it, each divided by their out-degrees, plus (1 − beta) times 1/n:

r_j = Σ_{i→j} β · r_i / d_i + (1 − β) · 1/n

You can think of this as asking: how likely is the random walker to be at node j right now? With probability beta it decided to follow an out-link, which means it was at some node i (with probability r_i) and picked the out-link towards node j out of the d_i out-links, with probability 1/d_i — that is the first term. The walker could also have arrived at node j by teleporting: 1 − beta is the probability of teleporting, and since node j is just one of n nodes, the probability of landing at that specific node is 1/n — that is the second term. This is essentially the PageRank equation, and this is the iteration one can run. Just note that this formulation assumes M has no dead ends: you can either preprocess the matrix M to remove all dead ends, or explicitly follow random teleports with probability 1 out of dead ends (one way to do this is sketched above). Either way, you can see this is very fast and very simple to iterate.

I just gave you the equation in the flow-based formulation; you can also write it in matrix form. We define a new matrix G = β · M + (1 − β) · [1/n]_{n×n}, where [1/n]_{n×n} is the matrix with all entries equal to 1/n — the random teleportation matrix — and M is the transition matrix over the edges of the graph. Then you have the recursive equation r = G · r, you can iterate it, and power iteration will still work. If you ask what beta value to set in practice, we take beta between 0.8 and 0.9, which means the random walker takes about five steps on average before it decides to jump.

Just to be very clear: the random walk is only the intuition; we never simulate the random walk. In the previous lecture we actually said, let's simulate the random walk; here we don't. In some sense we think of it as being run infinitely long, and then we show that we can compute this infinitely long random walk by solving the recursive equation — by computing the leading eigenvector of the transformed graph matrix that I call G. So the random walk is just an intuition, because we never truly simulate it.
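In code, the matrix form is a one-liner — a sketch, where google_matrix is my own name for it and the comment reuses the power_iterate and fix_dead_ends sketches from above:

```python
import numpy as np

def google_matrix(M, beta=0.8):
    """G = beta * M + (1 - beta) * (1/n) * ones(n, n).

    M must already be column stochastic (no dead ends)."""
    n = M.shape[0]
    return beta * M + (1.0 - beta) / n * np.ones((n, n))

# PageRank is then the fixed point of r = G r:
#     r = power_iterate(google_matrix(fix_dead_ends(M)))
```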
To show you how this works, here is my little graph on three nodes, with its matrix M; notice that node m is a spider trap. Now I also add the random teleport links: I have the matrix with all entries 1/n, and let's say my beta is 0.8. My new stochastic transition matrix G is then 0.8 times the matrix of link transitions plus 0.2 times the matrix of random jumps. You can think of each column of the random-jump matrix as saying: if the random surfer is at a given node, this is the probability distribution over where it will jump. Adding the two together gives a new transition matrix that includes both traversing the links of the graph and randomly jumping; its entries are, in some sense, the transition probabilities of the random surfer. Then you multiply r by G multiple times: here is r at time 0, and we multiply over and over again, and after some number of iterations it converges — to 7/33, 5/33, and 21/33. So node m in this graph is the most important, followed by y, followed by a. The reason m is so important is that it is kind of a spider trap, but we are now also able to teleport out of it. If, intuitively, node m collects too much importance, you can decrease the value of beta — teleport more often — and the importance of node m will decrease.

To give you an idea of how this looks in a more interesting graph, here is a graph where node size corresponds to PageRank weight, and a number gives each node's PageRank score. What do you notice — why is PageRank so cool? First, notice that all nodes have non-zero importance: even the nodes with no in-links still have some importance, because the random jumper can always jump to them. Another thing to notice: node B has a lot of in-links, and that is why it has high importance. Node E also has five or six in-links — about as many as node B — but because node E gets most of its in-links from unimportant pages, its importance is about 8, versus about 38 for B. B is much more important because it gets in-links from nodes that have higher importance than E's little blue neighbors. Also notice that node C has only one in-link, but because it gets it from the super-important node B, its importance is also very high. And you see, for example, that nodes D and F have the same importance — less than E — because they each get one in-link from the same node. Notice how these importances are very nuanced and take a lot of different considerations into account, all of which make sense: I want a lot of in-links; I want in-links from important nodes; and even if I have only one in-link, if somebody very important links to me, that means I am very important; and so on and so forth. This is why the notion of PageRank is so useful — there is a lot of mathematical beauty behind its definition, and we can efficiently compute it for very large-scale graphs.

To summarize: we talked about how to solve for the PageRank scores. We solve them by iterating the equation r = G · r, which can be efficiently computed using power iteration of the stochastic matrix G, and adding uniform teleportation solves both the issue of dead ends and the issue of spider traps. A worked version of the three-node example is sketched below.
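Finally, here is a self-contained sketch reproducing the lecture's spider-trap example. The graph structure (y links to y and a, a links to y and m, m has only a self-loop) is my reading of the example, reconstructed from its stated result of 7/33, 5/33, 21/33:

```python
import numpy as np

# Columns y, a, m; node m is a spider trap (only a self-loop).
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
n, beta = M.shape[0], 0.8
G = beta * M + (1 - beta) / n      # scalar broadcast fills in the teleport term
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = G @ r
print(r * 33)                      # -> approximately [7, 5, 21]
```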
Info
Channel: stanfordonline
Views: 8,439
Keywords: Jure Leskovec, Machine Learning, Machine Learning with Graphs, Graphs, CS224W, Stanford, PageRank, principle eigenvector, adjacency matrix, node groups, solve for PageRank
Id: rK2ZBmQHVVs
Length: 20min 41sec (1241 seconds)
Published: Thu Apr 22 2021