Lecture 7. Graph partitioning algorithms.

All right guys, let's get started. The topic for today is graph partitioning algorithms, so we're going to look into the ways you can split a graph based on certain criteria. This is lecture number one out of a series of lectures dedicated to this topic; in fact, we're going to be using some of the knowledge we learn today later on, when we start talking about community detection.

So, what is graph partitioning? Graph partitioning is a division of a graph into subgraphs by partitioning the set of nodes into mutually exclusive groups. For example, I can say that these nodes form one group and the rest of the nodes form another group, and in that sense I have partitioned the graph into two pieces. Or I can take a different set of nodes as one non-overlapping group and the rest as another; that is also a valid partitioning, a valid split of the graph into two subgraphs whose node sets are mutually exclusive.

Now, when you look at the graph partitioning problem, you typically do not just cut the graph into arbitrary pieces: you want to satisfy certain constraints, or a certain optimization criterion. You do it for a reason. Overall there are lots and lots of ways of doing the split, and if I want to solve exactly, for example for the minimum cut (we'll define the minimum cut in a second), the exact solution is NP-hard, simply because there is a combinatorially large number of ways to split a graph into pieces. For example, if we just think about two groups of nodes, there are 2^(n-1) - 1 ways to do the split. The reason is that each node can go into one of two groups, so each node has two options, which gives 2^n assignments; you divide by two to account for the symmetry, because it doesn't really matter which group is called which; and you exclude the situation where all the nodes belong to one set. It is still on the order of 2^n, and that's just for bipartitioning. If we start talking about partitioning into more groups, there are even more ways, factorially many, and you literally cannot go through all the possible options; you cannot check every possible split. So it is a combinatorial optimization problem, and when you deal with these problems there are two things you need to take care of. First, you choose the optimization criterion: what are we trying to optimize when we do the split? Then you select the way we do the optimization. As I just said, exact graph partitioning is NP-hard; it's integer programming, non-polynomial time. So what you do instead is use heuristics and approximate algorithms, and sometimes greedy methods.
Then there are other things you need to take into account. For example, you might want a balanced partition, that is, to split the graph in a balanced way, where balanced means the sizes of the parts are approximately equal; or you might not care. The other distinction is two-way versus multi-way partitioning: splitting into two pieces or into more pieces. We're going to address this in a minute.

But why would you want to do this kind of splitting of a graph? This is quite an old problem; in fact, people worked a lot on it in the 70s, 80s, and even 90s. One reason is that when you solve a lot of equations on a grid, you want to partition the grid between the processors of a computer in such a way that each processor has approximately the same load, the same amount of computation to do, while you minimize the communication between the processors; you want to find an optimal way to distribute the work. That is a balanced cut. The same thing happens when you design VLSI circuits. Unbalanced cuts, on the other hand, are quite often a way to detect communities, groups of nodes that are more tightly connected among themselves than with the rest of the graph, and the algorithm we're going to talk about today can be used for community detection.

Now, speaking about the various algorithms: there is greedy optimization, which we're not going to talk about; it's typically used to improve an existing solution rather than on its own. There is approximate optimization, which is what we're going to spend some time on today: spectral graph partitioning. It's an extremely famous algorithm in linear algebra due to Fiedler (it gives us the Fiedler vector) from 1973; it was rediscovered in the 90s and became very, very popular after the paper by Shi and Malik, who used it to solve the image segmentation problem. That was pre-deep-learning time: the image was converted into a graph, and then graph partitioning was used to segment the objects in the image. Then there is a randomized min-cut algorithm, the absolutely famous David Karger algorithm, and we're going to talk about it simply because it paves the way to many more modern algorithms. And then there are practical heuristic algorithms; one of them is the famous METIS multi-level graph partitioning, which just works well. If you want to split a graph into multiple pieces, that's usually the software and the algorithm you go after.

Now, before we continue, let me jump back to this picture and show you the following. Let's say, as we discussed at the beginning, that we're going to split this graph into two groups, here and here. If this is the partitioning we selected, then to get this partition we could cut these two edges, and that would split the graph. So instead of drawing and selecting a partitioning like that, I can just say: look, here is a split, with these nodes on one side and those nodes on the other; or here is another split, and so on. Notice that when I do this split, which is also called a cut, I cut through two edges, so it costs me two edges to split the graph: I need to remove two edges. That's called the cut, and the number of edges removed, here two, is the cut size.
Or, for example, I can cut here, through one, two, three edges, so this cut has a cut size of three. Or I can cut off that single edge, and then this particular cut has a cost, a cut size, equal to one. So that's the cut.

Notice that the way I just described it, when I make a cut the sizes of the partitions can be very different. For example, the cut I just made is very, very unbalanced: the size of this partition is one node, and the size of the other partition is all the rest of the nodes. The size of a partition is usually measured by the number of nodes in that partition. Or, if I make this other cut, the cut itself is two edges (the value of the cut is two), and one partition is of size five while the rest is the rest. Or, if I cut this way, this is already a more or less balanced partition, because this side has about ten nodes and the right-hand side has approximately the same. In all these cases the size of the cut was two, I cut two edges, but depending on which cut I make, it can be more balanced or less balanced. Okay, so these are what cuts are.

Now, how do we use this? As I discussed, we need to select the way to optimize things, and we need to find out what we're actually optimizing. If we split the graph into two parts, two sets of nodes V1 and V2, we can define the graph cut. As I just explained, the graph cut by itself is simply the number of edges that connect one partition with the other: the edges (i, j) where i and j belong to different sets V1 and V2. If we have one partition and another, with some edges connecting them, and we cut those edges, then the number of cut edges is the size of the cut; that's the graph cut.

But as we also noticed, sometimes you might want a balanced cut: you want to make sure that when you do the split, the sizes of the partitions are approximately equal, or at least do not differ too much. To do that, instead of just using the graph cut, we can normalize it by the size of the partition. That gives the notion of the ratio cut, which is the cut (the number of edges we remove when we separate the partitions) divided by the number of nodes in one partition, plus the same term for the other partition to make it symmetric:

ratio cut = cut(V1, V2) / |V1| + cut(V1, V2) / |V2|.

That one works, but not really well. Here's another idea, called the normalized cut. The normalized cut is the ratio of, again, the number of cut edges, divided by what's called the volume:

normalized cut = cut(V1, V2) / vol(V1) + cut(V1, V2) / vol(V2).

The volume of a partition is the number of edges that start in that partition; an edge can have both endpoints inside the partition, or it can start in the partition and end in the other one. If you think about it for a second, the volume is equal to the sum of the node degrees within the partition.
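A minimal sketch of these four quality measures, assuming an undirected graph given as a symmetric numpy adjacency matrix `A` and a partition given as a boolean mask over the nodes; the function names are illustrative choices, not from the lecture.

```python
import numpy as np

def cut_size(A, mask):
    # Edges with one endpoint on each side of the partition.
    return A[np.ix_(mask, ~mask)].sum()

def volume(A, mask):
    # Sum of the degrees of the nodes inside the partition.
    return A[mask].sum()

def ratio_cut(A, mask):
    c = cut_size(A, mask)
    return c / mask.sum() + c / (~mask).sum()

def normalized_cut(A, mask):
    c = cut_size(A, mask)
    return c / volume(A, mask) + c / volume(A, ~mask)

def conductance(A, mask):
    return cut_size(A, mask) / min(volume(A, mask), volume(A, ~mask))
```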
Now, why is the normalized cut better, why does it work better? In the ratio cut we divide a number of edges by a number of nodes, so in some sense we compare apples and oranges. Here we divide a number of edges by a number of edges: this is the number of edges in the cut, the edges that we remove, and this is the number of edges that emanate from a partition. Normalizing edges by edges makes the metric better, and in fact the normalized cut also has a very nice representation in linear algebra.

Finally, there is a metric that computer scientists love and that has a lot of cool properties. It's called conductance, or the quotient cut, and it has the following value: it is the cut divided by the minimum of the two volumes. Most of the time we're going to be using the ratio cut or the normalized cut. Conductance has very good theoretical properties, but it's not used a lot in algorithms by itself. Any guesses why, what can be wrong with it? In fact there is nothing wrong with this metric, but since it contains the minimum function it's not differentiable, which makes it challenging for some algorithms to handle; it's also not easy to represent directly in, say, linear algebra. That's literally the only reason. So we will use it partially, in algorithms, to verify things, but not as an optimization criterion.

So, moving along. First I want to spend some time on David Karger's algorithm, because it's extremely simple, actually amazingly simple, and it works. It is part of the family of randomized algorithms; you can think of some similarity to Monte Carlo methods. This algorithm works on its own, but it is also a basis for a lot of other algorithms. The idea is the following. We want to find the optimal cut, the minimum cut, so let's select the plain graph cut as the optimality criterion: we want to split the graph into two parts while minimizing the cut, and that's the only thing we want to do. The challenge, again, is that there are 2^n possible configurations, so you just cannot go through all of them; you need to somehow find the cut without checking every option. The idea of Karger's algorithm is simple and beautiful, and it's based on what is called edge contraction. Look at this picture: we have an edge here, and what we do is remove the edge and merge its two vertices into one. When we merge vertices, some edges become parallel; that's absolutely fine. So that's the process of edge contraction. Karger's algorithm is extremely simple. It just says: pick a random edge and do the contraction, then pick another random edge and do the contraction, and keep doing it until you end up with two nodes and some number of edges between them; this number of edges is your candidate for the minimum cut. Now, it sounds a little crazy: why would this algorithm work at all? But you can prove that the probability that this procedure gives you the minimum cut is greater than 2 / n^2, where n is the size of the graph. If n is large, that's a very small probability, and you're not going to be happy with it on its own.
But what you can do, because a single run is very fast, is run the algorithm on the order of n^2 times, and then you get a quite meaningful probability of getting the right answer. So you run it many times; on every run you get a candidate for the minimum cut, the algorithm's guess, and out of all of them you just choose the smallest one. That's going to be your real minimum cut, with a pretty high probability that it is the true minimum. And you only need to run it on the order of n^2 times, so it's polynomial time; not such a big deal compared to the 2^n combinations you would have to check in an exhaustive search.

So what does the algorithm itself look like? You go through multiple iterations; on each of them you guess a min cut, and over all the iterations you select the one that is optimal. The guessing process is just as we described: you pick a random edge and contract it, repeat, and when a pair of nodes is left, you return it and calculate the value of that cut.

What does it look like in practice? Here is an example borrowed from Wikipedia. It's randomized, and each row is a separate run of the algorithm. You start contracting: this is the first contraction; then, for example, the algorithm picks this edge and contracts it, then this edge, then this one, and so on, and eventually you're left with two nodes and a bunch of edges. Then there is a second run, a third run, a fourth, a fifth, and notice that depending on the run (again, it's a randomized algorithm) you get a different number of edges at the end: there are four edges here, four or five there, and the fifth run actually found three edges. Looking at this graph: yes, this is the minimum cut of this graph, the smallest cut you can get here is three edges. So here it was found within five runs; theoretically speaking you need more, of course, especially since you don't know in advance what the minimum cut is. You just run it multiple times and select the run that gives you the winner, the smallest number, and that's it. It's a beautiful, very simple algorithm, easily parallelizable, and again, it's the basis for a lot of algorithms out there. Any questions on this algorithm?
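A compact sketch of the procedure, assuming a connected graph given as a list of undirected edges over nodes 0..n-1; the union-find bookkeeping and all names here are illustrative choices, not from the lecture.

```python
import random

def contract_run(n, edges):
    # One randomized run: repeatedly pick a random edge and merge its two
    # endpoints (tracked with union-find) until two supernodes remain.
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                        # already contracted: a self-loop, skip it
        parent[rv] = ru                     # contract the edge
        remaining -= 1
    # Surviving edges between the two supernodes form the candidate cut.
    return sum(1 for u, v in edges if find(u) != find(v))

def karger_min_cut(n, edges, runs=None):
    # Repeat on the order of n^2 times and keep the smallest candidate.
    runs = n * n if runs is None else runs
    return min(contract_run(n, edges) for _ in range(runs))

# Toy usage: a 4-cycle plus one diagonal; the true minimum cut is 2.
print(karger_min_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))
```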
All right, so moving ahead. Now we're switching from this randomized graph partitioning to more of a linear algebra approach to graph cuts. This is the algorithm proposed by Fiedler in 1973 and then reinvented in the 90s. The idea is the following, and I'll go a little slowly here so you can catch it. Let's say we want to split the graph into two parts, a partitioning of all the nodes into V+ and V-. Say we have a line graph, the way I show here, and the partition boundary is right here: these nodes belong to, say, the plus partition and those to the minus partition. Since it's a partitioning, a division of the nodes into two groups, we can assign an indicator to each node. Let's say the nodes that belong to the minus group get the indicator, the label, -1, and those that belong to the plus group get the label +1. So it's just an indicator vector: its length is the number of nodes in the graph, and it has -1 for the nodes that belong to one partition and +1 for the nodes that belong to the other.

Now let's go back to the definition of a cut; remember, the cut is the number of edges crossing the partition. What's interesting, and what you're going to see now, is that you can express the value of the cut through that indicator vector. Look: if two nodes i and j belong to the same partition, their values are both +1 or both -1, and (s_i - s_j)^2 / 4 is equal to zero. If nodes i and j belong to different partitions, then one of them is, for example, +1 and the other is -1; the difference is two, two squared is four, divided by four is one (and the same for -1 and +1). So this sum in fact calculates the cut: it counts how many edges cross it, where I sum over all the crossing edges. Instead of doing that, I can bring in the adjacency matrix and sum over all pairs of nodes; I need to divide by two to take the symmetry into account, since it's an undirected graph. Then I can simplify this a little. I bring in the Kronecker delta symbol delta_ij, which is equal to 1 when i = j and 0 when i is not equal to j, and I keep simplifying until I get an expression in terms of A_ij, the adjacency matrix, k_i, the node degree of node i, delta_ij, and our indicator values s_i and s_j. Now look at the matrix behind this expression: the delta_ij term, written in matrix notation, is zero everywhere except the diagonal, and on the diagonal you have the degrees, the degree of the first node, of the second node, and so on down to the last node. So you can write the cut in terms of linear algebra as a sum of (D_ij - A_ij) s_i s_j, where D is the diagonal matrix with the degrees on the diagonal, A_ij is the adjacency matrix, and s_i, s_j are the entries of the indicator vector taking values +1 or -1; when you add up the sum, you get the value of the cut. Does this make sense?

Okay, so next step: so what? Well, first of all, this matrix, D minus A, diagonal minus adjacency, has its own name: it is called the graph Laplacian.
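Written out (reconstructed here from the verbal derivation, with s_i = ±1, k_i the node degrees, D = diag(k_1, ..., k_n), and L = D - A), the chain of identities is:

```latex
\mathrm{cut}(V_1, V_2)
  = \sum_{(i,j) \in E} \frac{(s_i - s_j)^2}{4}
  = \frac{1}{8} \sum_{i,j} A_{ij} (s_i - s_j)^2
  = \frac{1}{4} \sum_{i,j} \bigl( k_i \delta_{ij} - A_{ij} \bigr) s_i s_j
  = \frac{1}{4} \, s^{\top} (D - A) \, s
  = \frac{1}{4} \, s^{\top} L \, s
```

The middle step uses s_i^2 = 1, so A_{ij}(s_i - s_j)^2 = 2 A_{ij} - 2 A_{ij} s_i s_j, and the constant part regroups into the degree term k_i \delta_{ij} s_i s_j.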
The Laplacian can also be specified directly by itself: on the diagonal you have the node degrees, you have -1 off the diagonal wherever there is an edge (it's an unweighted adjacency matrix), and zeros everywhere else. If I look at this example graph, its adjacency matrix has ones exactly for the pairs of nodes that are connected, and the Laplacian matrix looks as follows: on the diagonal you have the node degrees (the degree of node one is one, the degree of node two is two, of node three is two, of node four is two, and so on), and it has -1 for each edge. So this is the Laplacian matrix, the graph Laplacian, and it's actually very, very useful if you want to study graphs.

Now we can write down the cut in terms of this matrix. In vector and matrix notation it's just a quadratic form: cut = (1/4) s^T L s. If you think about it, there is a row vector s^T, then a matrix, then a column vector s; you multiply them through and you realize you get a number.

Now remember, we started by looking into the problem of optimizing cuts. In this case the minimum cut problem is to find the minimum of this quadratic form Q(s). We also want, for example, a balanced cut, and balanced means we want the size of each partition to be approximately equal. For that I introduce a constraint: add up all the s values and equate the sum to zero, sum_i s_i = 0. Remember, nodes in one partition contribute +1 and nodes in the other contribute -1, so if we want a more or less balanced cut, meaning the sizes of the two groups are equal, the number of pluses should equal the number of minuses, and when you add them up they cancel out and become zero. It will not necessarily always be exactly zero, because you might have an odd number of nodes, but that's what you push toward.

So what we have done is express the cut problem as an integer minimization problem. We have a function we want to optimize, and we want to find the vector s, consisting of +1's and -1's, such that the value is smallest, subject to these constraints. Honestly, by itself this is not helpful, because integer programming is again NP-hard; we have just re-expressed the graph partitioning problem as an integer programming problem.
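A quick numerical check of the quadratic-form identity on a toy graph; the graph and the partition below are invented purely for illustration.

```python
import numpy as np

# A small graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # diagonal degree matrix
L = D - A                    # graph Laplacian

s = np.array([1, 1, 1, -1])  # indicator: node 3 alone on one side
print(s @ L @ s / 4)         # 1.0, matching the single crossing edge (2, 3)
```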
But then there is the trick that Fiedler proposed, and it changed everything. The idea is the so-called spectral relaxation: go from this discrete optimization problem, where the vector of s values is +1 or -1, to a continuous problem, where the s values are relaxed and replaced by real numbers x. If you remember, going back several lectures, when we looked at some distribution functions we did the trick of going from a discrete summation to a continuous integration, because it was much easier to work with continuous variables. Here it's sort of the same idea: instead of working with discrete variables we go to continuous ones. But here it also makes an incredible difference, because in discrete variables the problem is NP-hard, and in continuous variables it is easily solvable. We couldn't solve the integer program, so the idea is to replace the integers with continuous variables, replace the s's with x, where the x values can take any values, not necessarily +1 or -1, and at the end we round the solution back to +1 or -1. That's called the relaxation process: we replace the precise, exact discrete problem with an approximate continuous one, and instead of the integer minimization of s^T L s we minimize x^T L x. You might say: what's the difference, it's all the same. Well, the difference is that s is an integer vector and can only be +1 or -1, while x is a real vector, and that makes a world of difference, because the integer problem is NP-hard and this one is easily solvable. Then, after we have the solution, we just take, for example, the sign of x and convert it into s; we take the solution back to integers. I'll pause here. Questions? Does this make sense to you guys? In terms of math this is probably the most challenging lecture.

All right, I'll continue. Now we actually have a home run, because this is a constrained optimization problem: we want to minimize a quadratic functional, x^T L x, under constraints. How do we do this? There is an absolutely standard way of doing so: Lagrange multipliers. To find the solution we take the function we want to optimize, bring the normalization constraint in with a Lagrange multiplier, and keep the other constraint outside; it's just easier to do it that way. Now, how do you solve a constrained optimization problem with Lagrange multipliers? You differentiate. You take a derivative with respect to the Lagrange multiplier and another derivative with respect to the variable you're solving for. The derivative with respect to the Lagrange multiplier gives you back the constraint, and the derivative with respect to the variable gives you an eigenvalue problem: differentiate x^T L x and you get 2 L x; differentiate the multiplier term and you get 2 lambda x; so on the left-hand side you get L x and on the right-hand side lambda x, and that's pretty much it. If you put that solution back into the functional, your answer is going to be lambda.
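Spelled out (a sketch reconstructed from the description above; the normalization x^T x = n is one common convention for fixing the scale of x), the relaxed problem and its stationarity condition are:

```latex
\min_{x \in \mathbb{R}^n} \; x^{\top} L x
\quad \text{s.t.} \quad x^{\top} x = n, \quad x^{\top} \mathbf{1} = 0

\Lambda(x, \lambda) = x^{\top} L x - \lambda \, (x^{\top} x - n), \qquad
\frac{\partial \Lambda}{\partial x} = 2 L x - 2 \lambda x = 0
\;\;\Rightarrow\;\; L x = \lambda x,
\qquad x^{\top} L x = \lambda \, x^{\top} x = \lambda n
```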
Now, what's interesting is that for the Laplacian matrix the smallest solution, the smallest eigenvector, is actually a trivial one: the vector of all ones. I'll just give you an example here. If we go back for a second to this matrix and multiply it by the vector of all ones, we get zero in every entry, which means that the all-ones vector is an eigenvector with eigenvalue zero. That's just the way the matrix is built: notice that the sum of the elements in each row is equal to zero, and that's why it works. So the first, smallest eigenvalue is zero and the corresponding eigenvector is all ones. But notice that we have a constraint saying our solution has to be perpendicular to the unit vector, so this solution doesn't work for us. Since we are looking for the minimum of the functional and the smallest eigenpair doesn't qualify, what do you do? You take the next smallest, the second eigenvalue and second eigenvector, and that solves our problem. This second eigenvector is called the Fiedler vector, and it's used for spectral graph partitioning. It is the second smallest because the first, smallest eigenvalue is equal to zero with the trivial all-ones eigenvector. And since that eigenvector is all ones, it would mean there is no partitioning, only one partition; remember, the vector has the meaning of an indicator of which partition a node belongs to, and if it's all ones, all the nodes belong to the same partition. That's why it's not interesting for us, and that's why we look for the second-smallest eigenvector and its corresponding eigenvalue.

All right. So here's the algorithm; it's called spectral graph partitioning. You literally take your adjacency matrix, find the diagonal degree matrix, compute the Laplacian matrix, and solve for the second-smallest eigenvector. Then you set the indicator to the signs of the values in that eigenvector, and that's your solution, that's your partition: the nodes with +1 go to one partition, the nodes with -1 go to the other. It looks like magic, but you saw the math.

What's also extremely interesting is that if instead of L x = lambda x you solve L x = lambda D x, the same eigenvalue problem except with the diagonal degree matrix on the right, one can show that instead of minimizing the cut you'll be minimizing the normalized cut. Let's go back for a second to the beginning of the lecture: if we solve the simple eigenvalue problem we minimize the plain cut, and if we solve the generalized eigenvalue problem L x = lambda D x we minimize the normalized cut. I didn't show it here, but you can check in the papers how to actually derive that.
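A minimal sketch of this spectral bisection step, assuming a dense symmetric numpy adjacency matrix of a connected graph with no isolated nodes (so that D is invertible in the normalized case); scipy.linalg.eigh returns eigenvalues in ascending order, so column 1 holds the Fiedler vector.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_bisection(A, normalized=False):
    D = np.diag(A.sum(axis=1))        # diagonal degree matrix
    L = D - A                         # graph Laplacian
    if normalized:
        vals, vecs = eigh(L, D)       # generalized problem L x = lambda D x
    else:
        vals, vecs = eigh(L)          # standard problem   L x = lambda x
    fiedler = vecs[:, 1]              # eigenvector of the second-smallest eigenvalue
    return fiedler, fiedler >= 0      # signs of the entries give the two partitions
```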
Now, why is this important? Because if I look at the partitioning itself (let me go back to this picture), the cut is a number of edges, and this is in fact a very good cut, because it's equal to 2; but if I run an algorithm that just wants to find the minimum cut, it's not going to find this one, it's going to find the true minimum cut. And what is the minimum cut here? Remember, a cut is a separation of the graph into two parts, its value is the number of edges that connect the parts, and minimum means I want the separation that cuts the smallest number of edges. So in this graph, on the lower left, I can cut off one node: that's a cut, the value of the cut is one, and that's going to be the minimum cut. But this partition is not really what you want, because you just cut off one node; you have really done almost nothing. That's why it's very important to have the balancing constraint: maybe the number of edges you cut is not the smallest, but it balances the partitions, the sizes of the partitions are similar. For example, this cut has a value of two, but it's a much more preferable cut for you because it balances the partition. And looking at the formulas, you realize that yes, the normalized cut tries to do exactly this: instead of just optimizing the size of the cut, it optimizes the size of the cut divided by the volume, which is the number of emanating edges, so it tries to balance things, and it's a much, much better function to optimize.

All right, so going back to the solution. That's the algorithm; now let's look at how it works. You have seen this graph before: this is the karate club graph, and I want to find the partitioning. What you have here is the following. On the x-axis is just the node number, the vertex index: the first node, the 30th node, the 31st, the 32nd, the 33rd, the 34th; there are 34 nodes here. And on the vertical axis is the value of that x vector: we solved the normalized cut problem, and the solution is a vector, so for every node there is a value. Node number 30 has this value in the eigenvector, node 34 has this value, node 10 has this value, node 1 has this value; here it is -0.2, there it's 0.2, and so on. Does this picture make sense? Yes? Okay.

Now, what do we need to do if we want to partition this graph into two subgraphs, based on this Laplacian matrix, based on this algorithm? Remember the indicator vector s, which tells us which partition a node belongs to; it is defined by the zero level: exactly, it's just the sign of x. So we say: every node whose value is above the zero line, on the positive side, will belong to the plus partition, and the nodes below it belong to the minus partition.
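Trying the same thing on the karate club graph is a few lines with networkx; this snippet reuses the spectral_bisection sketch above, and the exact split it prints may differ from the lecture's figure by a sign convention.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()             # the 34-node graph from the lecture
A = nx.to_numpy_array(G)
x, side = spectral_bisection(A, normalized=True)
print(np.flatnonzero(side))            # nodes on the positive side
print(np.flatnonzero(~side))           # nodes on the negative side
```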
All right, so let's see how it actually works. I did exactly that: I color-coded the nodes based on the sign, and notice what happened: it actually nicely split the graph. We're going to use results like this for community detection on this graph later, in the following lectures, but here it has actually found the split. So that's how it works.

But then there is an even more interesting story here that makes this algorithm quite fascinating. This is the second eigenvector, the picture we just saw. Let me take this vector and permute it in such a way that x_{i+1} >= x_i. What I want to do is renumber the nodes so that the further along a node is, the larger its value in the eigenvector: I take the eigenvector, sort it, and renumber the nodes based on that ordering. Is it clear what I'm trying to do? Okay. If I do that, a very interesting thing happens; this is the picture. Notice that we in fact got a gap here: so far in our story this side was one cluster, one partition, and this the other, but we also got a gap here. So a couple of things happen. One is that I can put forward a hypothesis that we should not only look at the split at zero but also look for the gaps. The other is that this is in fact a one-dimensional embedding; those of you who are big fans of deep learning will realize that what we have done here is essentially a principal-component-analysis-style embedding, except that instead of the usual data matrix we used the Laplacian matrix. So it's a 1-D embedding, but discovered back in the 70s.

Now, what do we do with this knowledge? Before we go further, let's verify it with the cut metric. Notice that once I have the nodes ordered, once I've sorted them, I can very easily say the following. Let's split the graph here, making, say, the first two nodes one partition and everything else the other. With this partition I can measure the value of the cut: I can count how many edges I cut, and I can also calculate the normalized cut and all the other metrics. Then I take nodes one, two, and three as one partition and the rest as the other, and measure the cut again; then the next split, and so on, going on and on and calculating the cut value at every split point. If I do that, here is what I get. On the horizontal axis is the split position along the sorted nodes, and on the vertical axis is the value of the cut function: if I split the graph here, that's the value; if I split it there, that's how much the cut will be. Looking at this, it's obvious that the smallest value of the cut occurs when I split the graph here, into those two pieces, and that's the optimal, smallest cut value. And if I look at the normalized cut, the function we're actually optimizing, you realize that yes, again, this is the best place to split, the smallest value, the optimal partitioning: this is one group of nodes and that is the other. Again, this picture is only possible because I permuted the nodes and put them in this particular order.
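A sketch of that sweep, reusing the metric helpers and spectral_bisection from earlier: sort the nodes by their Fiedler-vector values, evaluate a chosen score at every split point of the ordering, and keep the best one.

```python
import numpy as np

def sweep_cut(A, x, score):
    # x is the Fiedler vector; try every prefix of the sorted node order.
    order = np.argsort(x)
    n = len(x)
    best_val, best_mask = np.inf, None
    for k in range(1, n):
        mask = np.zeros(n, dtype=bool)
        mask[order[:k]] = True           # first k sorted nodes on one side
        val = score(A, mask)
        if val < best_val:
            best_val, best_mask = val, mask
    return best_val, best_mask

# e.g. sweep_cut(A, x, normalized_cut); the same ordering also produces the
# block pictures discussed below: plt.spy(A[np.ix_(order, order)]) draws the
# reordered adjacency matrix, with clusters appearing as dense diagonal blocks.
```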
Then there is the ratio cut, and this, by the way, is conductance. Notice that conductance has a very, very sharp minimum, which is a good property; but also notice that something happens over here, so there is a possibility that there might be another good cut right there.

Okay, so that's pretty much how the algorithm works, and it shows that it really allows you to find the minimum value. This was a cut into two pieces, a two-way cut. Now, if I want to cut into multiple subgraphs, what I can do is take the traditional divide-and-conquer approach: we take the graph, split it into two pieces, and then work separately with the left side and separately with the right side. So we run the algorithm the first time on the whole graph, then here, then a third time there, and if I want smaller pieces I just keep cutting. In fact, this is how you find communities or clusters. When you split the first time you get two large groups of nodes, which are not really clusters or communities yet, but as you keep descending, going down and down, you eventually get small groups of nodes that are very tightly connected to each other; cutting them would cost you a lot, so you stop cutting. That's called multi-level spectral partitioning, or recursive partitioning, and it's actually one of the most powerful algorithms for community detection and for graph partitioning.
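A sketch of that recursive scheme, reusing spectral_bisection and normalized_cut from above. The stopping rule (splitting only while the normalized cut stays below a threshold) and the threshold value are my own simplifications, and each subgraph is assumed to stay connected so the generalized eigenproblem remains well-posed.

```python
import numpy as np

def recursive_bisection(A, nodes=None, max_ncut=0.5):
    # Split a part only while the split is cheap enough; otherwise
    # declare it a community and stop descending.
    nodes = np.arange(A.shape[0]) if nodes is None else nodes
    if len(nodes) < 4:
        return [nodes]
    sub = A[np.ix_(nodes, nodes)]
    _, side = spectral_bisection(sub, normalized=True)
    if side.all() or (~side).all() or normalized_cut(sub, side) > max_ncut:
        return [nodes]                   # too expensive to cut further
    return (recursive_bisection(A, nodes[side], max_ncut) +
            recursive_bisection(A, nodes[~side], max_ncut))
```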
What's also interesting, and sort of illuminating, is this picture. Let me explain it slowly; I hope you will understand, because honestly, when you have a large graph, this is the best way to see with your own eyes the groups of nodes, the clusters in the graph. What you have on the first, left panel is just the adjacency matrix of a graph. It's pretty large, and the blue color just corresponds to an edge in the graph, so you see there are a lot of edges everywhere. Now I run this normalized cut algorithm once, take the eigenvector I get, and, the same way as before, permute the nodes so that they are now sorted in order of increasing value within the eigenvector; that's what I get on the next panel. How do we interpret this? Look, it just means that the nodes from one up to, let's say, 100 are all very much connected to each other, forming a cluster; then there is a set of nodes, say from 100 to 500, that are also connected with each other, almost not connected here, and barely connected to the rest of the graph; and then there is a larger group of nodes that are also connected among themselves but almost not connected to these guys. So in some sense, if I want to do cuts: this is one cut and this is another, because that will split the graph into one, two, three groups; there is a lot of connectivity here, a lot of connectivity here, and quite little connectivity between them. What's interesting is that if I continue working on this in the same multi-level way that I described, and keep splitting, then after a while I get to this picture. What happens is that the big blocks don't change, but you start getting even more detail: by solving the problem again on one part and reordering the nodes there, you start seeing more and more clusters.

So what this visualization does is allow you to see clusters, groups of nodes that are tightly connected within the group, because those nodes connect only to nodes within the same group and not to others. This one-dimensional embedding, followed by permuting the nodes based on that embedding, allows you to very clearly see the clusters that exist in the graph. Again, remember that a diagonal block really means those nodes are connected among themselves, because both axes are node numbers: these nodes are connected among themselves, and that group is connected among themselves. That's the power of the Laplacian algorithm, the Laplacian embedding, Laplacian graph partitioning. It's not only the fact that you can find the split and divide the graph into two pieces; it is also an extremely powerful visualization technique, where you take the graph, permute the nodes, reorder them based on the values within the second eigenvector, and then the matrix visualization shows you the nodes that go together as blocks. Is this clear? Sort of clear? All right, okay, good.

I think you might have a similar assignment in the homework, not the multi-level version, just a single level. It actually becomes much cleaner and clearer once you have tried it yourself on the data at least once. And, as always, the references. Just curiosity-wise, this is the original Fiedler paper on the second eigenvector; then the method was rediscovered for partitioning matrices in the nineties; and then there is the famous paper in computer vision that applied it to image segmentation. All right, okay, and that's it for today. Thanks, guys, if there are no questions; and remember, there is no seminar today. We're done.