Lecture 39 - Spectral Clustering (05/08/2017)

Captions
All right, good morning, welcome back. So this week the plan is to wrap up the last topic today, and then the last two classes are left for a review and a brief overview of what we have done in the entire course. What I will also do is start a thread where you can post which topic or which particular problem you would like us to look at. The TAs are going to be here the whole week, so if you have questions you can go to their office hours. For the undergraduate course the recitations are on this week and will primarily be a finals review, so please make use of that as well. Next week we won't have any office hours, unfortunately; I am out a couple of days and then of course there is the exam, but we will be on Piazza, so if you have any questions please post there, and try to make them public posts so that everyone else can see them as well. Any questions before we start? The final exam is on Wednesday at 8:00 a.m., here in this classroom. It is a three-hour exam, so it will be longer than the midterm, but the same format, and it is not comprehensive: whatever we started looking at after the midterm will be covered. All right, no questions, so let's get started.

I just wanted to wrap up our discussion of singular value decomposition. SVD is not a core machine learning technique as such; we looked at it because it is what is used to do PCA, which is a dimensionality reduction method, and we also saw how it is useful for computing a low-rank approximation of data. The data could be anything; we have been looking at images, which have rows and columns, but more generally you have a data matrix X with n rows and d columns and you can do SVD on that. What that gives you is another matrix with the same dimensionality, and the SVD theorem says that the difference between it and the original matrix will be minimal among all low-rank approximations of the same rank. So that was the idea, and we saw how to use it for certain things.

I also mentioned how SVD can be used as a recommender system. Recommender systems are a class of problems where you want to recommend items to people; places like Amazon and Netflix do that. Based on your purchase history, they want to predict, or recommend, more items: maybe you want to buy this or that. One way to do that, of course, is the collaborative filtering approach, where you say: I have bought these items, and there are other people who have also bought those items, so they are similar to me; they have also bought some other items, so maybe I would like to buy those items as well. But you can also look at this as a matrix filling problem. Say you have a matrix where the rows are people and the columns are products, with a one if a person has bought a product and a blank if they have not. What you want to do is fill in those blank entries; that is why we call it a matrix filling problem, because you want to answer questions like:
would Jim be interested in buying a TV or not? So if I look at this data as a matrix - here is the matrix I want you to focus on, a binary matrix of ones and zeros - I can do SVD on it, take a low-rank approximation, and then reconstruct my matrix. What you will see is that some of the entries move toward one, and then you can say: let's predict those. In fact, if you are familiar with the Netflix challenge, it was an online challenge offered almost ten years ago now: Netflix gave out some of its data, saying here are the people and these are the movies they have watched, and you had to recommend more movies, with a fixed way of evaluating your algorithm. The very first thing people did was SVD, and it did pretty well; of course people eventually did much more and won the prize money, but SVD is the first thing you would try.

So this is how we do it. We have the data - remember, this was my X matrix - and I do SVD on it. SVD gives you three things: U, the matrix of left singular vectors; S, which is essentially a diagonal matrix holding all the singular values; and V, the matrix of right singular vectors. The first thing you want to ask is how many singular values to keep. For that we plot the squares of the singular values, which gives what we call an elbow plot. It is very similar to the scree plot for PCA: it essentially tells you, if I use only the first few singular values, how much error I incur in the approximation. Of course if I use nothing I incur the full error, and if I keep adding singular values the error eventually reaches zero, but it turns out that here the first one or two already do a pretty good job. So what you can do is truncate your SVD - remember we also called this the truncated or thin SVD, where we only consider the first few singular vectors. One way to do that is to simply set the remaining singular values to zero, because eventually you multiply these factors back together: to get back the matrix we multiply U times S times V transpose, so any entries of S that are zero will not figure in the eventual computation. So I keep the top two singular values, set everything else to zero, and compute the new X: the reconstruction is U tilde times S tilde times V tilde transpose, and that gives the new, filled-out matrix. If you look at the new matrix, some of the entries are now one that were not one earlier, and you can say: maybe we should recommend this item to this customer. This is a very small example, so the change might not be too drastic, and sometimes an entry that was one might even turn toward zero, because this is just a mathematical operation - it is telling you that this person "should not" have bought that item - but you ignore that, because the person already bought it. Anyway, I'll leave it to you to play with this. For example, here is a one that was not there before, which means that this person, whoever this was, could be recommended this particular item - so Karen could buy peanuts, exactly.
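As a rough illustration of the procedure just described, here is a minimal sketch (not the course notebook; the toy purchase matrix and the 0.5 cutoff are made-up choices) of truncating an SVD of a binary purchase matrix with numpy and reading off recommendation candidates:

```python
# Rows = people, columns = products; 1 = bought, 0 = not bought (hypothetical data).
import numpy as np

X = np.array([
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                           # keep only the top-k singular values (the "elbow")
s_trunc = np.zeros_like(s)
s_trunc[:k] = s[:k]

X_tilde = U @ np.diag(s_trunc) @ Vt    # rank-k reconstruction of the matrix

# Entries that were 0 but are now large are recommendation candidates.
candidates = (X == 0) & (X_tilde > 0.5)
print(np.round(X_tilde, 2))
print(candidates)
```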
Now, you will also see some other issues. As we saw, some of the ones became zero, which is something we don't like, and sometimes you might even see negative values, which you don't like either: what does it mean that somebody should be recommended a negative value? SVD does not control for that; in SVD you have no control forcing these values to be positive, it is just an optimization problem. However, there are other methods that handle this. Non-negative matrix factorization (NMF) is another method that we are not going to talk about at all, but the only reason I mention it is that if you want to force your output to have only positive values, because only positive values make sense, then you might want to explore NMF. It is similar to SVD in that it also does a kind of matrix factorization, but you put extra constraints on it - the constraint that all the values should be non-negative - and in some sense we know how to do that, because it is an optimization problem with constraints, so in principle you can solve it using something like a Lagrangian method. So that is the idea; that is how we use SVD as a recommender system, and of course we also saw SVD as a way of approximating your data. Any questions before we switch to another topic?

All right, so now I want to talk about one last topic in the course, which is called spectral clustering. It is another clustering method. The reason I didn't talk about it when we covered k-means is that, while the objective is still the same - you have data and you want to partition it into groups - the way it does that is what we call a spectral method, and since we have been looking at spectral methods for the last week or so, I thought this would be a nice continuation. What do we mean by a spectral method? Whenever you are doing some kind of eigenvalue decomposition, those methods are referred to as spectral methods, because you are looking at the eigenvalue spectrum of a matrix. Even here we will see that spectral clustering uses some form of eigenvalue decomposition to get its answer.

Think about it this way: we have looked at three ways of solving machine learning problems. One is error-based methods, like neural networks, where you define an objective function and then apply some kind of optimization procedure to get your answer. The second is probabilistic methods, where you try to maximize some kind of likelihood or posterior. And the third is where you pose your problem as an eigenvalue problem. For example, we saw in PCA that we wanted to find a latent factor embedding, and finding the principal component directions by maximizing the variance turned out to be equivalent to solving an eigenvalue problem; that is why PCA is a spectral method. So think of this as a third way of doing machine learning. Now, not every problem can be posed as an eigenvalue problem - that is why you don't see a "spectral neural network," for instance.
But in many cases, especially when we are talking about latent factor embeddings - where you have some latent factors underlying the original data - the problem can be posed as an eigenvalue problem and then solved using eigenvalue decomposition. Spectral clustering is another application of that. Any questions? Yes - so the question is: when I say these are spectral methods, are spectral methods machine learning problems in their own right, or are they tools that help you build machine learning solutions? The answer is that you can use something like PCA as a pre-processing step, of course - any dimensionality reduction method can be used as a pre-processing step, and then you do more machine learning on top of that. But sometimes that "pre-processing" is itself the thing you want, for example in clustering: you just want to find the clusters, and in that case it is a machine learning task in its own right. What spectral methods let you do is solve a machine learning problem using eigenvalue decomposition; that is the way to look at it. You could have the same problem - a clustering problem, say, which is oblivious to what kind of optimization happens underneath - and you can solve it using a probabilistic approach like a mixture model, or by iteratively optimizing an error function like k-means, or using a spectral method like the one we will see today. Think of it as another tool in your toolbox for solving these machine learning problems. Any questions so far? Okay.

So spectral clustering, as I said, is another approach to clustering, and the idea is that you represent your data as a graph. We haven't talked about graphs at all in this course, but think of a data matrix with n data points, each in some d-dimensional space. You can construct a similarity matrix S - you can also relate this to a kernel matrix - which is the n-by-n matrix in which every entry is some kind of similarity between a pair of points: S_ij = sim(x_i, x_j), where sim is some similarity function. It can be anything you want to use, but it needs to be symmetric. For example, you can take the cosine similarity between two vectors, or even the negative of the Euclidean distance, which acts like a Euclidean similarity; so you can construct this matrix in many ways. What spectral clustering methods actually start from, though, is W - sorry, there is one more step: first you construct S, and then from S you construct the adjacency matrix W. I'm assuming you have done a little graph theory: whenever you have a graph there is an adjacency matrix, which encodes the edges - which node is connected to which node. These methods assume the graph is undirected, which means there is no direction on the edges, so your adjacency matrix is symmetric, and it is also
weighted, which means it is not just ones and zeros - it is not just whether you are connected or not - but a weighted undirected graph, where each edge has some weight associated with it. This graph connects two nodes if those two points are nearest neighbors of each other; that is one way to construct it. For example, consider this data: x_1 through x_4, just four data points. The first thing we do in spectral clustering is construct the S matrix, which will be 4 by 4, where each entry is a similarity - say the similarity function gives values between 0 and 1. Of course the similarity of a point with itself is the highest, so the diagonal is 1, and everything else is something like 0.7, 0.3, 0.4, symmetric; these are just some arbitrary values. That is our S matrix. From this, for every data point I find its closest neighbor, and I construct a new matrix, which I call W. We are going to ignore self-loops, so in our adjacency matrix nobody is connected to themselves. The nearest neighbor of point 1 is point 2, so we connect them and use the similarity as the edge weight, and everything else in that row is zero; so the row looks like 0, 0.8, 0, 0, with a 0.6 somewhere else, and so on. This becomes my graph. If you draw it in graph form - nodes 1, 2, 3, 4 - then 1 is connected to 2, 2 is connected to 3, 3 is connected to 2, and 4 is connected to 2 as well, with these weights on the edges, and it is symmetric. So there are three edges in total, and this is our graph.

So what spectral clustering methods do is take your data and convert it into a graph, and after that they do some kind of graph clustering. If your data is already graph-like, spectral clustering methods can still be used, except that you now start from the graph-clustering step rather than the graph-construction step, because you already have the graph. Another interesting thing is that you can think of this as a kernel method as well: as long as you have some way of computing a similarity, you can construct this graph, so each of the x_i need not live in a vector space - as long as there is a similarity measure defined for a pair of objects, you can construct the graph, and that is where you start.

So that is the idea behind this. Now, how do we cluster it? Spectral clustering methods pose the problem as a graph partitioning problem. What is graph partitioning? If you have some nodes, you want to group the nodes into partitions such that nodes within a partition are connected to each other but not connected to much outside, which looks a lot like a cluster. For example, in our notebook for spectral clustering I use a very interesting library called NetworkX, which is a graph analytics library in Python. What it lets me do, first of all, is create graphs - so let's say this is my graph.
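For concreteness, here is a minimal sketch of that construction under my own assumptions (an RBF similarity and a 1-nearest-neighbor rule, roughly as in the four-point example; the helper names are mine, not the notebook's):

```python
import numpy as np

def similarity_matrix(X, gamma=1.0):
    """RBF similarity: S_ij = exp(-gamma * ||x_i - x_j||^2); symmetric, S_ii = 1."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def knn_adjacency(S, k=1):
    """Weighted adjacency W: connect each point to its k most similar neighbors
    (excluding itself), using the similarity as the edge weight, then symmetrize."""
    n = S.shape[0]
    W = np.zeros_like(S)
    for i in range(n):
        order = np.argsort(-S[i])              # most similar first
        neighbors = [j for j in order if j != i][:k]
        W[i, neighbors] = S[i, neighbors]
    return np.maximum(W, W.T)                  # make the graph undirected

X = np.random.rand(4, 2)                       # four toy points in 2-D
S = similarity_matrix(X)
W = knn_adjacency(S, k=1)
print(W)
```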
All right, so this is the adjacency matrix and this is the graph. Here I am assuming I already have the graph, but of course if you had data you would follow the procedure I just explained to get to this graph. Let's say we have this graph, and here we are assuming it is unweighted, so edges are either 1 or 0, but that need not be the case. Now, what does spectral clustering do? It tries to group nodes together such that the nodes in one partition are all well connected. For example, take this other graph with about a dozen nodes. What you would like to do is create three partitions: one containing nodes 1 through 5, one containing nodes 11 through 13, and one containing nodes 6 through 10. Why? Because if I partition them like this, all the nodes within each partition are fairly heavily connected to each other and not highly connected to any other partition. That would be an ideal partitioning. Of course you could partition it some other way - I could put 2, 10, and 11 together - but that is not a good partition, because those nodes are not connected to each other at all and they would have too many connections to other partitions.

In spectral clustering we formulate this idea mathematically. This is the objective function for spectral clustering, also known as the graph min-cut problem. I'm assuming you have seen this in 250: given a graph, a min cut is a cut you put through the edges such that the pieces on either side of the cut are no longer connected to each other, and you want to find the minimum total edge weight to remove so that the graph falls into two components. The idea here is similar: each A_k is a set of nodes, and you are trying to find a cut - a partition - that minimizes this objective. What is the objective? For every partition, it measures the total weight of edges leaving it: edges going from nodes in A_k to nodes outside A_k. You are trying to minimize that weight, summed over all partitions. That is the graph cut problem, and because of the way we constructed this graph, you can clearly see that it gives an obvious way of doing clustering.

Any questions so far? The question is: what if there are more than two partitions? This still works, because this formulation is looking for a k-way cut: if there are K partitions and you are able to minimize this objective, you will find all K of them. In fact, as I will show, this particular algorithm gives you some more guidance on how to choose K - remember that choosing K has always been a thorn in our side - so it gives you a little more information than k-means clustering does. All right, so is everyone clear about this objective function? All you are doing is saying: if this is my partition, the way I score it is to take all the edges going from nodes within this partition to nodes outside, add up their weights, do that for every partition, add all of those up, and that becomes my score;
I want the partitioning - which is essentially the clustering - that makes this score minimal. Now, one thing you might notice is that one way to make this objective small is to put a single node in one partition and everything else in another; that gives a small value, but from the clustering perspective it is not what you want. What you really want is a non-degenerate solution, so you formulate the normalized min-cut problem, where you also divide each term by the size of its partition. That way you want large partitions as well as few edges across partitions; you are minimizing the sum of these fractions over the partitions. This is called the normalized min-cut problem.

If you want to solve this exactly, it belongs to a class of problems in computer science that we call NP-hard, which means you cannot find a solution in polynomial time: if you are given a solution you can check it in polynomial time, but you cannot find one efficiently, so there is no known efficient algorithm for this. What we can do is solve an approximate, relaxed version. Instead of finding the exact partitioning - assigning hard values where P_ik is 1 if point i belongs to cluster k, which is the hard combinatorial problem (you can relate it to the 0-1 knapsack problem, which is known to be NP-hard) - we say that every node can belong to every partition with some score. I don't want to call it a probability, because it is not guaranteed to sum to one; think of it as some kind of confidence. With that relaxation the problem can be solved, and in fact, as we will see, it becomes an eigenvector problem: instead of solving it as a combinatorial optimization problem, we pose it as an eigenvector problem and then solve that, and that is why we call it spectral clustering.
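To make the objective concrete, here is a small illustrative sketch (my own notation and toy data, not the lecture's slides) that computes the cut score and its size-normalized version for a given partition of a weighted adjacency matrix:

```python
import numpy as np

def cut_value(W, partitions):
    """Sum of edge weights leaving each partition (each crossing edge is counted from both sides)."""
    n = W.shape[0]
    total = 0.0
    for A in partitions:
        inside = np.zeros(n, dtype=bool)
        inside[list(A)] = True
        total += W[inside][:, ~inside].sum()
    return total

def normalized_cut_value(W, partitions):
    """Same as cut_value, but each partition's term is divided by its size."""
    n = W.shape[0]
    total = 0.0
    for A in partitions:
        inside = np.zeros(n, dtype=bool)
        inside[list(A)] = True
        total += W[inside][:, ~inside].sum() / max(len(A), 1)
    return total

# Example: a tiny path graph with two obvious groups {0,1} and {2,3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(cut_value(W, [{0, 1}, {2, 3}]))             # 2.0: one crossing edge, counted twice
print(normalized_cut_value(W, [{0, 1}, {2, 3}]))  # 1.0
```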
So let's look at that. To do this, we first introduce a new concept called the graph Laplacian - another French name, after the Lagrangian, that you get to see. So let's say you have the adjacency matrix W of a graph. W is just the n-by-n matrix; for now assume the entries are all 0 or 1, but they could be any weights - zeros on the diagonal, and a weight for each edge. That is your W. From it we construct a matrix D, a diagonal matrix with zeros off the diagonal, whose diagonal entries are d_ii = sum over j of W_ij: it just sums up all the values in row i and puts the total on the diagonal. That is the definition of D. You can think of each entry as the degree of the corresponding node: if the entries of W are all ones, adding up a row tells you how many nodes that node is connected to, and if they are weights, it gives the weighted degree. So D captures the degrees of your nodes. The graph Laplacian is then simply L = D minus W - that's it, very simple: the difference between this diagonal matrix, which holds the degree of each node, and W, the original adjacency matrix.

There are some interesting properties of this L. The first is that each row of L sums to zero. That is easy to see: when you sum any row, the diagonal part d_ii is by construction the sum of the rest of that row of W, so subtracting W makes the row sum zero. Any questions about that part? Okay, so each row sums to zero. The second is that if you do the eigendecomposition of this matrix, you will see that L has one eigenvalue equal to zero, and the corresponding eigenvector is all ones, or a constant value throughout. This is also easy to show. What does it mean that L has an eigenvalue of 0? It means that if you take L and multiply it by the all-ones vector, you get 0 times that vector, because an eigenpair of a matrix satisfies L u = lambda u - that is the idea behind eigenvalues. Why is this true? If you take any matrix and multiply it by a vector of ones, each entry of the result is just the sum of the corresponding row; and we just said that each row of L sums to zero, which means the product is the zero vector, which is the same as writing 0 times the vector of ones. So the Laplacian has at least one eigenvector whose eigenvalue is zero. Any questions so far? All right, good.
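Here is a quick numerical sanity check of those two properties, as a minimal sketch (the small adjacency matrix is made up for the example):

```python
import numpy as np

W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

D = np.diag(W.sum(axis=1))    # degree matrix: row sums of W on the diagonal
L = D - W                     # graph Laplacian

print(L.sum(axis=1))          # every row sums to 0
print(L @ np.ones(4))         # L * 1 = 0 * 1, so 0 is an eigenvalue with the all-ones eigenvector
```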
One more property. I didn't talk about components yet, but if you have a graph that looks like several groups of nodes, with some edges inside each group and no edges across groups, then we say the graph has that many connected components - the number of connected components is the number of groups you can form such that there are no edges across groups. The property is that if the graph represented by your W matrix has K connected components, then L has K eigenvectors with eigenvalue zero. That is something you can prove as well - I don't think I'll go into it, but it is a similar argument to the one we just did. So these are the five properties I have listed here. First, each row sums to 0; that is clear. Second, the all-ones vector is an eigenvector with eigenvalue 0. Third, L is symmetric and positive semi-definite; that you can also show. Symmetry is easy, because W is symmetric, so D minus W is also symmetric. For positive semi-definiteness, if you take any vector x and work out x transpose times (D minus W) times x, you can show it equals one half times the sum over i, j of W_ij times (x_i minus x_j) squared, which is a sum of non-negative terms, so it is at least zero; we won't go into the details today. What this helps us do is property four: because L is symmetric and positive semi-definite, it has n non-negative, real-valued eigenvalues - that is a linear algebra result that takes you from three to four. But the more interesting property I want you to look at is number five: if the original graph you started with to construct your Laplacian has K connected components, then you will have K eigenvectors with zero eigenvalue, and this is also easy to show, in the same way as I showed for the all-ones case.

The point is that you can now use this to do clustering. If your graph has K connected components, those are the K clusters, so you can take your L, do its eigenvalue decomposition, and look at the eigenvalues that are zero: their eigenvectors will correspond to the disconnected components. Let me show you that in an example and then we'll look at it a little more closely. So let's say this is my graph, which I constructed using NetworkX, but which we could also construct from data using the procedure I showed earlier. This is my degree matrix D, the diagonal matrix in which all the entries are the row sums, and this is my Laplacian, D minus W, and you can quickly verify that each row sums to 0. Now let's do the eigenvalue decomposition. Among the eigenvalues there will be at least one that is zero - here it is only close to zero, because these eigenvalues are computed by an iterative numerical scheme - and that is the property of the Laplacian. And there will be one eigenvector that is essentially constant; it might not be exactly all ones, but you can always pull out the constant factor, and that doesn't matter. Which one is it? Counting 0, 1, 2, 3, 4 - it is this one, close to zero up to numerical error, and if I plot that eigenvector, it is close to constant across all the nodes, as expected.
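As a minimal sketch of property five in action (NetworkX and numpy assumed; the two-triangle graph is a made-up example, not the one from the lecture's notebook):

```python
import networkx as nx
import numpy as np

# Two separate triangles: a graph with two connected components.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0),      # component 1
                  (3, 4), (4, 5), (5, 3)])     # component 2

W = nx.to_numpy_array(G)
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, eigvecs = np.linalg.eigh(L)           # L is symmetric, so eigh is appropriate
num_zero = np.sum(np.isclose(eigvals, 0.0, atol=1e-8))
print(eigvals)
print(num_zero, nx.number_connected_components(G))   # both should be 2
```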
So those are all the eigenvalues - the first through the twelfth - and one of them is zero, because that is the property of a Laplacian, with a constant eigenvector. That is the first thing I wanted to show you. But now suppose my graph has two clearly disconnected components. If I do the eigenvalue analysis on this graph and look at the eigenvalues, property five tells me there will be two eigenvalues that are zero. And if we look at the eigenvectors corresponding to those zero eigenvalues, this is what they look like: they are not constant; instead they are zero for some of the nodes and nonzero for the others (each position on the x-axis here is a node). What you will see is that they correspond exactly to your partitions: this one picks out the second partition, and this one the first.

The reason I'm showing you this is that if you were given some data - say a huge graph with thousands of nodes - and somebody told you there are some disconnected components but not how many, what you can do is quickly construct the Laplacian (which we know how to do), do the eigenvalue decomposition, look at the eigenvalues, and count how many zero values you see: that is the number of components. If the graph had only one component, meaning everything was connected and there was no isolated piece, you would get exactly one zero eigenvalue - that is what the first property tells you - but if there are two you get two, and if there are four you get four. Then you take each zero eigenvalue and plot its corresponding eigenvector, and that tells you which nodes are in which partition. That's pretty neat: it tells you that nodes 0 through 4 are in one partition and nodes 5 through 9 are in another, which is a natural way of doing clustering. Any questions so far?

Yes - W is the adjacency matrix, so a row has as many nonzero entries as there are edges at that node. For example, for this graph, this is my W adjacency matrix: the first row is node 0, and node 0 is connected to 4, 9, 6, 7, and 8, so it has a one in all of those positions, because W encodes who is connected to whom. Oh, I see - you are asking about the case where we start from the data: yes, if you build the graph from data and connect only first nearest neighbors, then each row has only one edge per point, because everybody is connected just to their nearest neighbor. But here I am showing you a general graph clustering method. And that is a good point - in fact, one thing I wanted to point out is that when constructing this graph you need not use just one nearest neighbor; in practice people use the k nearest neighbors so that they get a nicely connected graph, in which case you will have exactly k ones in each row: everybody is connected to their k nearest neighbors, and if k is 1 you are connected only to the single nearest one.
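Here is a small self-contained sketch of reading component membership off the zero-eigenvalue eigenvectors (toy adjacency matrix of my own; note the caveat in the comments, which the lecture's plots gloss over):

```python
# With repeated zero eigenvalues, the solver may return any rotated basis of the
# component-indicator vectors, so instead of looking for exact zeros we group
# nodes whose rows in that basis coincide: nodes in the same component always
# share an identical row.
import numpy as np

# Two disconnected components: {0, 1, 2} and {3, 4}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

eigvals, eigvecs = np.linalg.eigh(L)
U0 = eigvecs[:, np.isclose(eigvals, 0.0, atol=1e-8)]   # zero-eigenvalue eigenvectors

_, labels = np.unique(np.round(U0, 6), axis=0, return_inverse=True)
print(labels)    # e.g. [0 0 0 1 1]: one label per connected component
```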
So yes, you're right, and in that special circumstance your Laplacian has some additional structure, since each row of W has only one nonzero entry, and you can think about the consequences of that; but here I want you to think about the general case, where you have a general graph and you just want to partition it. Any other questions?

Now, the next question you should ask me, or at least be thinking about, is: what if the components are not truly disconnected, but merely weakly connected? Say my graph looks like this, where there are clearly no clean partitions - how do I still find them? It turns out that the same approach still works. The idea is that you will no longer have three zero eigenvalues; in fact you will have only one zero eigenvalue, because there is only one component, but you will have three eigenvalues with very small values. You can show - these results come from matrix perturbation theory - that there will be three eigenvalues that are very small; they will not be zero, but they will be very small, and they will still allow you to do the clustering. For example, take this graph: construct W, construct D, and do the eigenvalue analysis. If I look at the eigenvalues, I will see that some of them are small, and those will tell me about my nearly disconnected components. Of course there will always be one eigenvector that is essentially constant, because there is one component, but you can look at the other ones to find the clustering. You could say: in this eigenvector, let me look at the entries whose values are high - those are the points that belong to this cluster - and the same for the others. But instead of eyeballing which entries are high, what people typically do in spectral clustering is construct a new data matrix: for every point, take the values of the eigenvectors corresponding to the smallest eigenvalues - those become new features for each point - and then do clustering on that, with k-means, say. If you do that, this is what you get: these are the points in one component, these in another, and so on. That is how you use spectral clustering in cases where you do not have cleanly split partitions. As I said here, in practice W might not have K exactly isolated connected components, but what you can do is look at the small eigenvalues of the eigendecomposition of L, take their eigenvectors, and that gives you a new data set.
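Putting the pipeline together, here is a minimal sketch (numpy and scikit-learn assumed; the function name and the unnormalized Laplacian are my choices, and practical implementations often use a normalized Laplacian instead):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, n_clusters):
    """Cluster the nodes of a weighted adjacency matrix W into n_clusters groups."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues returned in ascending order
    U = eigvecs[:, :n_clusters]                # eigenvectors of the smallest eigenvalues as features
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)

# Usage: labels = spectral_clustering(W, n_clusters=3) for an n-by-n adjacency matrix W.
```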
Say you find that there are three very small eigenvalues: take those three eigenvectors, and that gives you an n-by-3 data set; do k-means clustering on that, and it will give you a good clustering. I might take a few minutes next class to illustrate this, but let me quickly show it to you. Say this is my data: two concentric circles, so there are two natural clusters. First we run k-means on it, and we clearly see that k-means does not work very well - it essentially just tries to split the points into two blobs, and this is what you get. But if you do spectral clustering - where we first find the nearest neighbors, convert the data into a graph, then do the eigendecomposition of the Laplacian and go on from there - this is what you get: the two circles, exactly. Why does it work? Essentially what happens inside is that for every point we look at its k nearest neighbors (here we used five nearest neighbors), and that is how we construct the graph. If you think about it, every node will only be connected to other nodes on the same circle - it is unlikely that any node is connected to something far away, except maybe this one - and so the graph has two beautifully isolated connected components, and spectral clustering on that graph gives you the result.

So that is the idea. I encourage you to take a look at this page - not for the course, just for the future - where many clustering algorithms are compared on many different data sets. We have talked about k-means and spectral clustering; there are of course many more, and what you will see from that simple experiment is that for different types of data these different algorithms behave differently. Any other questions before we stop? Okay, I'll see you on Wednesday. Thank you very much.
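For reference, a short sketch reproducing the two-circles comparison from the demo (scikit-learn assumed; the dataset parameters are illustrative, not the notebook's exact values):

```python
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering

# Two concentric circles with a little noise.
X, _ = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# k-means splits the data into two blobs, cutting across both circles.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering on a nearest-neighbor graph recovers the two circles.
# (scikit-learn may warn that the graph is not fully connected; that is expected here.)
spectral_labels = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",   # build the graph from nearest neighbors
    n_neighbors=5,                  # five neighbors, as in the lecture demo
    random_state=0,
).fit_predict(X)
```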
Info
Channel: ubmlcoursespring2017
Views: 3,502
Keywords: CSE474/574, Spectral Clustering
Id: jgsJZYGeAz4
Length: 50min 24sec (3024 seconds)
Published: Mon May 08 2017