CS224W: Machine Learning with Graphs | 2021 | Lecture 2.1 - Traditional Feature-based Methods: Node

Captions
Welcome to the class. Today we are going to talk about traditional methods for machine learning on graphs, and in particular we will investigate the different levels of tasks we can have on a graph: node-level prediction tasks; link-level (or edge-level) prediction tasks, which consider pairs of nodes and try to predict whether a pair is connected or not; and graph-level prediction, where we want to make a prediction for an entire graph, for example an entire molecule or an entire piece of code.

The traditional machine learning pipeline is all about designing proper features, and here we will think of two types of features. We assume that nodes already have some attributes associated with them. For example, if this is a protein-protein interaction network, proteins have different chemical structures and chemical properties, and we can think of these as attributes attached to the nodes of the network. At the same time, we also want to create additional features that describe how a particular node is positioned in the rest of the network and what its local network structure looks like. These additional features, which describe the topology of the graph, will allow us to make more accurate predictions. So we will always be thinking about two types of features: structural features, as well as features describing the attributes and properties of the nodes.

The goal today is to focus especially on structural features: features that describe the structure of a link in its broader network surroundings, features that describe the structure of the network neighborhood around a given node of interest, and features that describe the structure of the entire graph, so that we can then feed these features into machine learning models to make predictions.
Traditional machine learning pipelines have two steps. In the first step, we take our data points (nodes, links, entire graphs) and represent them with vectors of features. On top of that, we then train a classical machine learning classifier or model, for example a random forest, a support vector machine, or a feed-forward neural network, so that in the future, when a new node, link, or graph appears, we can obtain its features and make a prediction. This is the setting in which we will generally operate today.

In this lecture we focus, as I said, on feature design. Using effective features over graphs is the key to obtaining good predictive performance, because we want to capture the relational structure of the network. Traditional machine learning pipelines use hand-designed, handcrafted features, and today's lecture is all about these handcrafted features. We split the lecture into three parts: first, features that describe individual nodes, which we can use for node-level prediction; then features that describe a pair of nodes, which you can think of as features for link-level prediction; and finally, features and approaches that describe the topological structure of entire graphs, so that different graphs can be compared and classified. For simplicity, today we focus on undirected graphs.

The goal is to make predictions for a set of objects of interest. The design choice is that our features are d-dimensional vectors; the objects we are interested in are nodes, edges, sets of nodes, or even entire graphs; and the objective function concerns the labels we are trying to predict.
We can think of it this way: we are given a graph, as a set of vertices and a set of edges, and we want to learn a function that, for example, gives us a real-valued prediction for every node. This would be useful if we are trying to predict the age of every node in our social network. The question is how we can learn this function f that makes these predictions.

First, we will talk about node-level tasks and features that describe individual nodes. We think of this in what is known as the semi-supervised case: we are given a network in which a couple of nodes are labeled, say with different colors as in this example, and the goal is to predict the colors of the uncolored nodes. If you stare a bit at this example, the rule is that green nodes have at least two edges adjacent to them, while red nodes have exactly one edge connected to them. So if we can describe the node degree of every node as a structural feature of this graph, we can learn a model that, in this simple case, correctly colors the nodes. We need features that describe this particular topological pattern.

The goal, then, is to characterize the structure of the network around a given node, as well as, in some sense, the position or location of the node in the broader network context. We will talk about four different approaches that allow us to do this. First, we can always use the degree of the node to characterize the structure of the network around it. Second, we can capture the importance and position of the node through node centrality measures, and we will review a few of them. Third and fourth, we will characterize the local network structure around the node.
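As a concrete sketch of this idea, the coloring rule above can be learned from the degree feature alone. The toy graph and seed labels below are hypothetical (the slide's exact graph is not reproduced here); the "model" is just a degree threshold fit from the labeled seed nodes:

```python
from collections import defaultdict

def degrees(edges):
    """Node degrees from an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Hypothetical toy graph (not the slide's exact example).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]
deg = degrees(edges)

# Labeled seeds, following the lecture's rule: green nodes have
# degree >= 2, red nodes have degree exactly 1.
labeled = {"b": "green", "a": "red"}

# Minimal "model": the smallest degree seen among green seeds becomes the
# threshold, and every node is colored from its degree feature alone.
threshold = min(deg[n] for n, c in labeled.items() if c == "green")
predict = {n: ("green" if d >= threshold else "red") for n, d in deg.items()}
```

With these seeds the threshold comes out to 2, so nodes b, c, and d are predicted green and the degree-one nodes red, matching the rule described above.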
By local network structure I mean not only how many edges a given node has, but also what the structure around the node looks like. For this, we will first talk about the clustering coefficient and then generalize it to the concept known as graphlets.

First, let's talk about the node degree. We have already introduced it; there is nothing special about it, but it is a very useful and often quite important feature. We capture the structure of node v in the network simply by the number of edges this node has. The drawback is that it treats all neighbors equally, so nodes with the same degree are indistinguishable even if they sit in different parts of the network. For example, node c and node e have the same degree, so our classifier won't be able to distinguish them; nodes a, h, e, f, and g all have degree one, so they all get the same feature value. A simple machine learning model that used only node degree as a feature would be forced to predict the same value for all these different nodes, because they have the same degree.

To generalize this very simple notion: node degree only counts the neighbors of a node, without capturing their importance or who they really are. Node centrality measures try to characterize this notion of how important a node is in the graph, and there are many different ways to capture or model importance. I will quickly introduce eigenvector centrality, which we will further work on and extend to the seminal PageRank algorithm later in the course; betweenness centrality, which tells us how important a given node is as a connector; and closeness centrality, which tries to capture how close to the center of the network a given node is. There are, of course, many other measures of centrality or importance.
First, let us define eigenvector centrality. We say a node v is important if it is surrounded by important neighboring nodes u. The idea is that the importance of node v is the normalized sum of the importances of its neighbors: c_v = (1/λ) Σ_{u ∈ N(v)} c_u. In other words, the more important my friends are, the higher my own importance. If you write this down, it becomes a simple matrix equation, λc = Ac, where λ is a positive normalizing constant, c is the vector of centrality scores, and A is the graph adjacency matrix. Written in this form, you see that this is a standard eigenvector/eigenvalue equation, so the problem is solved by an eigenvalue and its associated eigenvector, and what people take as the measure of node centrality is the eigenvector associated with the largest eigenvalue. If the largest eigenvalue is λ_max, then by the Perron-Frobenius theorem (for a connected undirected graph) λ_max is positive and unique, and the associated leading eigenvector c_max is used as the centrality score of the nodes in the network. Again, the intuition is that the importance of a node is the normalized sum of the importances of the nodes it links to: it is not about how many connections you have, but about who those connections point to and how important those nodes are. This is the notion of node centrality captured by eigenvector centrality.
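A minimal sketch of how the leading eigenvector can be computed: power iteration, i.e., repeatedly multiplying the score vector by the adjacency matrix and renormalizing. The small example graph is assumed for illustration:

```python
def eigenvector_centrality(adj, iters=100):
    """Approximate the leading eigenvector of the adjacency matrix
    by power iteration: c <- A c, renormalized to unit length each step.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    c = {v: 1.0 for v in adj}
    for _ in range(iters):
        # One multiplication by A: each node sums its neighbors' scores.
        c_new = {v: sum(c[u] for u in adj[v]) for v in adj}
        norm = sum(x * x for x in c_new.values()) ** 0.5
        c = {v: x / norm for v, x in c_new.items()}
    return c

# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"},
       "c": {"a", "b", "d"}, "d": {"c"}}
scores = eigenvector_centrality(adj)
# Node c, whose neighbors are themselves well connected, gets the top score.
```

Note that on a bipartite graph the plain power iteration can oscillate (the spectrum is symmetric), which is one reason practical implementations add damping, as PageRank does later in the course.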
A different type of centrality, with a very different intuition, capturing a different aspect of the node's position in the network, is betweenness centrality. It says a node is important if it lies on many shortest paths between other pairs of nodes: if a node is an important connector, an important bridge, an important kind of transit hub, then it has high importance. We compute the betweenness centrality of a given node v as a summation over pairs of nodes s and t: we count how many shortest paths between s and t pass through node v, and we normalize by the total number of shortest paths between s and t. Essentially, the more shortest paths a given node lies on, the more important it is; the measure captures how good a connector or transit point a given node is.

In the example here, the betweenness centrality of the nodes on the edge of the network, a, b, and e, is zero, but the betweenness centrality of node c equals three: the shortest path from a to b passes through c, the shortest path from a to d passes through c, and the shortest path from a to e again passes through c. These are the three shortest paths passing through node c, so its betweenness centrality equals three. By a similar argument, the betweenness centrality of node d is also three; the corresponding shortest paths between pairs of nodes that pass through d are shown.

Now that we have seen how betweenness centrality captures how important a transit hub a given node is, the third type of centrality, which again captures a different aspect of a node's position, is closeness centrality. This notion says that a node is important if it has small shortest-path lengths to all other nodes in the network.
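The definition above can be sketched directly: for each pair (s, t), take the fraction of s-t shortest paths that pass through v, using BFS to get distances and shortest-path counts. The edge list is my reconstruction of the slide's five-node example (an assumption), chosen to be consistent with the counts quoted in the lecture:

```python
from collections import deque

def bfs_counts(adj, s):
    """BFS from s: shortest-path distances and number of shortest paths."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    """Sum over pairs s < t (both != v) of the fraction of
    s-t shortest paths that pass through v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    others = sorted(n for n in adj if n != v)
    dist_v, sigma_v = info[v]
    total = 0.0
    for i, s in enumerate(others):
        dist_s, sigma_s = info[s]
        for t in others[i + 1:]:
            # v lies on an s-t shortest path iff the distances add up.
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

# Assumed edges for the lecture's five-node example (reconstruction).
adj = {"a": {"c"}, "b": {"c", "d"}, "c": {"a", "b", "d"},
       "d": {"b", "c", "e"}, "e": {"d"}}
```

With this graph, betweenness(adj, "c") and betweenness(adj, "d") both come out to 3, and the peripheral nodes a, b, and e get 0, matching the example above.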
Essentially, the more central you are, the shorter your paths to everyone else, and the more important you are. We operationalize this by defining the closeness centrality of a given node v as one over the sum of the shortest-path lengths between v and every other node u in the network. The idea is that the more central you are, the smaller this summation will be, and one over a small number is a big number. If a node sits far out on the edge of the network and needs long paths to reach other nodes, the sum of shortest-path lengths will be high and its closeness centrality will be low. In this example, the closeness centrality of node a equals 1/8, because a has a shortest path of length 2 to node b, length 1 to node c, length 2 to d, and length 3 to e, for a total of 8. Node d, which sits a bit more in the center of the network, has a length-2 shortest path to node a and length-1 shortest paths to all other nodes, for a total of 5, so its closeness centrality, 1/5, is higher.

Now I am going to shift gears a bit. So far I have talked about centralities and the position of a node in the network; now we go back to thinking about the node degree and the local structure around the node. By local structure I mean that, for a given node, we only look at its immediate vicinity and characterize the properties of the network around it. A classical measure of this is the clustering coefficient, which measures how connected one's neighbors are, that is, how connected the friends of node v are.
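A short sketch of the same computation: BFS distances from v, then one over their sum. The edge list is again my assumed reconstruction of the slide's five-node example; with it, node a gets 1/8 and node d gets 1/5, matching the numbers quoted above:

```python
from collections import deque

def closeness(adj, v):
    """1 / (sum of shortest-path lengths from v to all other nodes)."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return 1.0 / sum(dist.values())

# Assumed edges for the lecture's five-node example (reconstruction).
adj = {"a": {"c"}, "b": {"c", "d"}, "c": {"a", "b", "d"},
       "d": {"b", "c", "e"}, "e": {"d"}}
```

Note this form assumes the graph is connected; a disconnected graph would need a convention for unreachable nodes.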
We define the clustering coefficient of node v as the number of edges among the neighboring nodes of v, divided by (k_v choose 2), where k_v is the degree of v. Since (k choose 2) counts how many pairs you can select out of k objects, the denominator is the number of node pairs, and hence potential edges, in v's neighborhood, while the numerator is the number of edges that actually occur. This metric is therefore naturally between zero and one, where zero means that none of your connections know each other, and one means that all your friends are also friends with each other.

Here is an example: imagine a simple graph on five nodes, with our red node v as the node of interest. In the first case, node v has clustering coefficient 1, because all of its four friends are also connected with each other. In the second case, the clustering coefficient is 0.5, because of the six possible connections between the four neighboring nodes of v, only three are present. In the last example, the clustering coefficient is zero, because none of the four neighbors of v are connected with each other.

The interesting observation, which leads us to generalize the clustering coefficient to the notion of graphlets, is that the clustering coefficient counts the number of triangles in the ego network of the node. Let me explain what I mean by that. The ego network of a given node is simply the network induced by the node itself and its neighbors; it is the degree-one neighborhood network around the given node. By counting triangles I mean that, given this ego network, I can count how many triples of nodes are fully connected.
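The three cases above can be checked with a direct implementation of the definition. The helper that builds the five-node examples is hypothetical but mirrors the slide's setup (node v with four neighbors, varying how many neighbor pairs are connected):

```python
from itertools import combinations

def clustering(adj, v):
    """(# edges among v's neighbors) / (deg(v) choose 2)."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return links / (k * (k - 1) / 2)

def star_plus(neighbor_edges):
    """Node v with neighbors a, b, c, d, plus the given neighbor-neighbor edges."""
    adj = {"v": {"a", "b", "c", "d"},
           "a": {"v"}, "b": {"v"}, "c": {"v"}, "d": {"v"}}
    for u, w in neighbor_edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj

all_pairs = [("a", "b"), ("a", "c"), ("a", "d"),
             ("b", "c"), ("b", "d"), ("c", "d")]
# Clustering is 1.0 when all six neighbor pairs are connected,
# 0.5 when three of the six are, and 0.0 when none are.
```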
In the earlier case where the clustering coefficient of the node was 0.5, this means I find three triangles in the network neighborhood, the ego network, of my node of interest. So the clustering coefficient is really counting these triangles, which are very important in social networks, because in social networks a friend of my friend is often also a friend of mine. Social networks naturally evolve by triangle closing: the intuition is that if somebody has two friends in common, then sooner or later these two friends will be introduced by this node v and a link will form between them. Social networks therefore tend to contain many triangles, and the clustering coefficient is a very important metric.

The question now is whether we can generalize this notion of triangle counting to more interesting structures, and count the number of pre-specified subgraphs in the neighborhood of a given node. This is exactly what the concept of a graphlet captures. So the last way to characterize the structure of the network around a given node is through graphlets, which, rather than counting only triangles, also count other types of structures around the node.

Let me define this. A graphlet is a rooted, connected, non-isomorphic subgraph (and graphlets are taken as induced subgraphs). For example, here are all possible graphlets on different numbers of nodes: we start with the graphlet on two nodes, which is simply two nodes connected by an edge; there are three possible graphlets on three nodes; and then there are the graphlets on four nodes and the graphlets on five nodes. To explain what we are looking at: the graphlets on three nodes are either a chain of three nodes or a triangle in which all three nodes are connected; these are all possible connected graphs on three nodes. Now, why do I say there are three graphlets rather than two? The reason is that the position of the node of interest also matters.
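The ego-network view described above can be sketched as follows: extract the induced subgraph on v and its neighbors, then count triangles through v, which is exactly the number of connected neighbor pairs (the clustering-coefficient numerator). The example graph is assumed; in it, v participates in three triangles, as in the 0.5 case discussed above:

```python
from itertools import combinations

def ego_network(adj, v):
    """Induced subgraph on v together with its neighbors."""
    keep = {v} | set(adj[v])
    return {u: adj[u] & keep for u in keep}

def triangles_through(adj, v):
    """Triangles containing v = neighbor pairs of v that are connected."""
    return sum(1 for u, w in combinations(sorted(adj[v]), 2) if w in adj[u])

# Assumed example: v with neighbors a, b, c, d and neighbor edges a-b, b-c, c-d.
adj = {"v": {"a", "b", "c", "d"},
       "a": {"v", "b"}, "b": {"v", "a", "c"},
       "c": {"v", "b", "d"}, "d": {"v", "c"}}
ego = ego_network(adj, "v")  # here the whole graph is already v's ego network
```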
In the chain, for example, you can be at the end position and ask in how many such subgraphs you participate, or you can be at the other position, the middle, which is essentially asking how many pairs of friends you have. In the triangle, all positions are isomorphic, they are equivalent, so there is only one position you can occupy. Similarly, if you look at the four-node graphlets, there are many more of them. You again have a chain on four nodes, with two positions: either at the end, or one step away from the end (going from the other end is just symmetric). In the star graph, you can either be a satellite on the edge of the star or the center of the star. In a square, all positions are isomorphic, so you can only be part of the square. Another interesting example has three different positions: you can be here, here, or at the third position, which is isomorphic to its counterpart on the other side. In the square with a diagonal, you again have two different positions, and in the fully connected graph on four nodes, all nodes are equivalent, so there is only one position. What this shows is that if you ask how many graphlet positions there are on up to five nodes, there are 73 of them, labeled from 1 all the way to 73, because we count different graphs as well as the different positions within those graphs.

Now that we know what graphlets are, we can define the graphlet degree vector (GDV), a graphlet-based feature for nodes. The graphlet degree counts the number of times a given graphlet appears rooted at a given node. The way to think of this: degree counts the number of edges that a node touches, the clustering coefficient counts the number of triangles that a node touches or participates in, and the graphlet degree vector counts the number of graphlets that a node participates in.
To give an example: the graphlet degree vector is simply a count vector of graphlets rooted at the given node. Consider this particular graph and the node of interest v, and take as our universe of graphlets only the graphlets on two and three nodes, nothing bigger. What, then, are the graphlet instances? For the graphlet of type a, node v participates in two instances, one here and one there. For the graphlet of type b, node v participates in one instance. For type c, node v does not participate in any: the candidate is here, but these two nodes are also connected, and because graphlets are induced subgraphs, that edge must appear as well, so for c we get zero. For type d, there are two instances, one here and one there, as illustrated. So the graphlet degree vector for node v is [2, 1, 0, 2], and this characterizes the local neighborhood structure around the node of interest based on the frequencies of the graphlets that v participates in.

If we consider graphlets on two to five nodes, we can describe every node in the network with a vector of 73 coordinates. This is essentially a signature of the node that describes the topology of its neighborhood, and it captures its interconnectivities all the way up to a distance of four hops.
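The counting logic for the two- and three-node graphlet orbits can be sketched directly from the induced-subgraph rule above. The orbit order and the small example graph here are my own assumptions (the slide's exact graph and labeling are not reproduced, so the resulting vector differs from the [2, 1, 0, 2] above); what matters is the induced check, where a connected neighbor pair forms a triangle rather than a path:

```python
from itertools import combinations

def gdv_small(adj, v):
    """Graphlet degree vector over the 2- and 3-node orbits, in the order
    [edge, end-of-path, middle-of-path, triangle] (an assumed labeling).
    Graphlets are induced subgraphs: if the third edge of a triple is
    present, the triple is a triangle, not a path."""
    edge = len(adj[v])
    end = mid = tri = 0
    for a, b in combinations(sorted(n for n in adj if n != v), 2):
        e_va, e_vb, e_ab = a in adj[v], b in adj[v], b in adj[a]
        present = e_va + e_vb + e_ab
        if present == 3:
            tri += 1
        elif present == 2 and e_va and e_vb:
            mid += 1   # v is the middle of an induced 3-path
        elif present == 2 and (e_va or e_vb):
            end += 1   # v is an endpoint of an induced 3-path
    return [edge, end, mid, tri]

# Assumed example graph: triangle v-x-y with pendants z (on x) and w (on y).
adj = {"v": {"x", "y"}, "x": {"v", "y", "z"},
       "y": {"v", "x", "w"}, "z": {"x"}, "w": {"y"}}
```

Here v gets [2, 2, 0, 1]: two edges, two induced paths ending at v (through z and w), no induced path with v in the middle (its neighbors are connected), and one triangle.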
The reason is that a chain of four edges has five nodes, so from the end of such a chain you are counting how many paths of length four lead out of that node; the graphlet degree vector therefore characterizes the network up to a distance of four. It provides a measure of a node's local network topology, and comparing the vectors of two nodes provides a more detailed measure of local topological similarity than, for example, node degree or clustering coefficient alone. This gives us a very fine-grained way to compare the neighborhood structures of two different nodes, perhaps even in two different networks.

To conclude: so far we have introduced different ways to obtain node features, and they can be categorized into importance-based features, like node degree and the different centrality measures, and structure-based features, where node degree is again the simplest one (it counts edges), clustering counts triangles, and the graphlet degree vector is a generalization that counts other types of structures a given node of interest participates in. To summarize: importance-based features capture the importance of a node in the graph. We talked about node degree and about different notions of centrality: closeness, betweenness, and eigenvector centrality. These features are useful for predicting, for example, how important or influential nodes in a graph are; identifying celebrity users in a social network would be one such example. The other type of node-level features we talked about were structure-based features, which capture the topological properties of the local neighborhood around a node: node degree, the clustering coefficient, and the graphlet degree vector. These features are very useful for predicting a particular node's role in the network. For example, if you think about predicting protein function, graphlet features are very useful because they characterize the structure of the network around a given node.
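As a small illustration of such comparisons, two nodes' count vectors (graphlet degree vectors, or any of the features above stacked together) can be compared with a simple distance; the vectors here are hypothetical:

```python
def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    assert len(u) == len(v)
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Hypothetical GDV-style count vectors for three nodes.
gdv_u = [2, 1, 0, 2]
gdv_v = [2, 1, 0, 0]   # similar neighborhood, fewer type-d graphlets
gdv_w = [9, 0, 7, 0]   # structurally very different neighborhood
# u is closer to v than to w, so u's and v's local topologies are more alike.
```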
Info
Channel: stanfordonline
Views: 31,354
Keywords: Jure Leskovec, Machine Learning, Machine Learning with Graphs, Graphs, CS224W, Stanford, Traditional Feature-based Methods, Node-level features, Node level prediction, link level prediction
Id: 3IS7UhNMQ3U
Length: 27min 30sec (1650 seconds)
Published: Thu Apr 15 2021