CODE: GRAPH Link Prediction w/ DGL on PyTorch and PyG Code Example | GraphML | GNN

Captions
Hello community. Today we're going to talk about link prediction for graph machine learning, and I would like to show you the idea of link prediction implemented in PyTorch Geometric and the Deep Graph Library. So here we go: link prediction using graph neural networks.

In many industrial applications, such as social recommendation, item recommenders, or knowledge graph completion, the task can be formulated as a link prediction: we predict whether an edge exists between two different nodes in your graph. So let's have a look at this. There are two publications you have to know. The first is "Link Prediction Based on Graph Neural Networks" from 2018, and the other one is "Modeling Relational Data with Graph Convolutional Networks" from October 2017. What they tell you is that there is a scheme you will sometimes find called SEAL. SEAL stands for learning from Subgraphs, Embeddings and Attributes for Link prediction; now you know what SEAL means.

The idea is rather simple. Suppose we are given a network: a graph with a set of vertices and a set of edges, plus a set of sampled positive training links E_p and a set of sampled negative training links E_n (p for positive, n for negative). The trick is simply to temporarily add some negative links, some negative edges, into the set of all the other edges. So we enlarge the set of edges in our graph by including some negative training links that do not actually exist. Remember when we did word2vec, we had negative sampling; more or less, what we did there in natural language processing we now do for graph neural networks. We then generate new node embeddings for the graph G' = (V, E ∪ E_n): we keep the set of vertices, but the set of edges is combined with the set of negative training links. This is all there is to know about SEAL (a code sketch of this sampling step follows below).

Now the implementation is important, because, you are not going to believe it, we are talking about a graph autoencoder. In the publications I just showed you there is an encoder architecture and a decoder architecture for the particular task of link prediction. We know this; we have seen variational autoencoders all over the place, and I have a lot of videos about them. But what are they doing here, particularly in graph machine learning? Easy: the encoder maps each of our vertices to a real-valued vector. You notice this is our node embedding, our feature tensor creation; this is what we do when we learn a GNN. The decoder, on the other side, reconstructs the edges of the graph relying only on the vertex representation, our node embedding representation. Remember, graph ML is representation learning: we have a graph structure, and we have to learn a new representation of the nodes and of the edges so we can apply our graph machine learning algorithms. So the decoder takes a triple of subject, relation and object and applies a scoring function, and we end up with a single value for each edge: is there a link or is there not a link? This can be something like an activation function applied to a score.

But let's have a look at the theory of what is going on here. On the left side we have our encoder and on the right side we have our decoder. It is not so complicated. Remember, I told you in my last video, when we were talking about GraphSAGE, the theory behind GraphSAGE and how we construct graph neural network layers, that we had the process of message passing.
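Picking up the negative-link idea from above, here is a minimal sketch of how such non-existent edges can be sampled, in the style of the DGL tutorial used later in this video. The function name `sample_negative_edges` and its use of SciPy sparse matrices are illustrative assumptions, not code from the video:

```python
import numpy as np
import scipy.sparse as sp

def sample_negative_edges(u, v, num_nodes, num_samples):
    """Sample node pairs with NO edge between them (hypothetical helper).

    u, v: arrays with the source/destination node ids of the existing
    (positive) edges. Returns num_samples negative node pairs.
    """
    # Adjacency matrix built from the positive edges.
    adj = sp.coo_matrix((np.ones(len(u)), (u, v)), shape=(num_nodes, num_nodes))
    # Entries are 1 wherever no edge exists (self-loops excluded).
    adj_neg = 1 - adj.todense() - np.eye(num_nodes)
    neg_u, neg_v = np.where(adj_neg != 0)
    # Draw the requested number of negative pairs uniformly at random.
    ids = np.random.choice(len(neg_u), num_samples)
    return neg_u[ids], neg_v[ids]
```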
Now, of course, in your encoder you stack GNN layers, and each specific GNN layer can be either a graph convolutional layer, a GraphSAGE layer, or a GAT layer, whatever you prefer. Here I would like to show you two versions: a GCN in the PyTorch Geometric version, and GraphSAGE, for which we will do the code example in the Deep Graph Library implementation. So we have one layer, we have two layers; you remember, we are then two hops away in our neighborhood. And what we end up with, as I showed you in my last video, is an improved node representation that takes into account the one-hop neighborhood, the representation of the node's environment, the information stored in the nodes, and the message passing going on from one node to the other nodes: the message generation and the message aggregation. I showed you these steps in my last video. What is important here is what we get out of the encoder: an improved node representation that takes into consideration the structural information of the graph and, of course, the information stored in the feature tensors of each node, or, if you have edge feature tensors, the information in those as well. Beautiful. So for each node we have an improved node representation produced by the GNN layers we applied. This is our encoder side.

Now for the decoder. Remember, our input is of course the improved node representation of all nodes of the graph, and what we have to do now is find out whether there are edges between the nodes or not. So how do we do this? You know this is a matrix, this is a tensor, this is a vector, and what we always do in graph machine learning is a tensor dot product or a matrix multiplication. You will not believe it: we take the dot product of the node representations for each edge. We have a start node and an end node, and those two nodes are connected by one edge. So our final classifier applies the dot product between the source node embedding and the destination node embedding to derive an edge-level prediction, and we do this for one edge and then for all the different edges. This is the task of our decoder. And then we say: beautiful, this is an encoder-decoder structure, and now we have to train it.

So let's have a look at the coding. I showed you here the example in PyTorch Geometric. If we do it with a graph convolutional network, it is easy; this is the classical form of how we construct the GNN. We have convolutional layer one, as you see here in the declaration; this is the graph convolutional layer already provided by PyG for us. Then we have a ReLU activation function, and then we have the second graph convolutional layer, conv2, and this more or less constitutes our GNN model. Beautiful; this was the encoder. Now for the decoder, I told you, we have a matrix multiplication, a vector multiplication, a tensor multiplication, and this is exactly what is going on here; you do it for all the different edges. You have defined here a class called Net, or Network, whatever you like. And what do you do? Well, you simply have a model that you have to train. So we say our model is our Net with a specific dataset and specific dimensions; we can run this of course on a GPU or maybe even on a TPU. We have our classical Adam optimizer, and we go with a specific loss function, and our loss function here is BCEWithLogitsLoss.
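A minimal sketch of this PyG encoder/decoder, following the pattern just described; the hidden dimensions, the variable name `dataset`, and the learning rate are assumptions for illustration, not the exact code shown on screen:

```python
import torch
from torch_geometric.nn import GCNConv

class Net(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        # Two GCN layers provided by PyG: the encoder.
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def encode(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()  # ReLU between the two layers
        return self.conv2(x, edge_index)

    def decode(self, z, edge_label_index):
        # Decoder: dot product of source and destination embeddings,
        # producing one scalar score (logit) per candidate edge.
        return (z[edge_label_index[0]] * z[edge_label_index[1]]).sum(dim=-1)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net(dataset.num_features, 128, 64).to(device)  # `dataset` assumed loaded
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.BCEWithLogitsLoss()
```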
Beautiful. So you know, this is the classical way to implement it, and you say: hey, great, now I have it in PyTorch Geometric and also in DGL. Yes, the complete code for link prediction: on the left side you have the PyG implementation and on the right side you have the DGL implementation. The difference is that on the left side we use the graph convolutional network for our GNN layers, and on the right side we will use GraphSAGE, the improved message-passing mechanism over GCN. I will show you this now in detail. I have a specific video explaining GraphSAGE in detail, with the mathematical content, and building up on that last video we will now code, but in DGL. If you read the code on both sides, you will find you can do it in either library, but we will do it in DGL, just to have a little bit of variation, because all my last four or five videos were on PyG.

So now comes the code. Hello community, so today we're going to code link prediction with the Deep Graph Library, and our base language is of course PyTorch. I hope I have a GPU up and running, beautiful, and then we just pip install the Deep Graph Library. So, we have done it, beautiful, and our topic, as I just showed you, is of course link prediction using graph neural networks. You know the theory behind it now, so it is easy: we just import NumPy and SciPy, we import PyTorch, and we import our Deep Graph Library.

This is an official notebook from the Deep Graph Library, so here you have the theory again. In this tutorial, you treat the edges in the graph as the positive examples; then we sample a number of non-existing edges, and we call them negative examples, negative edges, or the negative graph. A negative, non-existing edge is of course a node pair, a start node and an end node, with no edge between them; well, what a surprise. Then we divide the positive examples and the negative examples, we create a training set and a test set, and we evaluate the model with any binary classification metric; our metric is simply the area under the curve (AUC). Now, you know about the SEAL methodology; we do not apply the original SEAL methodology with its node labeling, where specific values are assigned to the nodes; we do a modified version.

So first, of course, we have to load our dataset, our graph, and with DGL it is so easy: we just have dgl.data.CoraGraphDataset. The g stands for our graph, and we have the features that you are familiar with: about 2,700 nodes and about 10,500 edges, like in the last video, and the dimensionality of our node feature tensor is 1433. The task last time was the classification of the node embeddings, assigning them to seven different classes. We will not use those classes now, because we are focusing on link prediction, so we will focus on the edges.

And here we start; this is the very first topic: we have to prepare our datasets. Given that we have a huge dataset, we now have to split it into a training dataset and a test dataset, and sometimes you also want to do some evaluation. So here we go: in this tutorial, they pick 10% of the edges from the positive examples, the edges that really exist, for the test dataset, and leave the rest for the training dataset. Beautiful: so we have 10% in our specific test dataset and 90% goes to the training dataset.
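A sketch of this loading and splitting step, close to the official DGL tutorial; the 10% ratio is from the video, and `sample_negative_edges` refers back to the hypothetical helper sketched earlier:

```python
import dgl
import numpy as np

# Load the Cora citation graph (~2,700 nodes, ~10,500 edges, 1433-dim features).
dataset = dgl.data.CoraGraphDataset()
g = dataset[0]

# Shuffle the positive (existing) edge ids and hold out 10% for testing.
u, v = g.edges()
eids = np.random.permutation(np.arange(g.num_edges()))
test_size = int(len(eids) * 0.1)
test_pos_u, test_pos_v = u[eids[:test_size]], v[eids[:test_size]]
train_pos_u, train_pos_v = u[eids[test_size:]], v[eids[test_size:]]

# Sample as many negative (non-existent) edges and split them the same way,
# using the hypothetical helper sketched earlier.
neg_u, neg_v = sample_negative_edges(u.numpy(), v.numpy(),
                                     g.num_nodes(), g.num_edges())
test_neg_u, test_neg_v = neg_u[:test_size], neg_v[:test_size]
train_neg_u, train_neg_v = neg_u[test_size:], neg_v[test_size:]
```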
Now, you know how to do this split; there are a lot of videos already on this channel about DGL, so let's just do it. Note that dgl.remove_edges works by creating a subgraph, resulting in a copy, which could be slow for large graphs; you could improve this with pre-processing, but we just use the classical command.

And here we are at our GNN model. As I told you, with DGL we want to apply the GraphSAGE model; in one of my last videos I gave a detailed description of the GraphSAGE idea, the mathematical formulation, and the coding, and I'll leave you the link in the description. So from DGL we import the SAGE layer, beautiful, and we create our model with two GNN layers, and those layers are specific GraphSAGE layers. Beautiful: we say conv1 is a SAGEConv layer and conv2 is a SAGEConv layer; they have their own beautiful message-passing algorithm inside that goes beyond the typical graph convolutional layer. And we define, very simply, the forward pass: we have layer one, then the non-linear activation function where the learning happens, a simple ReLU, and then we have our stacked second SAGE layer. Beautiful; you know the mathematical formula.

So now to the notation. As I told you, they call the graph with all the edges that really exist the positive graph, and the graph where we add some non-existent edges the negative graph. Remember word2vec: there we also had this negative sampling. Now we construct some negative edges in our graph and we train our model on them; we know this is the representational autoencoder that we're going to build. So let's do this. DGL recommends that you treat the pairs of nodes as another graph, since you can describe a pair of nodes with an edge. Yes, you can describe a pair of nodes with an edge, okay. You have the positive graph, consisting of all the positive examples as edges (these are the real-world edges), and the negative graph, consisting of all the negative examples that do not exist in reality but that we use here only for computational purposes. You know how to do this; let's just execute the code.

Then, as I showed you, in the decoder we have to take the dot product, and the dot product is simply a matrix multiplication. In DGL we have a beautiful command for this, which I showed you already in one of my last videos on DGL coding: you simply take the dot product of a source node feature and a destination node feature, and this is done with one beautiful, short command. You then get a column vector, so you need to squeeze it a little bit. Just execute it, don't forget to execute it. And if you want another decoder structure, you don't have to go with a simple dot product; you can build something yourself, whatever you like; there is an example they give you where you build another decoder, for instance an MLP. But we stay with the classical way, the dot product; you are familiar with dot products, so let's not make it more complicated than it is.

So, the training loop. Yes, beautiful, we are in graph machine learning and now we come to the ML part. Our model is, as we defined it, our GraphSAGE model with two GNN layers, and then we go with the decoder, our dot-product predictor, beautiful. And then what we have to do is compute, of course, the loss function, and you know that we go here with a binary cross-entropy loss function with logits; this is provided by PyTorch, so we just use it.
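A sketch of the pieces just described, following the official DGL tutorial: the training graph via dgl.remove_edges, the positive/negative edge graphs, the two-layer GraphSAGE encoder, and the dot-product decoder built on dgl.function.u_dot_v. Variable names carry over from the splitting sketch above:

```python
import dgl
import dgl.function as fn
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import SAGEConv

# Training graph: the full graph with the held-out test edges removed.
# (remove_edges creates a copied subgraph, which can be slow for large graphs.)
train_g = dgl.remove_edges(g, eids[:test_size])

# Positive graph (real edges) and negative graph (sampled non-edges),
# both sharing the node set of the original graph.
train_pos_g = dgl.graph((train_pos_u, train_pos_v), num_nodes=g.num_nodes())
train_neg_g = dgl.graph((train_neg_u, train_neg_v), num_nodes=g.num_nodes())

class GraphSAGE(nn.Module):
    """Encoder: two stacked GraphSAGE layers with a ReLU in between."""
    def __init__(self, in_feats, h_feats):
        super().__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')
        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')

    def forward(self, g, in_feat):
        h = F.relu(self.conv1(g, in_feat))  # non-linear activation between layers
        return self.conv2(g, h)

class DotPredictor(nn.Module):
    """Decoder: dot product of the two endpoint embeddings, one score per edge."""
    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            # u_dot_v computes <h_src, h_dst> for every edge in g.
            g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            # 'score' has shape (num_edges, 1); squeeze to a flat vector.
            return g.edata['score'][:, 0]
```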
If you want to do the validation, as I showed you, we compute the area under the curve (AUC), and this is done in the way you are familiar with. Beautiful. And now we set up the loss and the optimizer, and since we're doing machine learning on graphs, you are not going to believe it, our optimizer here is also an Adam optimizer; please fill in your learning rate or whatever other hyperparameters you like, the warm-up period or whatever; this is your classical Adam optimizer.

And here we go with the training. What do we have? We have our model, beautiful; we have the positive score and the negative score for the more or less non-existing edges; and we have our loss function, beautiful. Then we step our optimizer against our loss, and we print out the loss for each epoch, beautiful. Then we can check the results: we do not do the classical evaluation here but, as I showed you, we use the area under the curve as our indication of evaluation accuracy. Let's just run it; how many epochs do we have? Let's say 200; 200 epochs should be fine, we have a GPU. Oh yeah, that's fast, that's great. So you see, our loss goes down, down to about 0.00, beautiful, and our AUC is at about 0.84. For such a simple model you can of course improve it, and you can add a lot of tuning to these functionalities, but 0.84 with this very simple model is really okay. A sketch of the full training loop follows below.

So this is it: I showed you how to do link prediction. You have to be careful when you do the splitting into the training dataset, the test dataset, and the evaluation dataset, but this is a topic for my next video, as already indicated. In that video I will tell you something critical, and this critical thing is about splitting your graph data for your training. Just a hint: there is a problem if you cut graphs in a non-optimal way, and it does not depend on whether you go with DGL or with PyG. With PyG you of course have GraphGym, which is something very special developed at Stanford University, but this is just the outlook for the next video: there is a very specific way to split your graph dataset into a training dataset, an evaluation dataset, and a test dataset. That is just the teaser for next time. And remember the two publications that I showed you: have a look at the original publications, they are always the best source of information.
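Putting it together, here is a sketch of the training loop and AUC evaluation in the style of the DGL tutorial; the learning rate, hidden size, and print interval are illustrative assumptions, and the graphs and classes come from the sketches above:

```python
import itertools
import dgl
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

def compute_loss(pos_score, neg_score):
    # Binary cross-entropy with logits: real edges -> 1, sampled non-edges -> 0.
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones(pos_score.shape[0]),
                        torch.zeros(neg_score.shape[0])])
    return F.binary_cross_entropy_with_logits(scores, labels)

def compute_auc(pos_score, neg_score):
    scores = torch.cat([pos_score, neg_score]).numpy()
    labels = torch.cat([torch.ones(pos_score.shape[0]),
                        torch.zeros(neg_score.shape[0])]).numpy()
    return roc_auc_score(labels, scores)

# Test-time positive/negative edge graphs, built like the training ones.
test_pos_g = dgl.graph((test_pos_u, test_pos_v), num_nodes=g.num_nodes())
test_neg_g = dgl.graph((test_neg_u, test_neg_v), num_nodes=g.num_nodes())

model = GraphSAGE(train_g.ndata['feat'].shape[1], 16)
pred = DotPredictor()
optimizer = torch.optim.Adam(
    itertools.chain(model.parameters(), pred.parameters()), lr=0.01)

for epoch in range(200):
    h = model(train_g, train_g.ndata['feat'])  # encoder: improved node embeddings
    pos_score = pred(train_pos_g, h)           # decoder on real edges
    neg_score = pred(train_neg_g, h)           # decoder on sampled non-edges
    loss = compute_loss(pos_score, neg_score)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f'epoch {epoch}, loss {loss.item():.4f}')

# Evaluate on the held-out test edges with the AUC metric.
with torch.no_grad():
    h = model(train_g, train_g.ndata['feat'])
    print('AUC:', compute_auc(pred(test_pos_g, h), pred(test_neg_g, h)))
```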
Info
Channel: code_your_own_AI
Views: 4,090
Keywords: Graph ML, Graph Machine learning, Link prediction, recommender systems, PyG, DGL, Deep Graph Library
Id: wxJ84sMJfUA
Length: 21min 16sec (1276 seconds)
Published: Sat Dec 17 2022