Hello, welcome to the AI Coffee Break!
Today, Ms. Coffee Bean has prepared the definitive introduction to Graph
Neural Networks, or GNNs for short! But Ms. Coffee Bean, aren't you a little
overconfident calling it "definitive"? Phsssht! I just wonder if… psshht! Ok, ehm, here we go:
I always struggled with Graph Neural Networks in machine learning classes. But one day, I could
not get around them anymore and had to learn them! They are such a huge topic in machine learning for
a very obvious reason: graphs are everywhere! And not just artistically represented here in this
stock image. Let's take air travel, for example. Flights from origin to destination,
perhaps with a stop in the middle, can be represented as a graph, on which airlines try
to maximize the number of passengers transported while minimizing fuel, time, and other costs.
On a smaller scale, navigation systems also do graph optimization on street maps
to get you to your destination. Other examples of graphs are personal connections,
or the social graphs that social media apps live off. Even text is a graph if we think of it not
as a sequence but analyze it linguistically, for example in terms of its dependency parse.
Images are graphs too, in multiple senses: one kind of graph interpretation of an image
represents each pixel as a node, with an edge connecting each pixel to its neighbors.
This is a very regular kind of graph, a grid. Another, more semantic graph interpretation of
an image is the scene graph, where objects and entities are nodes,
connected by their relations in the image. One more example and we finally get to Graph
Neural Networks: in NLP, people are interested in knowledge graphs comprising facts
about the world, like these true facts here. Well, I hope you are now convinced
of how ubiquitous graphs are, so of course we need a deep neural model to
handle these structures, right? Well, graph neural networks are the deep and neural solution
to this. They come in many shapes and flavors, but here we want to keep it simple and explain
only the Graph Convolutional Neural Network. If you understand this one, you will also
understand the basics of the other architectures, because they work based on the same principles.
And Graph Convolutional Neural Nets are not hard: you just have to apply this formula iteratively
and you are done. Wait, what? Well, if that's it, let's break it down in every detail:
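To have something concrete in front of us, here is one common way to write this kind of layer update. This is a sketch of the usual GCN-style rule; the symbols W_0, W_1, c_ij and sigma are just one common notation, and the formula on screen may group the terms slightly differently:

    h_i^{(l+1)} = \sigma\left( W_0^{(l)} h_i^{(l)} + \sum_{j \in N(i)} \frac{1}{c_{ij}} W_1^{(l)} h_j^{(l)} \right)

Here h_i^{(l)} is the vector of node i after l update steps, N(i) are its neighbors, W_0 and W_1 are learned weight matrices, c_ij is a normalization constant, and sigma is a nonlinearity.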
First, let's talk about the nodes. They are denoted here with h, where h is the vector
representation of a node. These vectors are chosen depending on the application: they can be
word vectors when working with knowledge bases, or pixel values when working with images.
They can be a combination of image features and word vectors if working with scene graphs!
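Just to make this concrete, here is a tiny sketch in code of what such initial vectors could look like. The numbers are made-up placeholders standing in for real word vectors, and the three-node "knowledge graph" is purely illustrative:

    import numpy as np

    # Made-up 4-dimensional "word vectors" for three knowledge-base entities.
    # In practice these would come from a real embedding model.
    h = {
        "Ms. Coffee Bean": np.array([0.2, -0.1, 0.7, 0.0]),
        "star":            np.array([0.9,  0.3, 0.1, 0.4]),
        "coffee":          np.array([0.5,  0.5, -0.2, 0.1]),
    }

    # Which node is connected to which (undirected here, for simplicity).
    neighbors = {
        "Ms. Coffee Bean": ["star", "coffee"],
        "star":            ["Ms. Coffee Bean"],
        "coffee":          ["Ms. Coffee Bean"],
    }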
In Graph Neural Networks, the idea is that we start from these initial vector representations for our
problem. But we can do even better if we take the neighbors and relations into account. So
in principle, we update our initial vectors using information from the graph structure and
get vectors that represent the problem better. The key, then, is not in the initial representation
of the nodes, but in how one aggregates the representations of the neighbors in order to get
an even better representation than the original. This is why we apply this formula recursively: to
get another, better vector h at each time step. The only question now is: how exactly
do we get this better representation? First, when computing the new representation,
we want to keep something from the old one. For example, when working with a knowledge base that
truthfully says that Ms. Coffee Bean is a star: when computing a new vector for her that now
also encodes that she is a star, we still want to keep the knowledge of her identity. This is why we
take the old representation and multiply it by a weight matrix that decides how this previous
knowledge should be transformed. But how do we know this weight matrix? Well, we don't! We let
the neural network learn it! We train it on our given data of node labels, edge labels or
other prediction targets derived from the graph structure. When training with these objectives, the
network learns to make the right predictions given the new representations, so it had better
learn a weight matrix that helps produce good new representations. This is how we decide what
to keep from our original representation, and how. Now, we also want to aggregate the
information from the neighbors. So we take their representations and again let the
neural network decide how to transform them and what to keep from them in order to solve the
given task, for example node label prediction. And here is where the magic of the graph
convolutional neural network happens: in the aggregation function. The sum means that we sum over all
transformed neighbor representations. The factor c_ij means that we normalize each neighbor's
contribution differently. But that detail is not the important part; what is crucial
here is that this normalized sum is a permutation-invariant aggregation function.
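In code, the update for a single node could look roughly like this. It is a minimal NumPy sketch following the description above; the random weight matrices stand in for the learned ones, and 1/|N(i)| stands in for the c_ij normalization:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4                          # dimension of the node vectors
    W0 = rng.normal(size=(d, d))   # transforms the node's own old vector (learned in practice)
    W1 = rng.normal(size=(d, d))   # transforms each neighbor's vector (learned in practice)

    def update_node(h_i, neighbor_vecs):
        # Keep (a transformed version of) the old representation ...
        self_part = W0 @ h_i
        # ... and add a normalized sum over the transformed neighbor vectors.
        c = max(len(neighbor_vecs), 1)
        neighbor_part = sum(W1 @ h_j for h_j in neighbor_vecs) / c
        return np.maximum(0.0, self_part + neighbor_part)   # ReLU as the nonlinearity

The neighbor vectors enter only through that normalized sum, i.e. a permutation-invariant aggregation.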
A permutation-invariant whaat now? Well, this means that however we aggregate the information from the neighbors,
the way we do it has to be insensitive to order. There is no intrinsic ordering in a
graph, so how would we ever decide whether to take node 4 or node 2 first? We don't decide: we
take a permutation-invariant function, which delivers the same result given any order. Other
examples of permutation-invariant functions are the product or the maximum,
because they do not depend on the order of their arguments. And because we aggregate information from
neighboring nodes, similarly to how we do in CNNs (convolutional neural networks), this is then
called the graph convolutional neural network. Of course, the name "convolution" is misleading,
because convolution on images is not permutation-invariant (you can check it), but
it is still a valid operation if we regard images as graphs, because images are a special
type of graph where the order is given. So yes, the name convolution is misleading, but it is how
it is! And here we are: if we apply this formula iteratively, at each timestep we learn a better
graph representation, with each node's information also comprising information from its neighbors and their neighbors.
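To see the "iteratively" part in code, here is a minimal sketch that applies such a layer twice to a tiny chain graph 0 - 1 - 2, so that after two steps node 0 has also absorbed information from node 2, its neighbor's neighbor. Random weights again stand in for learned ones, and at the end we check that reordering a neighbor list changes nothing:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4
    # A tiny chain graph 0 - 1 - 2 with random initial node vectors.
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    h = {i: rng.normal(size=d) for i in neighbors}

    def gcn_layer(h, neighbors, W0, W1):
        # Apply one GCN-style update to every node (sketch).
        new_h = {}
        for i, h_i in h.items():
            agg = sum(W1 @ h[j] for j in neighbors[i]) / max(len(neighbors[i]), 1)
            new_h[i] = np.maximum(0.0, W0 @ h_i + agg)
        return new_h

    # Two iterations: after the second one, node 0 has also "heard" from node 2.
    W0a, W1a = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    W0b, W1b = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    h1 = gcn_layer(h, neighbors, W0a, W1a)
    h2 = gcn_layer(h1, neighbors, W0b, W1b)

    # Permutation invariance: listing node 1's neighbors in the other order
    # gives exactly the same result.
    flipped = {0: [1], 1: [2, 0], 2: [1]}
    assert np.allclose(gcn_layer(h, neighbors, W0a, W1a)[1],
                       gcn_layer(h, flipped, W0a, W1a)[1])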
These informed and refined representations should then be suitable for new tasks,
for example link prediction, node classification, or even entire-graph classification,
and many others. GNNs have been successfully applied in the medical domain, for example to
predict side effects due to drug interactions. Or in numerous graph optimization problems,
like the travelling salesman problem. Or for various applications in knowledge
graph prediction. They can even be used for predicting object trajectories from video. These
applications are just a few from a very long list. Where there is a graph (so, everywhere), there can
be a graph neural network solving the problem. So this was it, we hope that now you have
a better understanding of Graph Neural Networks and are curious enough to go out
and explore the topic of GNNs by yourself! Hey, do not forget to like and subscribe!
See you!