Geometric deep learning, from Euclid to drug design

Captions
Hello, my name is Michael Huth, I'm the Head of the Department of Computing at Imperial College London, and I would like to warmly welcome you to the inaugural lecture of Professor Michael Bronstein, entitled "Geometric deep learning, from Euclid to drug design". Inaugural lectures are events in which we come together as a community to celebrate the achievements of one of our academics, but also to showcase in general what our academic community is capable of producing. So today it is my real pleasure to have Michael give his inaugural lecture as Chair in Machine Learning and Pattern Recognition, a post he has held since 2018 in our department. Michael has many, many achievements that I do not want to go into in detail here; I just want to emphasize that he exemplifies what we really appreciate at Imperial: someone who does very fundamental, deep research but is also capable of transferring these things into applications to solve real problems. Next, some housekeeping: you can submit questions through the Q&A channel, and because the meeting is recorded, you may submit your question anonymously if you do not want your name to be mentioned. You can also vote for questions in the Q&A channel if you like them, and we will have a Q&A session right after the lecture itself. With that, I would just like to hand over to Michael to give his inaugural lecture. Thank you very much.

Thank you very much, I hope you can hear me well, and thank you, Michael, for the kind introduction, and everyone for joining tonight. I probably need to start by explaining the somewhat enigmatic title of my talk, "geometric deep learning". I guess you have all heard about deep learning, this amazing technology that is transforming the internet industry and possibly also all aspects of our lives, so let me decipher the word "geometric", and for this purpose take you back in history to approximately the year 300 BC. Since then, for nearly 2000 years, the word "geometry" was synonymous with Euclidean geometry, simply because no other types of geometry existed. Euclid's monopoly came to an end in the nineteenth century, when Lobachevsky, Bolyai, Gauss, Riemann and others constructed the first examples of non-Euclidean geometries, which, together with the development of what is called projective geometry, created an entire zoo of different geometries. Towards the end of that century these studies had diverged into completely disparate fields, with mathematicians debating which geometry is the true one and what actually defines a geometry. A way out of this pickle was shown by a young German mathematician, Felix Klein, appointed in 1872 as a professor in the small Bavarian University of Erlangen. Like myself tonight, he was asked to deliver an inaugural lecture, which entered the history of mathematics as the Erlangen Programme. Klein proposed approaching geometry as the study of invariants, or symmetries: the properties that remain unchanged under some class of transformations. This approach immediately created clarity by showing that different geometries could be defined by an appropriate choice of symmetry transformations, which were formalised using the language of group theory, a new mathematical discipline also born in the 19th century. The impact of the Erlangen Programme on geometry, and on mathematics broadly, was very profound. It also spilled over to other fields, especially physics, where symmetry considerations allow one to derive conservation laws from first principles.
This is an astonishing result known as Noether's theorem, and it took several decades until this fundamental principle, through the notion of what is called gauge invariance (in a generalised form developed by Yang and Mills in 1954), proved successful in unifying all the fundamental forces of nature, with the possible exception of gravity. This is what is called the Standard Model, and it describes all the physics we currently know. So I can only repeat the words of the Nobel-winning physicist Philip Anderson, that it is only slightly overstating the case to say that physics is the study of symmetry.

At this point you may wonder what all of this has to do with deep learning, and I think that the current state of affairs in the field of deep learning reminds me a lot of the situation of geometry in the 19th century. On the one hand, in the past decade deep learning has brought a true revolution in data science and made possible many tasks previously thought to be beyond reach, whether it is computer vision, speech recognition, or playing intelligent games like Go. On the other hand, we now have a zoo of different neural network architectures for different kinds of data but very few unifying principles, and as a consequence it is very difficult to understand the relations between different methods, which inevitably leads to the reinvention and rebranding of the same concepts. So we need some form of geometric unification in the spirit of the Erlangen Programme, which I call "geometric deep learning", and it serves two purposes: first, to provide a common mathematical framework to derive the most successful neural network architectures, and second, to give a constructive procedure to build future, yet-to-be-invented architectures in a principled way. The term "geometric deep learning" itself I made up for my ERC grant in 2015, and it became quite popular after a paper we wrote; it is now used almost synonymously with graph neural networks, but I hope to show you today that it is part of a much broader and more interesting picture.

If we look at machine learning, at least in a simple setting, it is essentially a function estimation problem: we are given the outputs of some unknown function on a training set, let's say labelled dog and cat images, and we try to find a function from some hypothesis class that fits the training data well and allows us to predict the outputs on previously unseen inputs. What happened over the past decade is that the availability of large, high-quality datasets such as ImageNet coincided with the availability of computational resources, the graphics hardware called GPUs, and this allowed the design of rich function classes that have the capacity, probably for the first time, to interpolate such large datasets. Neural networks appear to be a suitable choice to represent such functions, because even with the simplest choice of architecture, like the perceptron that I show here, we can produce a dense class of functions with just a two-layer network, which allows us to approximate any continuous function to any desired accuracy; we call this property universal approximation.
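To make the universal approximation statement above a little more tangible, here is a minimal numpy sketch (my own illustration, not code from the lecture) of a two-layer perceptron fitted to a simple one-dimensional function; the target function, the tanh activation, the hidden width of 200 and the shortcut of training only the output layer by least squares are all arbitrary choices made for brevity.

    import numpy as np

    rng = np.random.default_rng(0)

    # Target: a continuous 1D function we would like to approximate.
    def f(x):
        return np.sin(3 * x) + 0.5 * x

    # Two-layer perceptron: y = tanh(x W1 + b1) W2, with a tanh hidden layer.
    width = 200
    W1 = rng.normal(size=width)        # hidden weights (kept fixed here)
    b1 = rng.normal(size=width)        # hidden biases  (kept fixed here)

    x_train = np.linspace(-2.0, 2.0, 500)
    H = np.tanh(np.outer(x_train, W1) + b1)               # hidden activations
    W2, *_ = np.linalg.lstsq(H, f(x_train), rcond=None)   # fit the output layer

    x_test = np.linspace(-2.0, 2.0, 101)
    y_hat = np.tanh(np.outer(x_test, W1) + b1) @ W2
    print("max approximation error:", np.max(np.abs(y_hat - f(x_test))))

Increasing the hidden width lets the error be driven down further, which is the practical face of universal approximation; the next paragraph explains why this pleasant picture breaks down in high dimensions.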
Now, the setting of this problem in low dimensions is a classical problem in approximation theory that has been studied to death, and we have very precise mathematical control of estimation errors. The situation is entirely different in high dimensions: one can quickly see that in order to approximate even a simple class of, say, Lipschitz-continuous functions, like the example I show here, a superposition of Gaussian blobs placed in the quadrants of a unit square, the number of samples grows very fast with the dimension, in fact exponentially. We get a phenomenon that is colloquially known as the curse of dimensionality, and since modern machine learning methods need to operate with data in thousands or even millions of dimensions, the curse of dimensionality is always there around the corner, and it makes such a naive approach to learning impossible.

This is perhaps best seen in computer vision problems like this image classification example. Even tiny images tend to be very high-dimensional, but intuitively they have a lot of structure that is broken and thrown away when we parse the image into a vector to feed it into a simple perceptron neural network. If the image is now shifted by just one pixel, the vectorised input will be very different, and the neural network will need to be shown a lot of examples in order to learn that shifted inputs must be classified in the same way. The remedy for this problem in computer vision came from classical works in neuroscience by Hubel and Wiesel, the winners of the Nobel Prize in medicine for the study of the visual cortex, who showed that brain neurons are organised into what are called local receptive fields. This served as an inspiration for a new class of neural architectures with local shared weights: first the neocognitron of Fukushima, and then convolutional neural networks, the seminal work of Yann LeCun, where essentially weight sharing across the image solved the problem of the curse of dimensionality.

Let me now show another example. What you see here is a molecule of caffeine; this is my favourite molecule, I have it here in my cup. The molecule is represented as a graph: the nodes here are atoms and the edges are chemical bonds. If we were to apply a neural network to this input, for example to predict some chemical property like the binding energy to some receptor, we could again parse it into a vector, but this time you see that any arrangement of the node features would be equally legitimate, because in graphs, unlike images, we do not have a preferential way of ordering the nodes. Molecules are just one example of data with irregular, non-Euclidean structure on which we would like to apply deep learning techniques. Social networks are another prominent example: gigantic graphs with hundreds of millions of nodes. We also have interaction networks, or what are called interactomes, in the biological sciences, manifolds and meshes in computer graphics, and so on. All of these are examples of data waiting to be dealt with in a principled way.

So let's look again at this high-dimensional image classification example that at first glance seemed hopeless because of the curse of dimensionality. Fortunately, we have additional structure that comes from the geometry of the input signal; we call this structure a geometric prior, and we will see that it is a general, powerful principle that gives us optimism and hope in dimensionality-cursed problems. In our example of image classification, the input image is not just a d-dimensional vector: it is a signal defined on some domain, which in this case is a two-dimensional grid. The structure of the domain is captured by a symmetry group, the group of two-dimensional translations in our example, which acts on points of the domain. Now, in the space of signals, the group actions on the domain are manifested through what is called the group representation; in our case it is simply the shift operator, a d-by-d matrix that acts on the d-dimensional vectorised input.
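As a toy illustration of this last point (my own sketch, not from the lecture), here is the cyclic shift operator written out as a d-by-d matrix acting on a vectorised one-dimensional signal; d = 5 is an arbitrary choice.

    import numpy as np

    d = 5
    # S is a d-by-d permutation matrix that cyclically translates a signal by
    # one position: the representation of the translation group acting on the
    # space of signals.
    S = np.roll(np.eye(d), 1, axis=0)

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    print(S @ x)                                                  # [5. 1. 2. 3. 4.] -- x shifted by one
    print(np.allclose(np.linalg.matrix_power(S, d), np.eye(d)))   # True: d shifts bring us back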
Now, this geometric structure of the domain underlying the input signal imposes structure on the class of functions f that we are trying to learn. We can have functions that are unaffected by the action of the group, which we call invariant functions; a good example is the image classification problem, where no matter where the cat is located in the image, we still want to say it is a cat. This is an example of what is called shift invariance. On the other hand, we can have a case where the function has the same input and output structure; for example, in image segmentation the output is a pixel-wise label mask, so it is also an image, and we want the output in this case to be transformed in the same way as the input, what we call an equivariant function. What we see here is shift equivariance.

These two principles give us a very general blueprint of geometric deep learning that you can probably recognise in the majority of popular deep neural architectures: we first apply a sequence of equivariant layers, such as the convolutional layers in CNNs, and then an invariant global pooling layer that aggregates everything into a single output. In some cases we can also create a hierarchy of domains by some coarsening procedure that takes the form of local pooling. This is a very general design that can be applied to different types of geometric structures, such as grids, homogeneous spaces with global transformation groups, graphs, sets, and manifolds, where we have global isometry invariance and local gauge symmetries; this is what my colleagues and I call the "5G" of geometric deep learning. The implementation of these principles leads to some of the most popular architectures that exist nowadays in deep learning, whether it is convolutional networks that emerge from translational symmetry, graph neural networks and transformers that implement permutation invariance, as we will see, or intrinsic and mesh CNNs used in graphics and vision that can be derived from gauge symmetry. I hope to show you today that these methods are also very practical and allow us to address some of the biggest challenges, from understanding the biochemistry of proteins and drug discovery to detecting fake news.

So let me start with graphs. Probably each of us has a different mental picture when we hear the word "graph", but for me, maybe because of my work at Twitter, I first think of a social network that models relations and interactions between users. Mathematically, the users of a social network are modelled as nodes of the graph, and their relations are edges, pairs of nodes that can be ordered, in which case we call the graph directed, or unordered, in which case the graph is undirected. The nodes can also have features attached to them, modelled as d-dimensional vectors: say, the age, sex, or birthplace of the social network users in our example. Now, a key structural characteristic of a graph is that we do not have a canonical way to order the nodes. If we arrange the node feature vectors into a matrix, we automatically prescribe some arbitrary ordering of the nodes, and the same holds for the adjacency matrix that represents the structure of the graph. If we number the nodes differently, the rows of the feature matrix and the corresponding rows and columns of the adjacency matrix will be permuted by some permutation matrix P. This P is a representation of the permutation group, and we have n! such elements.
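The following small numpy sketch (again my own illustration, with an arbitrary 4-node random graph) shows how a reordering of the nodes acts on the feature matrix X and on the adjacency matrix A, and checks two facts used below: a sum readout is permutation-invariant, and neighbourhood aggregation A X is permutation-equivariant.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 4, 3
    X = rng.normal(size=(n, d))                      # node feature matrix
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = (A + A.T).astype(float)                      # symmetric adjacency (undirected graph)

    perm = rng.permutation(n)
    P = np.eye(n)[perm]                              # a permutation matrix

    X_p, A_p = P @ X, P @ A @ P.T                    # the same graph, nodes reordered

    # Permutation invariance of a sum readout: same output for both orderings.
    print(np.allclose(X.sum(axis=0), X_p.sum(axis=0)))   # True

    # Permutation equivariance of neighbour aggregation: outputs reorder with P.
    print(np.allclose(P @ (A @ X), A_p @ X_p))            # True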
If we want to implement a function on the graph that provides a single output for the whole graph, like in our molecular graph example where we try to predict a single value, say the binding energy, we need to make sure that its output is unaffected by the ordering of the input nodes; this is what we call a permutation-invariant function. If, on the other hand, we want to make node-wise predictions, for example to detect malicious users in a social network, we want a function that changes in the same way as the input under a reordering of the nodes, in other words a permutation-equivariant function.

A way of constructing a pretty broad class of tractable functions on graphs is to use the local neighbourhood of a node. Essentially, we look at the nodes that are connected by an edge to some node i, and if we aggregate their feature vectors together with the feature vector of node i itself, we get some local representation. Because we do not have a canonical ordering of the neighbours, this must be done in a permutation-invariant way, so this local aggregation function, which I denote by phi, must be permutation-invariant; and when we apply this phi at every node of the graph and stack the results into a feature matrix, we get a permutation-equivariant function F. It turns out that the way this local function phi is constructed is crucial, and its choice determines the expressive power of the resulting architecture: when phi is injective, it can be shown that a neural network designed in this way is equivalent to the Weisfeiler-Lehman graph isomorphism test, a classical algorithm in graph theory that tries to determine whether two graphs are isomorphic by a kind of iterative label refinement procedure.

Here is what a typical local aggregation function looks like: we have a permutation-invariant aggregation operation such as sum or maximum, a learnable function psi that transforms the neighbour features, and another function phi that updates the features of node i using the aggregated features of the neighbours. The output of the non-linear function psi, which depends on the feature vectors of both node i and node j, can be regarded as a message that is sent from node j to update the features of node i. Graph neural networks of this type are called message passing networks; in chemistry applications they were introduced by Justin Gilmer and co-authors, and in computer graphics in our paper with Yue Wang and Justin Solomon from MIT. If we look at a typical graph neural network architecture, you will immediately recognise an instance of our geometric deep learning blueprint with the permutation group as the geometric prior: we typically have a sequence of permutation-equivariant layers, sometimes referred to in the literature as propagation or diffusion layers, and an optional global pooling layer to produce a single graph-wise readout. Some architectures also include local pooling layers obtained by some form of graph coarsening, which can also be learnable.
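Here is a minimal numpy sketch of one such message passing layer, with sum aggregation and simple linear transformations followed by a ReLU; it is my own schematic illustration of the general scheme just described, not the specific architecture of any of the papers mentioned, and the toy graph, the function names and the random weights are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)

    def message_passing_layer(A, X, W_self, W_neigh):
        # Each node i receives messages x_j W_neigh from its neighbours j,
        # sums them (a permutation-invariant aggregation), and combines the
        # result with its own transformed features, followed by a ReLU.
        messages = A @ X @ W_neigh
        return np.maximum(0.0, X @ W_self + messages)

    # A toy 4-node path graph 0-1-2-3 with 3-dimensional node features.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = rng.normal(size=(4, 3))
    W_self, W_neigh = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

    H = message_passing_layer(A, X, W_self, W_neigh)   # permutation-equivariant
    readout = H.sum(axis=0)                            # permutation-invariant pooling
    print(H.shape, readout.shape)                      # (4, 3) (3,)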
Now let me say a few words about some special cases of graph neural networks that might be surprising. First, a graph with no edges is a set, and sets are also unordered. In this case the most straightforward approach is to process each element of the set entirely independently, by applying a shared function phi to the feature vectors, and this translates into a permutation-equivariant function over the set; as you can see, it is a special setting of a graph neural network. This architecture is known as Deep Sets. As another extreme example, instead of assuming that each element of the set acts on its own, we can assume that any two elements can interact, which translates into a complete graph. Here we can use an attention-based aggregation, which we can interpret as a form of learnable soft adjacency matrix, and I hope you recognise the famous transformer architecture that is now very popular in natural language processing applications: it is also a particular case of a graph neural network. I should say that transformers are commonly used to analyse sequences of text, where the order of the nodes is actually given, and this order information is typically provided in the form of what is called positional encoding, an additional feature that uniquely identifies the nodes. Similar approaches exist for general graphs, and there are several ways we can encode the node positions. We showed one such way in a recent paper with my students Giorgos Bouritsas and Fabrizio Frasca, where we counted small substructures such as triangles or cliques, providing in this way a kind of structural encoding that allows the message passing algorithm to adapt to different neighbourhoods. This architecture, which we call Graph Substructure Networks, can be made strictly more powerful than the Weisfeiler-Lehman test by an appropriate choice of substructures, and it is also a way to incorporate problem-specific inductive bias. For example, in molecular graphs cycles are prominent structures, because in organic chemistry, as you might know, we have an abundance of what are called aromatic rings; here again you can see the caffeine molecule, which has a six-cycle and a five-cycle. What we observed in experiments with this architecture is that our ability to predict chemical properties of molecules improves dramatically if we count rings of size 5 or more and pass them as structural encoding.

So you can see that even in cases when the graph is not given as input, graph neural networks still make sense, and even if the graph is given, we do not necessarily need to stick to it as a kind of sacrosanct structure in order to do the message passing. In fact, a lot of recent approaches decouple the computational graph from the input graph, either in the form of graph sampling, usually to address scalability issues, rewiring the graph, or using larger multi-hop filters where aggregation is also performed on the neighbours of the neighbours, like in the recent work that we did at Twitter, which we call SIGN, scalable inception-like GNNs. We can also learn the graph on which to run a graph neural network, optimised for the downstream task; this is a setting I call latent graph learning. We can make the construction of the graph differentiable and propagate through it, and this graph can also be updated between different layers of the neural network; this is what we call dynamic graph CNN, and it was the first architecture to implement latent graph learning, work done with collaborators from MIT. In historical perspective, latent graph learning can be related to methods called manifold learning, or non-linear dimensionality reduction, that were popular when I was a student and are still widely used today for data visualisation. The key premise of manifold learning is that even though our data lives in a very high-dimensional space, it has low intrinsic dimensionality, and the metaphor for this situation is the Swiss roll surface: we can think of our data points as if they were sampled from some manifold. The structure of this manifold can be captured by a local graph that we can then embed into a low-dimensional space, where doing machine learning, typically clustering, is more convenient.
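As a small illustration of the graph construction step just mentioned (my own sketch, not from the lecture), here is a k-nearest-neighbour graph built from a toy Swiss-roll point cloud; in the latent graph learning setting discussed above, the points would instead be learnable embeddings and the neighbour selection would be relaxed to something differentiable, which this sketch does not attempt.

    import numpy as np

    rng = np.random.default_rng(0)

    def knn_graph(X, k):
        # Adjacency matrix of the (symmetrised) k-nearest-neighbour graph of X.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)                  # no self-loops
        idx = np.argsort(D, axis=1)[:, :k]           # k closest points per node
        A = np.zeros((len(X), len(X)))
        A[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1.0
        return np.maximum(A, A.T)

    # A toy "Swiss roll": a 2D sheet rolled up in 3D.
    t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=300)
    h = rng.uniform(0.0, 10.0, size=300)
    points = np.stack([t * np.cos(t), h, t * np.sin(t)], axis=1)

    A = knn_graph(points, k=8)
    print(A.shape, int(A.sum()) // 2, "undirected edges")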
The reason why manifold learning never really worked beyond data visualisation is that all these three steps are separate, and I hope it is clear that, for example, the construction of the graph in the first step hugely affects the downstream task, so you always have to fine-tune and tweak the different stages of this algorithm by hand to make it work. With latent graph learning we can now bring new life to these algorithms, and that is why I call it, maybe a bit arrogantly, "Manifold Learning 2.0": we now have a way to build an end-to-end pipeline in which we learn both the graph and the filters operating on this graph, as a graph neural network with latent graph structure. We recently used such a latent graph learning architecture, which we call the differentiable graph module, or DGM, in a collaboration with the group of Nassir Navab, for automated diagnosis applications, and showed that we can consistently outperform GNNs with handcrafted graphs.

Let me now move to another type of geometric structure that we are probably all familiar with, and hopefully show you a different perspective. We are talking about grids, and grids are also a particular case of a graph; what is shown here is a grid with periodic boundary conditions, which is called a ring graph. Compared to general graphs, the first thing to notice on a grid is that it has a fixed local structure, and not only that, the order of the neighbours is fixed. I remind you that on general graphs we were forced to use a permutation-invariant local aggregation function phi because we did not have a canonical ordering of the neighbours; on the grid we do. For example, we can always put in sequence the green nodes first, then the red, and then the blue, and if we choose a linear function with the sum aggregation operation, we get what is called convolution. If we write it as a matrix, it has a very special structure called a circulant matrix: a circulant matrix is formed by shifted copies of a single vector of parameters, which I denote here by theta, and here you go, this is exactly the shared-weights concept of convolutional neural networks. Now, circulant matrices are very special: they commute, and in particular they commute with a special circulant matrix that cyclically shifts the elements of a vector by one position, which we call the shift operator. Circulant matrices commute with the shift, and this is just another way of saying that convolution is a shift-equivariant operation. The statement works in both directions: not only does every circulant matrix commute with the shift, but every matrix that commutes with the shift is circulant. What we get is that convolution is the only linear operation that is shift-equivariant, and I hope you can see here the power of our geometric approach: convolution automatically emerges from translational symmetry. I don't know about you, but when I studied signal processing nobody explained to me where convolution comes from; it was usually given as a formula, just out of the blue.
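The circulant-matrix argument above can be checked numerically in a few lines; the sketch below (my own, with an arbitrary 6-dimensional example) builds a circulant matrix from a single vector of shared weights theta and verifies that it commutes with the cyclic shift, i.e. that applying the filter is shift-equivariant.

    import numpy as np

    d = 6
    theta = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])   # shared filter weights

    # Circulant matrix: every row is a cyclic shift of the same parameter vector.
    C = np.stack([np.roll(theta, i) for i in range(d)])

    # Cyclic shift operator.
    S = np.roll(np.eye(d), 1, axis=0)

    x = np.arange(d, dtype=float)

    print(np.allclose(C @ S, S @ C))                 # True: C commutes with the shift
    print(np.allclose(C @ (S @ x), S @ (C @ x)))     # True: filtering then shifting
                                                     # equals shifting then filtering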
Let me now move to a more general case where our group formalism will be more prominent. We can think of convolution as a kind of pattern matching operation; in an image this is done by a sliding window. Let me write it a bit more formally: we need to define the shift operator that translates the filter, which I denote here by psi, and an inner product that matches the filter to the image x; if we do this for every shift, we get the convolution. The special thing here is that the translation group can actually be identified with the domain itself: each element of the group, a shift, can be represented as a point on the domain, the point to which we shift. This is not the general case; in general we will have the filter transformed by the representation rho of our group, and the convolution will now have values for every element of the group g, which can be very different from the Euclidean case.

Here is an example of how to do convolution on the sphere, and it is not some exotic construction: spherical signals are actually very important, for example in astrophysics, where a lot of observational data is naturally represented on the sphere, like the cosmic microwave background radiation from the primordial universe that I show here. Our group in this case is the special orthogonal group SO(3); these are rotations that preserve orientation, and their action on points of the sphere can be represented as an orthogonal matrix R with determinant equal to one. The convolution here is defined on SO(3): we get the value of the inner product for every rotation R of the filter. It means that if we want to apply another layer of such convolution, we need to apply it on the SO(3) group, not on the sphere anymore, a three-dimensional manifold whose points are rotations, which I denote here by Q. Now, the sphere in this example is a non-Euclidean space, a manifold, but it is quite special: every point on the sphere can be transformed into another point by an element of the symmetry group of rotations, so in a sense there is complete democracy among points. In geometry we call such spaces homogeneous, and their key feature is a global symmetry structure.
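To make the idea of a convolution whose output lives on the group concrete in the simplest possible setting, here is a toy sketch (my own, not from the lecture) that replaces SO(3) with the cyclic group of 90-degree planar rotations: the "convolution" assigns to each rotation g the inner product of the image with the filter transformed by g, and rotating the input simply shifts the responses cyclically over the group.

    import numpy as np

    rng = np.random.default_rng(0)

    def c4_group_correlation(x, psi):
        # One response per group element g: <rho(g) psi, x>, where rho(g)
        # rotates the filter by g * 90 degrees.
        return np.array([np.vdot(np.rot90(psi, k), x) for k in range(4)])

    x = rng.normal(size=(8, 8))        # a toy 8x8 "image"
    psi = rng.normal(size=(8, 8))      # a filter of the same size

    out = c4_group_correlation(x, psi)
    print(out.shape)                   # (4,) -- a signal on the group C4

    # Equivariance: rotating the input by 90 degrees cyclically shifts the
    # responses over the group elements.
    print(np.allclose(c4_group_correlation(np.rot90(x, 1), psi), np.roll(out, 1)))

On the sphere the same construction uses rotations in SO(3) instead of these four planar rotations, which is why the output becomes a signal on the three-dimensional group SO(3) rather than on the sphere itself.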
This global symmetry structure obviously does not hold for general manifolds. One thing to note is that when we apply a sliding window on an image, it does not matter which way we go from one point to another: we will always arrive at the same result. The situation is dramatically different on a manifold: if I go along the green path here or along the blue one, I will arrive at different results. In differential geometry this is called parallel transport, and the result of moving a vector on the manifold is path-dependent. A crucial difference between manifolds and Euclidean spaces is that manifolds are only locally Euclidean: we can map a small neighbourhood of a point u to what is called the tangent space, and the tangent spaces can also be equipped with an additional inner product structure, called the Riemannian metric, which allows us to measure lengths and angles. If the manifold is deformed without affecting the metric, we say it is an isometric deformation, and isometries, not surprisingly, also form a group. We can define an analogue of convolution on manifolds using a local filter that is applied in the tangent space, and if we make this construction intrinsic, expressed entirely in terms of the metric, we get deformation invariance, or invariance with respect to the isometry group of the manifold. This was in fact the very first architecture for deep learning on manifolds, which we called geodesic CNNs. One important thing that I did not say is that because we are forced to work locally on the manifold, we do not have a global system of coordinates; we need to fix a local frame at each point, or what physicists call a gauge. The gauge can be chosen arbitrarily at every point and changed by applying a gauge transformation, usually a local orientation-preserving rotation, and we need to account for the effect of the gauge transformation on the filter by making the filter transform in the same way, so that it is gauge-equivariant. This was the work of Taco Cohen from the group of Max Welling a few years ago. You can see here again the comeback of our geometric deep learning blueprint, either in the form of invariance to the isometry group or, maybe in a more subtle way, as equivariance to what is called the structure group of the tangent bundle of the manifold.

The reason why we care about manifolds is that in computer vision and graphics, two-dimensional manifolds, or surfaces, typically discretised as meshes, are a standard way of modelling 3D objects. What we gain from our geometric perspective is filters that can be defined intrinsically on the object, and this equips our deep learning architecture with invariance under inelastic deformations. One application where dealing with deformable objects is crucial is motion capture, or mocap, which is used in the production of expensive blockbuster movies such as Avatar. What you see here is a cheap markerless motion capture setup from a Swiss company called Faceshift; the company was bought by Apple in 2015, and its technology now powers the Animoji on the iPhone X and later versions. What this video nicely shows, I think, are two prototypical problems in computer vision: the problem of shape analysis, where we are given the noisy face scan of the actor captured by the sensor, which has to be brought into correspondence with some canonical face model, and the problem of synthesis, where we need to deform this model to reproduce the input expression of the actor. Ten years ago one would have needed a 3D sensor to produce this motion capture effect, and I myself was very adamant about it; since there were no cheap real-time sensors with sufficient resolution on the market at that time, we had to build one. This was our startup Invision, and in this video I show the eureka moment from 2011 when an FPGA implementation of our sensor prototype worked for the first time. Invision was acquired by Intel in 2012, where I spent the following eight years building what is now called the Intel RealSense technology. RealSense was released in 2014 with a funny commercial featuring Sheldon Cooper from The Big Bang Theory (I apologise that I always forget his real name), and it was the first mass-manufactured integrated 3D sensor that became a commercial success for Intel. Fast forward ten years, and now we do not need 3D input anymore for something similar to the motion capture video I showed: we can have a hybrid geometric deep learning architecture for 3D shape synthesis problems, with a standard 2D CNN encoder that works on an input image or video and a geometric decoder that reconstructs a 3D shape, like the 3D shape of a hand in this example. This was the work of my PhD student Dominik Kulon together with Stefanos Zafeiriou, and last year at CVPR, the main computer vision conference, he showed this demo of full-body 3D avatars with detailed hands from purely 2D video input; it ran on an old iPhone ten times faster than real time. This was a collaboration with the startup Ariel AI, which was acquired by Snap last year, so now I am allowed to say it. I cannot take much credit here, besides convincing one of the founders, Iasonas Kokkinos, to leave his highly paid job in industry to do this startup, and also being one of the early-stage investors.
So let me now talk about some applications of geometric deep learning, which is probably the part of this field that I am most excited about; as Michael mentioned, I think this is a very practical area of research. If we look at graphs, they are really ubiquitous: we can use graphs to describe practically any system of relations or interactions, from the nano scale, as models of molecules, as we have seen, to the micro scale, looking at interactions between different molecules, all the way to the macro scale, at which we can model the social networks of entire countries or even the whole world as a graph. One thing that you often hear, especially in the popular press, in relation to social networks is the problem of misinformation, the so-called fake news, and we see quite some empirical evidence that fake news spreads differently on the social network. Using graph learning, we tried to detect misinformation by looking at the spreading patterns of different stories or pieces of news on Twitter; we got quite encouraging results, and together with my students I founded a company called Fabula AI that commercialised this technology. In 2019 Fabula was bought by Twitter, where I currently have a group that does research on graph ML, and as you can imagine, graphs of different types, such as the follow graph, the engagement graph, or other graphs that we do not expose publicly, are among the key data assets for Twitter; they are useful in a lot of scenarios, from detecting malicious users to recommender systems.

But if you ask me to pick just one application where I think geometric deep learning is likely to produce the biggest impact, I think it is the biological sciences and drug design. You may know that making new drugs is a very long and extremely expensive business: bringing a new drug to the market takes more than a decade and costs more than a billion dollars, and one of the reasons is the cost of testing, where many drugs fail at different stages. If you look at the space of possible drug-like molecules that can be chemically synthesised, it is extremely large; on the other hand, we can test maybe just a few thousand compounds in the lab or the clinic, so there is a huge gap that has to be bridged, and it can be bridged by computational methods that perform some form of virtual screening of the candidate molecules, predicting properties such as toxicity and target binding affinity. Graph neural networks have recently achieved remarkable success in virtual screening of drugs, and nowadays they are already more accurate and orders of magnitude faster than the traditional approaches used before. Last year the group of Jim Collins at MIT used graph neural networks to predict the antibiotic activity of different molecules, leading to the discovery of a new powerful antibiotic compound called halicin, which actually originated as a candidate anti-diabetic drug. And if the coronavirus pandemic has taught us anything, it is how vulnerable humankind is to new pathogens: we already have bacteria with antibiotic resistance, so one of these bugs becoming contagious is not a question of if, it is a question of when. We need new antibiotics, and I think it is interesting that I am myself a professor at Imperial College, where the first antibiotics were discovered by Sir Alexander Fleming.
If I look at traditional small-molecule drugs, as they are called, one thing that characterises them is that drugs are typically designed to attach, or as chemists say, bind, to some pocket-like region on the surface of a target, which is usually a protein molecule. Here you can see again my favourite molecule of caffeine and the interface of the adenosine receptor in the brain that it binds to; it is cut out in this figure, and you can clearly see the deep pocket structure on the protein surface. More recently, the pharma industry has become interested in drugs that disrupt or inhibit what are called protein-protein interactions, or PPIs, because most biological processes in our body, including those related to diseases, involve proteins that interact with each other. One of the most famous such mechanisms is the programmed death protein complex, which is used in cancer immunotherapy, for which the Nobel Prize in medicine was awarded in 2018. Now, since PPIs typically have flat interfaces, like this programmed death-ligand protein PD-L1 that I show here, they are usually considered undruggable by traditional small molecules, and a promising new class of drugs is based on large biomolecules, peptides, proteins, or antibodies, that are engineered to address these difficult targets. Such drugs are called biologics, or biological drugs, and there are several of them on the market; for some types of cancer for which, if you had got them ten years ago, that would probably have been your death sentence, there is nowadays a cure. With my collaborators from EPFL in Lausanne we developed a geometric deep learning architecture called MaSIF, which was on the cover of Nature Methods last year, that allows the design from scratch, de novo, of new protein binders, and you can see here three such examples that were already experimentally confirmed to bind the PD-L1 oncological target.

Another promising direction towards cheaper and faster therapies is what is called drug repositioning, when existing approved and safe drugs are used against new targets, sometimes in combination with other drugs, which is called combinatorial therapy or polypharmacy. Many such drug combinations may have unknown or potentially dangerous side effects, and graph neural networks were recently applied to predict these side effects, notably in the work of Marinka Zitnik, who is now a professor at Harvard. The combinations can be problematic, they can be antagonistic, but they can also be synergistic, and I am involved in a big collaboration to find drug combinations against viral infections such as COVID-19. These ideas are not limited to synthetic drug molecules: with my colleague Kirill Veselkov from the Faculty of Medicine and the Vodafone Foundation, we took graph-based drug repositioning approaches to the domain of food. You may know that plant-based food ingredients are rich in compounds that belong to the same chemical classes as anti-cancer drugs, and every bite of food that we put in our mouths comes with thousands of bioactive molecules, most of which still remain largely unexplored by experts, are not tracked by regulators, and are, I would say, unknown to the public at large; how many of you have heard, for example, about polyphenols, flavonoids, or terpenoids? It is truly the dark matter of nutrition. The way we model the effect of drugs, or of any molecules in fact, is by how they interact with protein targets, and since proteins interact with each other, the effect of the drug on one target ripples through the PPI graph and affects other proteins in a kind of network domino effect, because in our body's biochemistry a lot of the biomolecules are interrelated.
If we now take a training set of drugs with known anti-cancer effect, we can train a classifier based on a graph neural network that predicts how likely a molecule is to be similar to an anti-cancer drug from the way it interacts with protein targets, and we can then apply this classifier to other molecules, coming from food, for which we know the interactions with proteins; this gives us a list of potential anti-cancer food molecules. I am obviously hugely oversimplifying here, and the biggest part of this work, which was published in Nature Scientific Reports, was actually to study the pathways affected by these molecules and to confirm their anti-cancer effect and lack of toxicity. But to make a long story short, we constructed the anti-cancer molecular profiles of over 250 different food ingredients, and we see that there are prominent champions, which we call hyperfoods: for example tea, citruses, cabbage, celery, sage. These are all rather common, cheap, and I would say boring ingredients that we had better add to our everyday diet. Maybe the coolest part of this project is that the ingredients we identified were used by the famous chef Bruno Barbieri to present short recipes for Christmas; if you don't know him, he is the Italian version of Gordon Ramsay and presents the Italian MasterChef show on TV. And if you wonder why he is in bed here, this is part of the Vodafone Foundation citizen science campaign called DreamLab: we collaborated with them to use the idle power of smartphones at night to run our computations.

I think this is a good moment to end on this tasty note. We started with the somewhat irreverent desire to imitate the Erlangen Programme in machine learning, trying to derive different deep learning architectures from fundamental principles of symmetry, and this took us all the way from image classification to molecular gastronomy. All the approaches we have seen today were instances of a common blueprint of what we call geometric deep learning, where the architectures emerge from assumptions on the domain underlying our data and its symmetry group. Geometric deep learning methods have exploded in the past few years, especially graph neural networks, and there are already several success stories in industrial applications. I think it is indicative that last year two major biological journals featured geometric deep learning papers on their covers, which means that it has already become mainstream, and possibly we will see new exciting results in the fundamental sciences and the big challenges that modern science and engineering face. Last but not least, let me acknowledge all my amazing collaborators and students, and thank you very much for your attention.

Thank you very much, Michael, this was a fascinating lecture, starting with the history of geometry, making a stop at Erlangen with the Erlangen Programme, then moving into deep learning and how one could really look at it through this lens of a much deeper introspection using symmetries, and then showcasing how all of this gets translated to solve real-world problems. So thank you so much for this. We are now coming to the question and answer part of the inaugural lecture, and I would like to start off with the question that was the most popular one in the Q&A channel: what do you think is a field, be it a scientific field or some application domain, that can be heavily improved by graph learning?
Right, well, this is what I mentioned towards the end of the talk. If I were to choose one, and there are many, probably it is biochemistry and the biological sciences more broadly. What we have seen is that predicting properties of molecules is very successful using graph neural networks, and when combined with other methods such as active learning or reinforcement learning, we now basically have a mechanism to search the space of molecules very efficiently, predicting some of the properties that are important for drugs. So I hope, or expect, to see some real breakthroughs in this domain in the next few years. Of course the problem is much more complicated, because biology is very difficult to model, and getting high-quality data is extremely important.

Thanks. Could I maybe just follow on from this? Biologists, in particular immunologists, might sometimes be skeptical when presented with new methodologies, and that is perhaps also an issue of them understanding how these things are produced and what they actually mean for them. Do you think there is a way forward to get greater explainability with these methods for these kinds of communities? Definitely, this is actually a very good point. Graph neural networks, or at least a certain type of graph neural networks used to model molecules, actually incorporate physically meaningful inductive biases. What works best currently for molecules are what are called equivariant message passing neural networks. I explained the message passing algorithm run on graphs; what happens with molecules is that they have extra geometric structure, so the node features are not just vectors, they must be transformed in a certain way, they have a physical interpretation, and the message passing is done in such a way that it preserves these geometric structures. It is therefore much more data-efficient, you can explain it physically, and I think there is no reason why biologists or chemists should not use these methods; in fact, I think they are already using them.

Thank you. The next question is more about the symmetry of the entire approach. We know that graph isomorphism is probably a hard problem in general, so if you look at these group invariances, is this exact group invariance model of geometric deep learning really realistic, and are there maybe results about approximate invariances? Yes, so definitely this is not realistic; this is a convenient model, as always, for which we can derive the theory. What is important is to address a broader class of transformations for which you can measure the distance between such a more general transformation and an element of the group; you need some extra structure, you need to define distances between transformations, and then you can show some kind of perturbation stability results. For example, if it is not a pure translation but some geometric warping of your domain that at least locally looks like a translation, then you will still get approximate invariance or equivariance, and in fact this was shown for convolutional neural networks through what is called the scattering transform by Joan Bruna, my colleague, almost a decade ago.
We have similar results for graphs: you can show that if you perturb the graph a little bit, your filters will not suddenly produce completely different results; similar results exist for manifolds and for other structures as well.

Thank you. The next question is about pooling: CNNs and other deep learning architectures use pooling, so what is the corresponding prior in the geometric deep learning blueprint? Yes, I mentioned it very quickly, but this is what is typically called scale separation in signal processing. It is actually another fundamental geometric principle that is used everywhere; for example, the fast multipole algorithms that are used to quickly approximate computations in multi-particle systems are based on this principle. The idea is that if you have some way to coarsen your domain, to create a hierarchy of domains, basically by agglomerating nearby points (so it requires some extra structure, a metric), then you can approximate what are called locally stable functions by projecting your data to a coarser level and applying your function at that level. This is exploited, for example, in the form of max pooling in CNNs, and it is exploited on graphs as well, so it is a pretty ubiquitous principle.

Thank you. The next question is around adversarial machine learning, which is of course a known topic in machine learning, and it is really asking about the different kinds of adversarial models here; in particular, I think the symmetries that are inherent in this approach might actually be an attack surface themselves, so maybe you can share some thoughts on that. Yes, adversarial attacks on graphs are actually more interesting, in my opinion, than adversarial attacks on images, because graphs have two types of structure: you have the features, which are continuous, and you have the connectivity, which is discrete. Typically the attacks described in the literature are of this kind: you have some target node that you want to do something with, let's say on a social network, you have some attacker nodes, and you have some budget, for example of what kind of connectivity you can change and which features you can change. Bottom line, you can even show some certificates: you can guarantee that under certain assumptions the performance of graph neural networks will not be compromised. This is a line of work from the group of Stephan Günnemann, who actually pioneered some of the first models for adversarial attacks on graphs. It is very interesting, because these models can also be taken to biological domains, where we can think of diseases as adversarial perturbations of the graph, and maybe of therapies as adversarial attacks on the disease. That's very interesting, thanks.

Maybe just another question, around decentralised machine learning. This is something that is also becoming more popular, where there is more edge computing and a lot of data processing is pushed into edge devices, so maybe you can share your views on whether the approaches you have shown us today are amenable to this kind of paradigm. Right, so there are probably several aspects here.
Most of the message passing type architectures are local, so the computation can be performed at each node of the graph separately; you can essentially map the computation of each node to an independent computational unit, and in this sense graph neural networks are extremely well parallelisable. I should say that GPU architectures are probably not the most suitable model, both in terms of hardware and software architecture, for dealing with graphs, and in fact, to the best of my knowledge (and I have several such collaborations), chip manufacturers are very keen to see what the next generation of architectures will be that are more suitable for working with graph-structured data, not only graph learning but also other things like computing PageRank. One such company, for example, I think it is actually a British unicorn, is called Graphcore, and it has a very interesting architecture that hopefully works better than GPUs for this type of structure. Thank you, it will be very interesting to see how this unfolds on the chip side in the next five to ten years.

OK, I think we are coming to the conclusion of the Q&A session, so it is now my pleasure and privilege to introduce Professor Daniel Rueckert, who will give the vote of thanks. Daniel? Thank you very much to Michael Huth and also, of course, to Michael Bronstein. It is actually quite difficult to start this vote of thanks, because I don't quite know where I should really start, because obviously you have heard what a brilliant scientist Michael is. He has given you a tour de force of an entire field of machine learning in which he has really been instrumental, in establishing it and also in pursuing some very interesting topics and applications, so I think this is super exciting. Now, of course, I could go on and tell you about all the accolades Michael has received, and I can list at least a few: he has been elected an IEEE Fellow, he is a Royal Society Wolfson Merit Award holder, he has a Silver Medal from the Royal Academy of Engineering, and he has five (I believe, and this is really not a mistake) five ERC grants, which I wasn't really sure was possible, but it is correct, and every time I looked somewhere else I found some more prizes and awards he has won, so I am not going to try to list all of them, because I don't want to bore you too much, but the list is very, very long. The other thing, which you have all seen, and I am sure you will agree with me, is that Michael is not only a brilliant scientist but also a very accomplished entrepreneur and innovator, and I think that is extremely exciting to see. You have seen that he founded Fabula AI, which was acquired by Twitter; if you think that is already a great success story, then I can also tell you that he has been involved in several other very successful startups, including Invision, which was acquired by Intel and whose technology now produces the RealSense camera system, and you have probably also seen that he is already cooking up some new ideas for startups along the idea of hyperfoods, so I would be very interested not only to see this also become a success but also to taste some of the products which come out of it. Now, if you really think about all of this, then you think that perhaps, if you can do all of these things, you have to be cloned for it to be possible
for one person to do all this and to have all this time and energy. Now, I actually know for a fact that Michael is cloned, because he has an identical twin brother, but my explanation is somewhat at a loss, because his twin brother Alex is also a really accomplished scientist, entrepreneur and innovator in his own right, with a long list of his own achievements, so I am afraid that explanation doesn't really work either. I think probably some of you will also know that Michael is very much a globetrotter: he was born, I think, in Tula, Russia, studied in Israel, where he also got his PhD, and he has moved all around the world, where he has held appointments at a number of different universities, including visiting appointments at Stanford, MIT, Harvard, Tel Aviv, and the Institute for Advanced Study at TUM in Munich, which I think is a great place, where he has been a Rudolf Diesel Fellow for two years, and many other places; again, the list is far too long for me to elaborate on. Now, I have been trying to find a word which best describes all of Michael's achievements and characteristics, and I could only really come up with a German expression, "Tausendsassa". I am not sure whether you can guess what this means, because unfortunately there is no simple English translation for this word. I actually looked it up in a dictionary, which also apparently uses deep learning to make translations, but none of the English translations work particularly well: one of them was "an all-rounder", and of course that is true; "a jack of all trades" is perhaps not quite right; "a whiz kid", I think, comes closer; but the one I like best is the translation "he's a hell of a chap", and I think that is a very nice way of approximately describing this. So I think you get the message I am trying to convey here in this vote of thanks, and I wanted to conclude by thanking Michael for a really inspiring and fascinating lecture; it is really a privilege and a pleasure to have you as a colleague and as a professor at Imperial College London. Thank you very much, Michael. Thank you, Daniel, this is really flattering. Well, thank you very much, Daniel, for this very fitting vote of thanks, and also thanks to Michael for giving such a brilliant lecture; I can only echo what Daniel has said, it is just fantastic to have Michael as a colleague and professor in our department. I would also like to thank the audience for having joined the event and for asking really stimulating questions, and the events team for having run this in such a smooth fashion. This concludes the event, and I wish you all a nice evening. Thank you very much.
Info
Channel: Imperial College London
Views: 12,627
Rating: 4.9729118 out of 5
Id: 8IwJtFNXr1U
Length: 60min 8sec (3608 seconds)
Published: Wed Mar 17 2021