DeepMind's AlphaFold 2 Explained! AI Breakthrough in Protein Folding! What we know (& what we don't)

Video Statistics and Information

Captions
"It will change everything." "DeepMind solves 50-year-old grand challenge." "The game has changed." "DeepMind's latest AI breakthrough achieves historic new milestone." "Helps solve how diseases invade cells." "Improves protein folding prediction." "AI breakthrough." It also wipes your butt automatically. This is DeepMind's newest big publication, except actually it's not a publication yet.

So what happened? As I'm sure you've heard, every year there is a competition for protein folding prediction. Proteins are structures that fold in a given way, and we'll go into that in a bit, but basically every year there is this competition, and the results of this year's competition came out and they looked something like this: every entry you see here is a team participating in the competition, and there is one team, DeepMind's system AlphaFold 2, which completely dominates all the others, to the point where the problem is now considered solved. "Solved" in this case simply means that you're past a certain score on this test set, and if you're past that score your predictions are useful enough that other scientists can take them and base their work on them. That's what it means for the protein folding problem to be solved.

Now, we don't have much information on AlphaFold 2 yet, other than that it's really good, a blog post, and a bunch of advertisement videos by DeepMind. They are writing a paper on it, but today I want to go through that blog post and parse out what we can gather from it, and I also want to go through the AlphaFold 1 paper. As you can see, the performance increased drastically with AlphaFold 2, but the guess is that the system is going to be somewhat similar to AlphaFold 1, of which we do have a paper. So today we'll go into AlphaFold 1 and into some speculation about AlphaFold 2.
I can already give you my speculation: it's transformers, it's attention, that all of a sudden made this big jump, together with probably a few other improvements to the AlphaFold 1 system. Basically, transformers continuing to dominate the entire field.

So where do we start? By the way, if this is not a great meme template, I don't know what is. Just saying. Let's actually start with the problem itself. If you're here, you're probably a machine learning person and might not know too much about protein folding. These things here are computer representations of proteins; they don't really look that way, but something similar. A protein is essentially a chain of amino acids. Amino acids are what's called the basic building blocks of life, because proteins are what make the cell do things. Proteins are sort of the workers in the cell: they are used as signaling molecules and receptors, they are parts of your muscles, in fact the parts that actually move are proteins. Whenever something in a cell needs to do mechanical or chemical work, proteins are involved, and amino acids are the building blocks of proteins.

Each amino acid has a certain common structure, and there are 21 of them, so all the proteins in the world are simply made out of chains of these 21 amino acids. These chains form because there is always this common body that can link up to the bodies of other amino acids. It's a very similar concept to how DNA is structured, if you know that, except that in DNA there are four different bases and here there are 21 amino acids. Each amino acid is a little bit different: each has a tail, a side chain, that hangs off. The tail can look one way or another, maybe there's a cyclic one (I'm not sure), or it can have essentially no tail at all, which I think is the case for glycine. The important part is that depending on this tail, the chemical properties of the amino acids are different.

What happens next is really interesting. This is the central dogma of modern biology: you have DNA, the DNA is transcribed, read off and copied, to RNA, which is sort of a DNA clone, and then the RNA is translated into the amino acid chain, with three bases of DNA always mapping to one amino acid. It's like a compiler. Notably, these compilation steps are themselves done by proteins, so nature, in a very real sense, is its own compiler. You can see one of these as the binary and the other as the source code.

But what happens once you build this chain of amino acids and set it out into the cell? Because of the different properties of these side chains (they're also called residues), the chain begins to fold. If you know a bit of chemistry: these are atoms linked by covalent bonds, and it can be that one part of the chain is electrically negatively charged while another part is positively charged in a given place, and it also depends on the surrounding medium, of course.
That means that in this case, for example, these two parts will attract, and if you release this amino acid chain, what you get is a bend: the chain bends so that this tail goes like here and that tail goes like here. (I apologize if there is no amino acid with, I don't even know what to call it, a pyrene ring or something like this, but the point stands.) The point is that these two things attract and form this shape, and this shape is very important. Proteins can consist of hundreds, thousands, tens of thousands of these amino acids in a chain, and a protein's function is, interestingly, largely determined by its 3D structure, not necessarily by the exact amino acids. Technically you can substitute amino acids for each other: this amino acid here could be substituted by another one that isn't the same but has similar properties in its side chain, such that if the structure stays the same, the protein performs the same function. That is a very special property of proteins: their 3D structure largely determines their function.

For example, take reading off DNA. As you know, DNA is this double strand of connected base pairs, and in order to replicate the DNA, or to read it off, there is this step of DNA replication, for instance in mitosis, where you copy the DNA. To do that you need to split up the two strands, because a protein needs to get in there to actually read it off. For that there is a specific protein that inserts itself right here to split up the DNA, called a helicase, and how that protein is shaped really matters: the shape needs to be such that it pries these bonds apart from each other. So the shape is very, very important for a protein, and conceivably you could build a helicase from many, many different amino acid sequences, as long as it has the same shape. Now, something as fundamental as a helicase is probably conserved across the evolutionary tree, but I hope you get the point: the shape is super duper important.

The shape isn't arbitrary, either. The amino acid chain is called the primary structure. The first thing that happens is that two very distinct kinds of sub-shapes appear, often repeating shapes: these are the alpha helices (this is a helix), and these long sheet-like ones are the beta strands. These often-repeated motifs form the secondary structure, and then the tertiary structure is when the whole thing starts to fold in on itself and gives the protein its final shape. This here, I guess, is the RNA polymerase, the molecule that reads DNA and outputs RNA, and there are many, many such proteins.

Since the shape is so important, it is vital that we can determine it, and this is why the problem is said to be 50 years old: roughly 50 years ago a Nobel laureate said the following.
Since a protein is fully determined by its amino acid chain, and since the amino acid chain determines the structure it will fold into because of these chemical properties, it should be possible to read in the amino acid sequence (or the DNA sequence, from which we know what amino acid sequence results) and output the shape of the protein. However, this turned out to be an extremely complicated problem, because the interactions are very subtle and they're not always the same. Somewhere out here there could be some amino acid with some weird side chain, and since everything folds in on itself all the time, at some point those come into contact and change the local properties here. So this is a very, very difficult problem, people have tried to solve it for a long time, and now apparently DeepMind has the first system that does it to such a degree that it's actually useful.

All right, I lost my train of thought. So, shape prediction. What you had to do so far was determine the structure experimentally: take these proteins, crystallize them, shoot X-rays at them, and infer the structure from the patterns. You can do that with crystallized proteins because crystals are very regular arrangements of the protein. Think of a snowflake: if we knew nothing about the water molecule, nothing about H2O, we could still look at snowflakes and infer the structure, these specific angles, as long as someone tells us it's all the same material. Because snowflakes are crystals, we could infer what the water molecule looks like just by analyzing them. It's pretty much the same here: you grow crystals out of these proteins, you shoot X-rays at them, and then you reason over the patterns that come out. This is very difficult and very expensive, so solving the problem computationally is super important.

I will get to this graphic in a minute; right now it's essentially the only thing we know about AlphaFold 2, because they have not yet released the paper or any description of the model, as I said. So instead we'll go into AlphaFold 1. AlphaFold 1 participated in the same competition two years ago and was already dominant there, but not yet dominant to the point of having, quote-unquote, solved the problem; it was just better than the other systems.

This is the basic structure of AlphaFold 1. Let's give ourselves an overview. There are two different stages to this algorithm: stage one is over here and stage two is over here, and maybe it's easiest to start with stage two. The output of stage one is this thing right here, a distance and torsion distribution prediction: this matrix here that's kind of tilted on its side, and I believe there are more down here. What you do is you take the amino acid sequence and line it up right here, and line it up again right here. (It would be a bit harder if there were a split, but a protein is a single chain of amino acids; there can be multiple parts to a bigger protein conglomerate, but each protein is one chain.)
So now we're building a pairwise matrix between the sequence and itself, and this pairwise matrix is going to be a distance matrix. We input some features about this sequence of amino acids (that's what we get as input), and we predict, for every pair, how far apart they are. Of course, along the diagonal the answer is always zero, they're zero apart, but you might say these two are five apart, these two here are seven apart, and these two here are only one apart, so it's reasonable to think that in the final structure those two end up close together. We don't worry about "close together" right now; for each pair we just predict how far apart they are.

You can view this as a machine learning problem: you have an input sequence and you simply want to predict the distance matrix. Here you can see the top and the bottom, one is the predicted matrix and one is the real one (I don't even remember which is which), and you can see that the system does a pretty good job. There are minute differences if you really look, like down here or over here, but in general the system does well. So this matrix is the output of stage one. There are other outputs too, like the torsion angles and so on, but the main thing is that you predict the distances between pairs of residues, and that's what you take as input to stage two.

Stage two builds a model of the molecule, and the model is a differentiable geometric model. They say (and I don't get these Nature papers, they're split into two parts that largely say the same things, I'm absolutely confused by them, so we're going to jump around a fair bit): we parameterize protein structures by the backbone torsion angles of all residues and build a differentiable model of protein geometry to compute the coordinates for all residues and thus the inter-residue distances. So they build a computer model of these amino acids, parameterized by the torsion angles. A torsion angle is simply the angle between two consecutive parts of the chain: flat would be a torsion angle of 180 degrees, and if it folds like this it would be a torsion angle of 90 degrees, and so on. You need two torsion angles per residue because you're in 3D, but essentially the torsion angles determine the structure of the protein; it's one way of parameterizing it.

The important thing is that they don't do any learning with this differentiable model. The purpose of the differentiable model is that, once you have it, you can run gradient descent. They pretty much lay it out right here: x is the output of your differentiable geometry, of your torsion angles (call them phi and psi, or whatever Greek letters). x goes into your loss function, and the loss function simply compares x to the predicted x, the one you predicted with stage one.
We start off with a flat chain. Actually, I think they start from some initialization, because they also predict the torsion angles directly and initialize from that predicted torsion-angle distribution, but let's just say we initialize from the flat chain. Because this is differentiable, we can do the following: your loss L is x minus x-prime, and we take the derivative of the loss with respect to the torsion angles. We can do this since everything is differentiable, so now we know how we need to change each angle, the thing right here, in order to make the loss smaller. Maybe it says you need to make this angle smaller, so we do that, now it's only 90 degrees, and then we do it again and again and again. By changing all the angles such that the loss gets smaller, we end up, step by step, replicating in our computer model the process that happens in nature: what we feed in is how far any two amino acids should be apart, and by running plain gradient descent on the torsion angles, we figure out what the angles need to be in order to make that happen.

So first we predict all the distances, and then we figure out how to set the angles such that those distances are fulfilled. These are not true distances, they are predicted distances, so everything depends on how well we can predict them. But once we have them, we can replicate in our computers the process that happens in nature, except that in nature the whole folding depends on all these chemical interactions, and here we do none of that. We simply ask: how do we need to fold in order to make the distances in our computer model, the distance between this and this, and this and this, any two of them, agree with the distances we predicted.

You can see that as you run gradient descent, this TM score goes up and the root mean square distance goes down. If you have a test set with structures that people have already determined, you can compute these metrics and see that you do indeed get the correct folding. It's also pretty interesting that here, in blue and red, you have the helices in blue and the strands in red. From a folded or partially folded structure you can already see these substructures emerge: this is a helix, as you can see, and this maybe is a strand, and there are heuristic ways to classify that. If you look at the database entry, you can see that this here is a strand, these are helices, this is a strand, and so on, and you can see what the model thinks at the beginning: it doesn't get many things correct, though it does get some, but over time it refines its guesses until at the end it's pretty much equal to the true sample from the database. And here is simply the distribution of, I guess, confidence about these things, and of the torsion angles. So this two-step process is really the key here.
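To make stage two a bit more concrete, here is a minimal toy sketch in PyTorch (not DeepMind's code): a chain parameterized by angles, unrolled into coordinates by a differentiable function, with gradient descent pushing the resulting pairwise distances toward a target matrix. The 2D chain with unit-length bonds and the plain MSE loss are simplifications I made up; the real system optimizes backbone torsion angles in 3D against a potential built from the predicted distributions.

```python
import torch

def coords_from_angles(angles):
    # Toy differentiable "geometry": walk one unit step per residue in 2D,
    # turning by the cumulative angle at each step (a stand-in for torsion angles).
    headings = torch.cumsum(angles, dim=0)
    steps = torch.stack([torch.cos(headings), torch.sin(headings)], dim=1)
    return torch.cumsum(steps, dim=0)              # (L, 2) coordinates

L = 16
target_dist = torch.rand(L, L) * 5.0               # stand-in for the predicted distances (stage one output)
target_dist = (target_dist + target_dist.T) / 2    # make it symmetric

angles = torch.zeros(L, requires_grad=True)        # start from a flat chain
optimizer = torch.optim.Adam([angles], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    coords = coords_from_angles(angles)
    dist = torch.cdist(coords, coords)             # the model's current pairwise distances
    loss = ((dist - target_dist) ** 2).mean()      # compare to the predicted distances
    loss.backward()                                # gradients w.r.t. the angles, since the model is differentiable
    optimizer.step()                               # fold a little further
```

No learning happens here: the network's predictions only enter through the target distances, and the angles are simply adjusted until the geometry agrees with them, which is exactly the role stage two plays at inference time.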
AlphaFold 2 conceivably changes this a little bit, but again, we're not sure. Step one right here is a deep learning system; step two is simply a gradient descent procedure that you run at inference time. At training time you only train step one, so step one is the machine learning bit. The goal is to output this distance tensor right here (there are more things than distances, as we said, there are torsion angles and so on), but ultimately you want to output the distance matrix. How do they do it? You can already see it's a deep neural network. You build an input data point of size L by L, sequence length by sequence length, and you collect features. You don't know the distances yet, but you can collect features that are pairwise features between two positions: maybe this one is leucine and this one is, what's a different amino acid, glycine, and in here you put features for that pair. These can be positional features (maybe the leucine is at the 100th position in this particular protein and the glycine at the 90th), or correlation statistics between these two amino acids in general. You can also put in single-position features: those are these tiled L by 1 features, features of the sequence itself rather than pairwise features, and you simply replicate them along one of the dimensions, so along any given row or column you always put the same feature; this is very common with ConvNets. You can even use scalar features, where you simply fill an entire plane with the same number. It's just easier to do it like this because it fits the convolutional architecture well.

So you provide all kinds of features, and the features they provide are plentiful; a lot of them do introduce domain tools and domain expertise. But once they have that, they simply take this image with many, many channels and predict the output image from it, so it's just an image-to-image translation problem, and they do it with a convolutional neural network: as you can see, 220 residual convolutional blocks. I assume most viewers of this video are familiar with convolutional neural networks; if not, I'm deeply sorry, but I won't go into that here. You can see they tile this tensor, and they tile it differently from training instance to training instance, which is a form of data augmentation, but ultimately you slide over this image with a 64 by 64 ConvNet and produce the image on the right.
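As a rough illustration of this input construction, here is a hedged sketch, with shapes, channel counts, and feature contents entirely made up (this is not the paper's code): per-residue and scalar features are tiled into an L by L tensor with many channels and pushed through a tiny residual ConvNet that outputs per-pair logits over distance bins.

```python
import torch
import torch.nn as nn

L, c_seq, c_pair, n_bins = 64, 8, 4, 64            # invented sizes, for illustration only

seq_feats  = torch.randn(L, c_seq)                 # per-residue features (e.g. amino acid type profile)
pair_feats = torch.randn(L, L, c_pair)             # pairwise features (e.g. MSA covariation statistics)

# Tiling: replicate per-residue features along rows and columns, fill a plane with a scalar.
row_tile = seq_feats.unsqueeze(1).expand(L, L, c_seq)
col_tile = seq_feats.unsqueeze(0).expand(L, L, c_seq)
scalar_plane = torch.full((L, L, 1), float(L))     # e.g. sequence length as a constant plane
x = torch.cat([pair_feats, row_tile, col_tile, scalar_plane], dim=-1)
x = x.permute(2, 0, 1).unsqueeze(0)                # (1, channels, L, L) for 2D convolutions

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, h):
        return h + self.conv2(torch.relu(self.conv1(torch.relu(h))))

net = nn.Sequential(
    nn.Conv2d(x.shape[1], 32, 1),                  # mix the input channels
    ResBlock(32), ResBlock(32),                    # the real model stacks 220 such blocks
    nn.Conv2d(32, n_bins, 1),                      # per-pair logits over distance bins
)
distogram_logits = net(x)                          # (1, n_bins, L, L)
```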
Here you can see an inherent weakness of this approach: the network can only ever look at 64 amino acids at a time. If you're on the diagonal of this matrix (let's say it's not 64 by 64 but 3 by 3 for illustration), you only consider three amino acids and their interactions with each other, any-to-any interactions among them. If you're off the diagonal, you consider maybe these three amino acids and those three amino acids, and you have features for both groups, but you only see interactions between the two groups, not the interactions within each group, and nothing outside the crop at all. So the part of the protein you can look at at any point in time is very limited, and the distances you output cannot directly depend on, say, this amino acid way over here: you always have this limited, local view of the protein. People argue that's actually enough: to establish, say, these green connections right here, what matters most is the vicinity of this amino acid, the vicinity of that amino acid, and the interaction between those two vicinities. But it is quite conceivable that this green thing down here, being so close, would actually push the two apart and cause an interaction which, in my understanding, would not be covered by a system like this. That's one point where I believe AlphaFold 2 makes the big gains that it does.

Now, the features that go in here are, as I said, quite plentiful. One of the more interesting ones is this MSA, the multiple sequence alignment. They introduce it right here: in recent years, the accuracy of structure predictions has improved through the use of evolutionary covariation data that are found in sets of related sequences. Sequences that are similar to the target sequence are found by searching large datasets of protein sequences derived from DNA sequencing and aligned to the target sequence to generate a multiple sequence alignment. Correlated changes in the positions of two amino acid residues across the sequences of the MSA can be used to infer which residues might be in contact.

To understand this, I've looked up one of the papers on it, called "Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models". The entire basis is this: here is the chain of amino acids you're considering, and this is the human (they actually have a very similar graphic in their blog post, but we'll draw this ourselves; I'll just sort of copy it). Each amino acid can be abbreviated by a single letter, since there are 21 of them and the holy alphabet creators have given us 26, so that fits; each position can be an S, Y, C, M, D and so on. Then you go look into your database, and your database is essentially all of life: you look for similar sequences, and there are tools that let you search through databases very quickly and retrieve sequences that overlap with yours in amino acid sequence. So you might find, up in the fish, a similar sequence right here; and in the, whatever this is, this might be a horsey, no, let's make an alligator out of it, in the alligator there might be a sequence; and so on. You get the point; my drawing skills are to be criticized in another video. You search for all of these similar sequences just by amino acid sequence, and from the correlations you can derive something. I've already told you that sometimes you can substitute an amino acid and the function of the protein isn't really affected, and this may be what you see right here.
In the human, this position is maybe a C, and in the fish it's a C too, but in the alligator it's a P and in the cockroach it's a K, and so on. If the alignment is good, meaning this is the same protein, or a protein that does the same thing in these life forms (because life is continuous, these things are often preserved or only slightly modified), then these are variations that happen in life, mutations, and we can fairly safely assume that whether there's a K or a P or a C at this particular position doesn't really matter: the shape doesn't seem to be too affected.

That's step one. So for this amino acid right here, whether it has this side chain or that one maybe doesn't really matter for the function of the protein. However, now look at two residues that are in contact. If my protein here has this side chain and the part it touches has another one, that means there is a chemical interaction between the two. Now, if a mutation happens and the protein still functions the same way, the shape must still be roughly the same, and that means that if one of the two changed, the other one probably changed analogously at the same time, because function is preserved, function depends on structure, and structure is determined by these chemical interactions. So if one of the partners changed, probably the other partner changed as well; maybe now it's this other side chain right here. What you would expect to see in the statistics is that if one changes, the other changes accordingly. There can be variations, there can be mutations, but if a mutation happens in one of them, a corresponding mutation should happen in the other one as well, otherwise the protein becomes non-functional and the organism tends to die. Not always, but this is a statistics game.

And that's what you see here: the fish has an S and an H, like the human, but the alligator has an F and a W right here, and the cockroach has the S and the H again, and down here you see the F and the W again. This correlation is an indication that these two residues might be in contact with each other. There have been systems, for example the paper I mentioned, that go directly from these statistics to contact predictions; AlphaFold simply takes all of this in as features.
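To make the covariation idea a bit more tangible, here is a toy sketch with a made-up five-column alignment. Real pipelines fit Potts models with pseudolikelihoods, as in the paper mentioned above; the plain mutual information between alignment columns computed here is a much cruder stand-in for that kind of statistic.

```python
import numpy as np
from collections import Counter

# Made-up toy alignment: columns 0 and 1 always mutate together, (S, H) <-> (F, W),
# which is exactly the kind of signal that hints the two residues are in contact.
msa = np.array([list(s) for s in ["SHCKL",    # "human"
                                  "SHCRL",    # "fish"
                                  "FWCKL",    # "alligator"
                                  "SHCKI",    # "cockroach"
                                  "FWCRI"]])
n_seq, length = msa.shape

def column_mi(a, b):
    """Mutual information between two alignment columns (a crude covariation score)."""
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), count in pab.items():
        pxy = count / n_seq
        mi += pxy * np.log(pxy / ((pa[x] / n_seq) * (pb[y] / n_seq)))
    return mi

covariation = np.array([[column_mi(msa[:, i], msa[:, j]) for j in range(length)]
                        for i in range(length)])
# High off-diagonal entries (here the pair (0, 1)) are the kind of pairwise statistic
# that gets fed into AlphaFold 1 as input features.
```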
They derive, I think, 484 features from this multiple sequence alignment for each residue pair. (The article is a bit confusing here: it stops, the references come, and then it starts again, saying almost the same things in a little more detail.) So each entry of our big L by L tensor already has 484 features just from the MSA, and then there are more features on top: in addition, they provide the network with features that explicitly represent gaps and deletions. They also have scalar features, sequence length features, amino acid type, profiles, HHblits profiles, all of these comp-bio and genetics tools, and so on, and some of these act as positional encodings. So: lots of features in, convolutional network, distance matrix out, and that's that. Those are the inputs, the distance matrix is the output, and from the distance matrix you can run gradient descent at inference time to get the protein structure.

They make some pretty cool points. They don't just output a single prediction for each distance; they output a probability distribution, binning all the distances and predicting a distribution over the bins. You can see that in these histograms. This is for one particular row, this red row right here: for one amino acid, the distribution of probabilities over distance bins with respect to each of the other ones. So this is residue number 29, and we look at the distance between number 29 and 1, 2, 3, and so on. The black line represents, I think, eight angstroms, which is generally considered the cutoff for being in contact or not. The histogram is colored blue if the pair is not in contact and green if it is (blue and green are the ground truth), and the red bar represents the true distance. You can see this is pretty accurate: whenever the ground truth is blue, the network's distribution is usually shifted to the right of the black line, and whenever it's green, the distribution is shifted to the left. There are some failure cases, as you can see right here, where the network predicts a larger distance than the truth. What's also pretty interesting is that the most accurate predictions, the highest confidence, the smallest variance in the distribution, are around here, which is exactly around residue 29 itself (29 would be in the middle right here). That's where you find the most accurate predictions, of course, since local distances are much easier, and as you go further away the model becomes less sure.

And this is a cool thing: here you see model prediction versus true distance, which fits fairly well, but they also plot the standard deviation of their prediction, and you can see that where the distance errors are bigger, the standard deviation is bigger at the same time. So there seems to be a built-in confidence metric: you can look at the standard deviation of this predicted distribution, and that is an estimate for how confident the model is in its prediction. Apparently that is something the AlphaFold 2 model relies upon very crucially.
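As a small sketch of how such a binned distance distribution can be read off as a distance, a contact call, and a confidence proxy: the eight angstrom contact cutoff is the one mentioned above, while the bin layout and the toy prediction are assumptions of mine.

```python
import numpy as np

n_bins = 64
bin_centers = np.linspace(2.0, 22.0, n_bins)       # assumed bin layout, in angstroms

def summarize(probs):
    """probs: (n_bins,) predicted distribution over distance bins for one residue pair."""
    mean = float(np.sum(probs * bin_centers))                        # expected distance
    std = float(np.sqrt(np.sum(probs * (bin_centers - mean) ** 2)))  # spread, i.e. the built-in confidence proxy
    p_contact = float(np.sum(probs[bin_centers < 8.0]))              # probability mass below the 8 A contact cutoff
    return mean, std, p_contact

# Toy prediction sharply peaked around 6 A: small std, high contact probability.
logits = -((bin_centers - 6.0) ** 2)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(summarize(probs))
```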
At the bottom here you see one of these residual blocks and more distance matrices. They do a lot of analysis in this article, which is pretty cool, so you can go into it fairly far. They also look at what the network pays attention to, and it makes a lot of sense: it pays attention to these helices, and then to the interactions between the helices and the parts they're in close contact with, and so on.

But now we want to get into AlphaFold 2, and what we have isn't much: we have this graphic right here, which is also in the article, so it's probably better if we go to the blog post. The blog post is a bit of a fluff piece saying they are going to publish a paper, which of course they don't have yet because we've just gotten the results. They have these cool videos, and as I said, there are so many Twitter threads saying "I'm not usually up for the hype, but this is the best thing", everyone's hyping, and I thought, is it really up to me to be the grumpy one here? But then I couldn't find anything to be grumpy about. It's DeepMind, so I expect them to maybe not fully release the code; maybe they will, but for AlphaFold 1 they released something like half the code, which is already pretty cool, and there are open-source implementations based on it. So again, nothing to be grumpy about.

All right, what can we say? They say a folded protein can be thought of as a spatial graph. That's kind of a new framing they introduce, but ultimately the distance matrix we've seen before is simply a representation of that spatial graph: it's a graph of nodes, and the edges say whether or not two residues are in contact, or respectively how far apart they are. The residues are the nodes, and edges connect the residues in close proximity; this graph is important for understanding the physical interactions within proteins as well as their evolutionary history. For the latest version of AlphaFold, used at CASP14 (that's this challenge), they created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph while reasoning over the implicit graph that it's building. This sounds like fluff, maybe, I don't know, but "attention-based": I'm going to guess for sure that they've replaced this ConvNet with a transformer-style architecture, with one or multiple attention layers.

They say it uses evolutionarily related sequences, the multiple sequence alignment, and a representation of amino acid residue pairs to refine this graph. This is what we've already seen: use these other sequences, plus a lot of statistics that you can gather from the datasets on amino acid pairs, to develop this graph, and the graph is the distance matrix, or other things, as we'll see in just a second. By iterating this process, they say, the system develops strong predictions of the underlying physical structure of the protein and is able to determine highly accurate structures in a matter of days; additionally, AlphaFold can predict which parts of each predicted protein structure are reliable using an internal confidence measure. Again, the internal confidence measure is something we've already sort of seen in AlphaFold 1. The interesting part is "by iterating this process", which could mean that it's no longer just a two-stage approach, but an actually fully cycling approach that goes back to the neural network to refine the structure it's building with the gradient descent procedure. That's entirely possible.
So this is the graphic of AlphaFold 2. At the very beginning you have the protein sequence, and first you have this "embed and outer sum", which I'm going to guess is just features for pairs or for individual amino acids: correlation statistics from your dataset, chemical properties, whatever, just a bunch of features you can attach to each amino acid in the sequence. The other path here is this "genetic search and embed", and this is what we've already seen with the MSA. I told you they have the same graphic: there's the human, there's the fishy, there's the rabbit, and you simply search for similar sequences in your database (they could even be from other humans), and from those you can also derive features.

Here is where I'm a bit confused. You can see they build up this square matrix again, and this already screamed "attention" before, so I'm going to guess they no longer limit themselves to 64 by 64; maybe they do something bigger, maybe they use local attention, who knows, but I'm going to guess they use attention, and that this here is given by an attention layer of some sort. Basically, I would guess this is a big transformer right here. The interesting part is that it appears to interact much like the original encoder-decoder transformer: they pass information around between the two tracks. The top thing isn't amino acid sequence against itself; it appears to be a matrix that you build up between the amino acid sequence and the sequences you retrieved. So I would guess that they are no longer happy with simply inputting the features that these classic algorithms compute over the other sequences; now they also want to put those features themselves through steps of learned transformations. Again, I would guess this is an attention layer.

And how can we interpret this matrix? As you can see, it relates individual amino acids in the sequence to the other species, so I would guess this square here represents something like how important this particular location in the chain (the purple thingy in the human) is in the chicken, or how related it is to the chicken at that particular position, or as a whole. I don't know. Probably DeepMind doesn't know either, in the sense that they just ship these features in here and push them through transformers, passing information around. I don't know whether information flows just in this direction and then in that direction, or whether there's also an arrow right here, conceivably. In any case, it seems like they've replaced what was a ConvNet: no longer friends with ConvNet, new best friend is the transformer.

At the end, you can see what they get out is these pairwise distances again. It's also not really clear, because I would expect maybe an arrow going like this if they again use these pairwise distances to predict the structure; I don't know if that's just a side output, but I would guess they still actually use the pairwise distances. And the confidence score: it might be something very similar to what we saw, the standard deviation on the predicted distances, but they could also have refined that.
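Since the blog post only says "attention-based", here is pure speculation made concrete: a minimal sketch of row-wise and column-wise self-attention over an L by L pair representation, one plausible way attention could replace the 64 by 64 ConvNet so that information can flow along the whole sequence. This is not DeepMind's architecture, just an illustration of the mechanism; every shape and module choice here is my own assumption.

```python
import torch
import torch.nn as nn

L, c = 32, 16
pair = torch.randn(L, L, c)                        # hypothetical pair representation (L, L, channels)

attn = nn.MultiheadAttention(embed_dim=c, num_heads=4, batch_first=True)

# Row-wise attention: entry (i, j) attends to every entry (i, k) in the same row,
# so a residue pair can "see" the entire sequence instead of a fixed local crop.
row_out, _ = attn(pair, pair, pair)                # each of the L rows is treated as a length-L sequence
pair = pair + row_out                              # residual update

# Column-wise pass: transpose, attend, transpose back, to pass information the other way.
col = pair.transpose(0, 1)
col_out, _ = attn(col, col, col)
pair = pair + col_out.transpose(0, 1)
```

A real model would use separate attention modules per direction and stack many such layers; the point is only that attention removes the fixed 64-residue receptive field of the convolutional approach.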
The last thing is this iterative process. I don't know if that simply refers to there being multiple layers of this attention and information passing (the passing around would then simply be stacking these representations on top of each other), or if there is actually a structure module that builds part of the structure, then goes back and consults the neural network again, then builds some more of the structure, and so on. I can't tell right now. It's quite conceivable that the search here is not only gradient descent but is actually informed by the neural network, so you could go back and refine, though there don't seem to be any features in the neural network that would represent whatever you could read off a partially built 3D model. So the boring guess is that part two is a lot of the same, but there could also be substantial improvements in that part.

All right, I hope this was a good overview. As I said, the paper isn't out yet; if you want to cite this, I guess you can refer to the blog post, and they say: until we've published a paper on this work, please cite "High Accuracy Protein Structure Prediction Using Deep Learning" by these people. I just want to highlight, shout out to Anna, who was educated right here, she was an intern, so in a way I'm actually saying that this is my discovery and I take full responsibility for it. You're welcome, world. Shout out to Anna, very nice job, and good work to all of these people. And yeah, I hope that was enough. If I got something horribly wrong, please tell me in the comments, and share the video if you liked it. Other than that, have fun. Bye.
Info
Channel: Yannic Kilcher
Views: 164,363
Rating: 4.9224682 out of 5
Keywords: deep learning, machine learning, arxiv, explained, neural networks, ai, artificial intelligence, paper, google, deepmind, deep mind, alphago, alphazero, alphafold, protein, dna, rna, folding, casp, casp14, alphafold 2, blog, hassabis, biology, translation, amino acid, transformer, convolution, residual, spatial graph, refine, gradient descent, van der waals, torsion angles, google ai, google brain, nobel prize, msa, multiple sequence alignment, covariation, evolution, contact prediction, distogram
Id: B9PL__gVxLI
Length: 54min 37sec (3277 seconds)
Published: Tue Dec 01 2020