Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér. Oh my goodness. This work is history in the making. Today we are going to have a look at AlphaFold,
perhaps one of the most important papers of the last few years. And you will see that nothing that came before
even comes close to it, and that it truly is a gift to humanity. So, what is AlphaFold? AlphaFold is an AI that is capable of solving
protein structure prediction, which we will refer to as protein folding. Okay…but what is a protein and why does
it need folding? A protein is a string of amino acids, the building blocks of life. This string of letters is what goes in. What comes out is the 3D structure that the protein takes on in reality. And that is protein folding: letters go in, a 3D object comes out.
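To make that input and output concrete, here is a tiny sketch in Python. The sequence and the function are made up for illustration, and are of course not AlphaFold's actual interface.

```python
# A tiny illustration of the problem's input and output (made-up example,
# not AlphaFold's actual interface). One letter per amino acid goes in;
# one 3D coordinate per amino acid comes out.
from typing import List, Tuple

sequence = "MKTAYIAKQR"  # a made-up string of ten amino acids

def predict_structure(sequence: str) -> List[Tuple[float, float, float]]:
    # A real predictor returns one (x, y, z) position per residue.
    # This placeholder just lays the chain out along a straight line,
    # roughly 3.8 Angstroms between consecutive residues.
    return [(3.8 * i, 0.0, 0.0) for i in range(len(sequence))]

print(predict_structure(sequence))
```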
This is hard. How hard exactly? Well, let's compare it to DeepMind's amazing previous projects, and we'll see that none of these projects even come close in difficulty. For instance, DeepMind's previous AI learned
to play chess. Now, why does this matter when we already have Deep Blue, a chess computer built in the mid-1990s that could play at least as well as Kasparov did? So, why is chess interesting? The space of possible moves is huge. And Deep Blue was not an AI in the
strictest sense, but a handcrafted technique. This means that it can play chess and that’s
it. One algorithm, one game. If you want a different game, you write a
different algorithm. And, yes, that is the key difference. DeepMind's chess AI is a general learning
algorithm that can learn many games, for instance, Japanese chess, or Shogi, too. One algorithm, many games. And, yes, chess is hard. But these days, the AI can manage. Then, Go is the next level. This is not just hard, it is really hard. The space of possible moves is significantly bigger, and we can't just evaluate all the long-term effects of our moves; it is even more hopeless than chess. That is often why people say that this game requires some sort of intuition to play. But DeepMind's AI solved that too and beat
the world champion Go player 4 to 1 in a huge media event. The AI can still manage. Now, get this, if chess is hard, and Go is very hard, then protein folding is sinfully difficult. Once again, a string of text encoding the amino acids goes in, and a 3D structure comes out. Why is this hard? Why not just try every possible 3D structure and see what sticks? Well, it is not that simple. The search space for this problem is still stupendously large, perhaps not as big as playing a continuous strategy game like StarCraft 2, but the search here is much less forgiving. Also, we don't have access to a perfect
scoring function, so it is very difficult to define what exactly should be learned. In a strategy game, a win is a win, but for
proteins, nature doesn't really tell us what it is up to when creating these structures. Thus, DeepMind did very well in chess and Go, and StarCraft too, and challenging as they are, they are not even close to being as challenging as protein folding. Not even close. To demonstrate that, look, this is CASP. I've heard DeepMind CEO Demis Hassabis call
it the Olympics of protein folding. If you look at how teams of scientists prepare
for this event, you will probably agree that yes, this is indeed the Olympics of protein
folding. At about a score of 90, we can think of protein
folding as a mostly solved problem. No need to worry about definitions though,
look, we are not even close to 90. And it gets even worse. Look. The GDT score stands for the global distance test, a measure of similarity between the predicted and the real protein structure. And, wait a second. What? The results are not only not too good, they appear to get worse over time. Is that true? What is going on here? Well, there is an explanation. The competition gets a little harder over time, so even flat results mean that there is a little improvement over time.
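For the curious, here is a minimal sketch of what such a similarity score can look like: a simplified GDT_TS in Python. It assumes the two structures are already optimally superimposed; the official CASP evaluation also searches over superpositions, so treat this as an illustration only.

```python
# A minimal sketch of a simplified GDT_TS score (not the official CASP
# implementation). Assumes the predicted and experimental C-alpha
# coordinates are already optimally superimposed.
import numpy as np

def gdt_ts(predicted: np.ndarray, experimental: np.ndarray) -> float:
    """Both arguments: (N, 3) arrays of C-alpha coordinates in Angstroms."""
    distances = np.linalg.norm(predicted - experimental, axis=1)
    # Fraction of residues within each distance cutoff, averaged over cutoffs.
    cutoffs = [1.0, 2.0, 4.0, 8.0]
    fractions = [np.mean(distances <= c) for c in cutoffs]
    return 100.0 * float(np.mean(fractions))  # 0 to 100, higher is better
```

Roughly speaking, a score around 90 means that nearly all residues sit within a few Angstroms of where they should be.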
And now, hold on to your papers, and let's look at the results from DeepMind's AI-based solution, AlphaFold. Wow, now we're talking. Look at that! The competition gets harder, and AlphaFold's curve is not
only flat, but can that be? It is even better than the previous methods. But, we are not done here, no-no, not even
close. If you have been holding on to your papers,
now, squeeze that paper, because what you see here is old news: only two years later, AlphaFold 2 appeared. And just look at that. It came in guns blazing. So much so that the result is…I can’t
believe it! It is around the 90 mark. My goodness, that is history in the making. Yes, this is the place on the internet where
we get unreasonably excited by a large blue bar. Welcome to Two Minute Papers! But what does this really mean? Well, in absolute terms, AlphaFold 2 is considered
to be about three times better than previous solutions. And all that in just two years. That is a miracle right in front of our eyes. Now, let’s pop the hood, and see what is
inside this AI, and...hmm! Look at all these elements in the system that
make this happen. So, where do we start? Which of these is the most important? What is the key? Well, everything…and nothing. I will explain this in a moment. That does not sound very enlightening, so,
what is going on? DeepMind ran a detailed ablation study on
what mattered and the result is the following: everything mattered. Look. With few exceptions, every part adds its own
little piece to the final result, but none of these techniques is a silver bullet. But to understand a bit more about what is
going on here, let’s look at three things. One, AlphaFold 2 is an end-to-end network that can perform iterative refinement. What does that mean? It means that everything needed
to solve the task is learned by the network, and that it starts out from a rough initial
guess, and then, it gradually improves it. You see this process here, and it truly is a sight to behold.
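To get a feel for this, here is a minimal sketch of the refinement loop, often called recycling. The names are hypothetical, and the real AlphaFold 2 recycles rich internal representations through its network, not just raw coordinates.

```python
# A minimal sketch of iterative refinement ("recycling"), assuming a
# PyTorch-style model. All names here are hypothetical; AlphaFold 2
# recycles rich internal representations, not just coordinates.
import torch

def fold_with_recycling(model, sequence_features: torch.Tensor,
                        num_recycles: int = 3) -> torch.Tensor:
    num_residues = sequence_features.shape[0]
    # Start from a rough initial guess: here, every residue at the origin.
    coords = torch.zeros(num_residues, 3)
    for _ in range(num_recycles + 1):
        # The network sees the sequence plus its own previous output,
        # and returns a gradually improved 3D structure.
        coords = model(sequence_features, previous_coords=coords)
    return coords
```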
Two, it uses an attention-based model. What does that mean? Well, look! This is a convolutional neural network. This is wired in a way that information flows
to neighboring neurons. This is great for image recognition, because
usually, the required information is located nearby. For instance, let’s imagine that we wish
to train a neural network that can recognize a dog. What do we need to look at? Floppy ears, black snout, fur, okay, we’re
good, we can conclude that we have a dog here. Now, have you noticed? Yes, all of this information is located
nearby. Therefore a convolutional neural network is
expected to do really well at that. However, check this out. This is a transformer, which is an attention-based
model. Here, information does not just flow between neighbors. No sir! Here, information flows everywhere! This has flexible long-range connections that are
great for almost anything if we can use them well. For instance, when reading a book, if we are
at page 100, we might need to recall some information from page 1. Transformers are excellent for tasks like
that. They are still quite new, just a few years
old, and are already making breakthroughs. So, why use them for protein folding? Well, things that are 200 amino acids apart
in the text description can still be next to each other in 3D space. Yes, now we know that for that, we need attention networks, for instance, a transformer.
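To see the mechanism behind this, here is a minimal sketch of single-head self-attention in PyTorch. It is illustrative only: it omits the learned query, key, and value projections of a real transformer, and it is certainly not AlphaFold 2's actual Evoformer blocks.

```python
# A minimal sketch of self-attention (illustrative only). Unlike a
# convolution, which only mixes nearby positions, every residue here can
# attend to every other residue, no matter how far apart in the sequence.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor) -> torch.Tensor:
    """x: (sequence_length, d) embeddings, one row per amino acid."""
    d = x.shape[-1]
    # Real models use separate learned projections for queries, keys, values.
    q, k, v = x, x, x
    # (L, L) attention matrix: residue i can draw information from residue j
    # even if they are hundreds of positions apart in the sequence.
    weights = F.softmax(q @ k.T / d ** 0.5, dim=-1)
    return weights @ v
```

Note that nothing in this computation privileges neighbors: residue 5 and residue 205 are connected just as directly as residue 5 and residue 6.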
These are seeing a great deal of use these days; for instance, Tesla also uses them for training their self-driving cars. Yes, so these things mattered. But so many other things did too. Now, I mentioned that the key is everything…and
nothing. What does that mean? Well, look here. Apart from a couple of examples, there is no silver bullet here. Every single one of these improvements bumps
the score a little bit. But all of them are needed for the breakthrough. Now, one of the important elements is also
adding physics knowledge. How do you do that? Typically the answer is that you don’t. When we design a handcrafted technique, we
write the knowledge into an algorithm by hand. For instance, in chess, there are a bunch
of well-known openings for the algorithm to consider. For protein folding, we can tell the algorithm
that if you see this structure, it typically bends this way. Or we can also show it common protein templates,
kind of like openings for chess. We can add all this valuable expertise to
a handcrafted technique. Now, we noted that scientists at DeepMind
decided to use an end-to-end learning system. I would like to unpack that for a moment,
because this design decision is not trivial at all. In fact, in a moment, I bet you will think
it’s flat out counterintuitive. Let me explain. If we are a physics simulation researcher
and we have a physics simulation problem, we take our physics knowledge, and write a
computer program to make use of that knowledge. For instance, here, you see this being used
to great effect, so much so that what you see here is not reality, but a physics simulation. All handcrafted. Clearly, using this concept, we can see that
human ingenuity goes very far, and we can write super powerful programs. Or, we can do end-to-end learning, where,
surprisingly, we don’t write our knowledge into the algorithm at all. We give it training data instead, and let
the AI build up its own knowledge base from that. And AlphaFold is an end-to-end learning project,
so almost everything is learned. Almost. And, one of the great challenges of this project
was to infuse the AI with physics knowledge without impacting the learning. That is super hard. So, training huh? How long does this take? Well, get this, DeepMind can train this incredible
folding AI in as little as two weeks. Why is two weeks little? Well, after this step is done, the AI can
be given a new input and will be able to create this 3D structure in about a minute. And, we can then reuse this trained neural
network for as long as we wish. Whew, so… this is a lot of trouble to fold
these proteins. So what is all this good for? The list of applications is very impressive,
I’ll give you just a small subset of them that I really liked: it can help us better understand the human body, create better medicine against malaria and many other diseases, develop healthier food, or develop enzymes to break down plastic waste, and more. And that’s just the start. Well, you are probably asking, Károly, you
keep saying that this is a gift to humanity. So…why is it a gift to humanity? Well, here comes the best part. A little after publishing the paper, DeepMind
made these 3D structure predictions available for free for everyone. For instance, they have made their human protein
predictions public. Beyond that, they have already made their predictions public for yeast, important pathogens, crop species, and more. And I have already seen follow-up works on
how to use this for developing new drugs. What a time to be alive! Now note that this is but one step in a thousand-step
journey. But one important step nonetheless. And, I would like to send huge congratulations
to DeepMind. Something like this costs a ton to develop, and note that it is not easy, and maybe not even possible, to immediately make a product out of this and monetize it. This truly is a gift to humanity, and a project like this can only emerge from proper long-term thinking that focuses on what matters in the long run, not just on what is right now. Bravo. Now, of course, not even AlphaFold 2 is perfect. For instance, it is not always very confident about its own solutions. It also performs poorly on antibody interactions. Both of these are subject to intense scrutiny, and follow-up papers are already appearing in these directions. Now, one last thing. Why does this video exist? I got a lot of questions from you asking why
I made no video on AlphaFold. Well, protein folding is a highly multidisciplinary
problem, which, beyond machine learning, requires tons of knowledge in biology, physics, and engineering. And my answer was that I don’t feel qualified
to speak about this project, so I better not. However, something has changed. What has changed? Well, now I had the help of someone who is
very qualified. As qualified as it gets, because it is the
one and only John Jumper, the first author of the paper, who kindly agreed to review
the contents of this video to make sure that I did not mess up too badly. Thus, I would like to send a big thank you
to John, his team, and DeepMind, for creating AlphaFold, and helping this video to come
into existence. It came late, so we missed out on a ton of
views, but that doesn't matter. What matters is that you get an easy-to-understand
and accurate description of AlphaFold. Thank you so much for your patience! Thanks for watching and for your generous
support, and I'll see you next time!