In the 1950s, scientists programmed a first-generation computer to learn using rules modeled on the human brain. This model is called a neural network. It’s made up of many basic units, each of which takes inputs from other units, performs a simple calculation, and passes the result on. The answers are recorded by designated output units. Pathways between units strengthen, and the model learns, when many different inputs result in the same output.
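To make that picture concrete, here is a minimal sketch in Python (the layer sizes, random weights, and inputs are illustrative assumptions, not the historical 1950s program): each unit takes a weighted combination of the units feeding into it, applies a simple calculation, and passes the result on toward a designated output unit.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each unit takes a weighted sum of the units feeding into it,
    # applies a simple calculation, and passes the result on.
    return np.tanh(inputs @ weights + biases)

# A tiny network: 3 input units -> 4 hidden units -> 1 output unit.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])   # inputs fed to the network
hidden = layer(x, W1, b1)        # hidden units compute and pass their results on
output = hidden @ W2 + b2        # the designated output unit records the answer
print(output)
```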
Today, advanced versions of these models, called deep neural networks, are the most successful AI ever created. They power image and speech recognition and
recognize patterns in massive sets of data, making astonishing predictions about the future. But there’s a dark mystery here: We have
no idea how deep neural networks work. What happens inside their hidden layers, with billions of parameters, that allows them to converge to a solution? Now, for the first time ever, a group of researchers
has found a mathematical key that might just open this black box. Historically, a lot of methods in machine
learning were designed to have simple properties, where the model designer was guaranteed that
they would obtain a certain solution if they used a particular algorithm or model. Deep neural networks move away from that paradigm. The model designer chooses how many units
they would like to have in each one of the hidden layers. That’s often referred to as the width of
the neural network. Yasaman Bahri and her colleagues at Google’s
Brain Team hoped to mathematically simplify a deep neural network by considering an extreme
case: They took the networks’ width to be infinite. One of the goals was to understand what happens
in that limit of infinitely wide, deep neural networks to see if we could say anything concrete
about it. Incredibly, they could. Bahri mathematically reduced those infinite-width
networks to something simpler called kernel machines, algorithms with a history going
back to the 19th century mathematician Carl Friedrich Gauss. A kernel method captures a richness that is
similar to a deep neural network in terms of its functional dependence on the input,
but it's simpler than a deep neural network because it's linear in its parameters, and neural networks in general are non-linear. Kernel machines find patterns in data by projecting
the data into extremely high dimensions. They map each point on a low-dimensional data
set to a point in a higher-dimensional space, and then use the data point’s characteristics to identify the class it belongs to. They do this classification using a geometric
object called a hyperplane. Kernel machines can work in effectively infinite-dimensional feature spaces while doing all of their computations in the original, lower-dimensional space.
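Here is a rough sketch of that trick in Python, using the common Gaussian (RBF) kernel and toy data as stand-ins (both are assumptions for illustration, not the methods discussed above): the kernel reports inner products in an implicitly infinite-dimensional feature space, so a separating hyperplane can be learned there while every computation stays in the original two dimensions.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Inner product between the images of x and y in an (implicitly)
    # infinite-dimensional feature space, computed without ever leaving
    # the original low-dimensional space.
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Toy 2D data: one class near the origin, one class on a surrounding ring.
rng = np.random.default_rng(1)
inner = rng.normal(scale=0.3, size=(20, 2))
angles = rng.uniform(0, 2 * np.pi, 20)
outer = np.c_[2 * np.cos(angles), 2 * np.sin(angles)]
X = np.vstack([inner, outer])
y = np.array([1] * 20 + [-1] * 20)

# Kernel perceptron: learn a separating hyperplane in feature space,
# represented only through kernel evaluations (dual coefficients alpha).
alpha = np.zeros(len(X))
for _ in range(20):
    for i in range(len(X)):
        score = sum(alpha[j] * y[j] * rbf_kernel(X[j], X[i])
                    for j in range(len(X)))
        if np.sign(score or 1.0) != y[i]:
            alpha[i] += 1.0

def classify(point):
    # Sign of the distance to the learned hyperplane in feature space.
    return np.sign(sum(alpha[j] * y[j] * rbf_kernel(X[j], point)
                       for j in range(len(X))))

print(classify(np.array([0.1, 0.0])), classify(np.array([2.0, 0.0])))
```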
Bahri and her team were able to show that kernel methods are mathematically equivalent to an idealized version of a deep net – the first major step to cracking open the black box of deep learning. It's very rare to have an exact equivalence
between two things, in particular if that other thing is exactly solvable. You can write down the expressions mathematically. From my perspective, that's quite appealing. If this equivalence can be extended beyond
idealized neural networks, it may explain how practical deep neural networks achieve
their astonishing results. To describe finite width networks, more work
is needed, but at least if you have an exactly solvable model where you have that full handle,
you can then start to be systematic about what's missing. And so from my mind, that gives you a starting point.
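One way to see the flavor of that correspondence numerically is the Monte Carlo sketch below (an illustration of the general idea, not the team's derivation; the closed-form expression used here is the arc-cosine kernel for a one-hidden-layer ReLU network with standard Gaussian weights): across draws of random networks, the outputs at two inputs covary according to a fixed kernel, and in the infinite-width limit the whole distribution of outputs becomes Gaussian with exactly that kernel as its covariance, which is the bridge to kernel machines.

```python
import numpy as np

rng = np.random.default_rng(0)

def arccos_kernel(x1, x2):
    # Closed-form kernel of an infinitely wide one-hidden-layer ReLU network
    # with standard Gaussian weights (one half of Cho & Saul's order-1
    # arc-cosine kernel).
    n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
    theta = np.arccos(np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0))
    return n1 * n2 * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def empirical_covariance(x1, x2, width, n_nets=10000):
    # Draw many random one-hidden-layer ReLU networks of a given width and
    # measure how their outputs at x1 and x2 covary across the draws.
    d = len(x1)
    W = rng.normal(size=(n_nets, d, width))                 # input -> hidden
    v = rng.normal(size=(n_nets, width)) / np.sqrt(width)   # hidden -> output
    h1 = np.maximum(np.einsum('d,ndh->nh', x1, W), 0.0)
    h2 = np.maximum(np.einsum('d,ndh->nh', x2, W), 0.0)
    f1 = np.einsum('nh,nh->n', h1, v)
    f2 = np.einsum('nh,nh->n', h2, v)
    return np.mean(f1 * f2)

x1, x2 = np.array([1.0, 0.5]), np.array([-0.3, 2.0])
print("kernel value:", arccos_kernel(x1, x2))
for width in (1, 10, 100):
    print("width", width, "->", empirical_covariance(x1, x2, width))
```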
There is a great debate happening in the world of set theory, the study of collections of numbers and other mathematical objects. It’s about the nature of infinity – or,
rather, infinities. The remarkable thing about infinities is that
they come in different sizes. Take two infinite sets of numbers. If every number from one set can be paired with exactly one number from the other, with none left over, the sets are said to be bijective, or the same size. So it's just not the case that all infinite sets are bijective with each other, that there's a one-to-one correspondence between them. You actually have lots of examples of sets such that one is strictly smaller than the other. These questions about the sizes of infinities
go back nearly 150 years, when the German mathematician Georg Cantor rocked the mathematical
world by discovering something that seems counterintuitive: The infinite set of real
numbers, which includes every point on the number line, outnumbers the natural numbers – even though there are infinitely many of both. After all, you can try to pair up these sets, but any such pairing skips over infinitely many real numbers. In other words, these infinities are not equal. The smaller one’s size, or cardinality,
was classified as aleph-zero, and the next largest size is aleph-one. Cantor conjectured that aleph-one is exactly the size of the continuum of real numbers. In other words, there are no sizes of infinity between the sets of natural and real numbers. This came to be known as the continuum hypothesis. But he could never prove it.
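In standard set-theoretic notation, the statements above read as follows (this is just a compact restatement, not part of the original narration):

```latex
% Cantor's theorem and the continuum hypothesis (CH), in standard notation
|\mathbb{N}| = \aleph_0, \qquad
|\mathbb{R}| = 2^{\aleph_0} > \aleph_0 \quad (\text{Cantor}), \qquad
\text{CH:}\quad 2^{\aleph_0} = \aleph_1 .
```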
The axioms, or basic foundations of mathematics, simply cannot settle questions like Cantor’s infinity problem. But this year, set theorist David Asperó and his longtime collaborator Ralf Schindler got closer to finding the answer anyway. We had the right scenario in place about 10
years ago, but there was a missing ingredient. And then it just came, just by chance, quite
randomly. The breakthrough came when Asperó used a
technique, known as forcing, to create a mathematical object called a witness. He used this witness to verify that an extremely
powerful axiom, Martin’s maximum plus plus, actually implies a rival axiom, star. By unifying the two rival axioms, Asperó
and Schindler showed that they are both likely true – implying that an extra size of infinity
actually does sit between the sets of natural and real numbers. This would prove that there are more reals
than Cantor had thought, finally bringing closure to the 150-year-old mystery and offering
a coherent alternative to the continuum hypothesis.
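Stated compactly in the same notation (again a restatement of the result as reported, using the usual abbreviations for Martin's maximum plus plus and the star axiom):

```latex
% Asperó-Schindler (2021): Martin's maximum++ implies the (*) axiom;
% each of them implies the continuum has size aleph-two, so CH fails.
\mathrm{MM}^{++} \;\Longrightarrow\; (*) \;\Longrightarrow\; 2^{\aleph_0} = \aleph_2 .
```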
But this is not even close to the end of the story. Newer and stronger axioms are already challenging
this result, and a battle is underway to decide which side is right. The final chapter on the actual number of
real numbers – the true size of the continuum – is yet to be written. This is a Liouville field–or, at least,
an idea of it. It’s a wildly chaotic mathematical surface
where the height of every point is chosen at random. Forty years ago, the theoretical physicist
Alexander Polyakov found a striking use for these fields: as a model of quantum physics. Polyakov intuited that they could do this
by unlocking the behavior of theoretical objects called strings, and building a simplified
model of quantum gravity in two dimensions. It can be seen as a toy model for higher-dimensional
cases, but it can also be seen as string theory, because string theory in some sense is about
two-dimensional surfaces. But Polyakov’s formula for understanding
the Liouville field stopped tantalizingly short of being rigorous. Although he attempted to explain it using
a path integral developed by Richard Feynman, the Liouville field resisted being described in this way.
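For orientation, one common way to write the object Polyakov proposed is sketched below (the conventions, including the background charge Q and the coupling γ, follow the probabilistic literature and are assumptions here rather than anything stated in the narration): a path integral that averages over all field configurations φ, weighted by the Liouville action.

```latex
% A Polyakov-style path integral over Liouville fields \phi on a surface \Sigma,
% in conventions common in the probabilistic literature (assumed here):
% \gamma is the coupling, Q = \gamma/2 + 2/\gamma, \mu > 0 is the cosmological
% constant, and R_g is the curvature of the background metric g.
\langle F \rangle \;=\; \int F(\phi)\, e^{-S_L(\phi)}\, D\phi ,
\qquad
S_L(\phi) \;=\; \frac{1}{4\pi} \int_{\Sigma} \left( \left|\nabla_g \phi\right|^2
  + Q\, R_g\, \phi + 4\pi \mu\, e^{\gamma \phi} \right) d\lambda_g .
```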
Around the year 2013, we started looking at Polyakov’s path integral and, according to what physicists were saying, if we could make
sense of this path integral, we could directly construct quantum gravity in the continuum. So we took the path integral and we said,
“Okay, let's give it a shot. We're going to try to define it.” Mathematician Vincent Vargas and his colleagues
set out to precisely describe Polyakov’s path integral using a different approach:
probability theory. They began by transforming the Liouville field
into a far milder object called a Gaussian free field. It's kind of this very rough landscape where
there are infinite spikes everywhere. From the point of view of physics, the Gaussian
free field is a boring theory. It's a trivial theory where everything is
computable straight away. And what we realized and which came as a surprise,
I think, to the physics community, is that we could express anything that you naturally
want to compute in this theory just in terms of something on the Gaussian free field.
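To give a feel for why the Gaussian free field is so tame, here is a toy sampler in Python for a discretized version on a grid (the grid size, lattice Laplacian, and normalization are illustrative assumptions, not the team's construction): a sample is nothing more than independent Gaussian Fourier modes weighted by the inverse of the Laplacian's eigenvalues, which is why everything about it is straightforward to compute.

```python
import numpy as np

def sample_gff(n=256, seed=0):
    # Discrete Gaussian free field on an n x n torus, sampled spectrally:
    # white noise in Fourier space, damped by 1/sqrt(eigenvalue of -Laplacian).
    rng = np.random.default_rng(seed)
    k = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing='ij')
    eig = 4 - 2 * np.cos(kx) - 2 * np.cos(ky)  # eigenvalues of the lattice -Laplacian
    eig[0, 0] = 1.0                            # placeholder; the zero mode is removed below
    noise_hat = np.fft.fft2(rng.normal(size=(n, n)))
    noise_hat[0, 0] = 0.0                      # drop the constant (zero) mode
    field = np.fft.ifft2(noise_hat / np.sqrt(eig)).real
    return field                               # a rough, spiky landscape

gff = sample_gff()
print(gff.shape, gff.std())
```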
There were other leads to build on as well. In 1984, as a potential workaround to his failed path integral, Polyakov began trying to develop a technique called the bootstrap
— a mathematical ladder that gradually builds up to a full, complex representation of a
field. By chance, two unrelated pairs of physicists
in the 1990s built on Polyakov’s bootstrap and managed to completely solve the Liouville
field. They called their formula “DOZZ.” Even though it seemed to work, it was a miraculously
lucky guess, and they couldn’t prove it. This formula, which comes out of a black box–I
mean, it's black magic–well, it actually has a probabilistic meaning. So what we did is show that our path integral
probability construction is equivalent, completely equivalent to that bootstrap construction. Last year, they unveiled a new and improved
version of Polyakov’s path integral, defined in terms of the Gaussian free field. The work also explains the mysterious origins
of the DOZZ formula. We're mapping probability theory to the bootstrap,
or, if I were just talking on the mathematical side, we’re mapping probability theory to
representation theory. And with that, they proved that the Liouville
field models quantum gravity, exactly as Polyakov thought it would 40 years ago. By bridging these two fields of math which
are really distinct, a priori it enables us to compute things that physicists don't know
how to compute.
The past year of research in 2021 has paved the way for more advancements in 2022 and beyond. Researchers showed that idealized deep neural networks are mathematically equivalent to kernel machines, an important step toward opening these black boxes. There were major developments toward an answer about the nature of infinity. And mathematicians finally put Polyakov’s model of two-dimensional quantum gravity on a rigorous footing.