There is an enigmatic and
powerful mathematical operation at work inside everything from computer graphics and neural networks to quantum physics. It's simple enough for high school students to grasp, yet so complex that even seasoned mathematicians haven't mastered it. This operation is called matrix multiplication. Matrix multiplication is a fundamental operation in mathematics that appears in many computations in engineering and in physics. A matrix is a two-dimensional array of numbers on which you can perform operations like
addition and multiplication. Researchers have long sought
more efficient ways to multiply matrices together. So if you even just make that a little bit faster, larger problems come into reach, problems that for now we would say are too big to be computable in a reasonable time. However, actually finding faster matrix multiplication methods is a huge challenge. But thanks to a new tool,
researchers have finally broken a long-standing matrix multiplication record, one that's more than 50 years old. What's their secret weapon? Students of linear algebra are taught a method for multiplying matrices based on a centuries-old algorithm. It goes like this. Multiply the elements of the first row of matrix A by the corresponding elements of the first column of matrix B and add the products to get the first element of matrix C. Then repeat for the first row of matrix A and the second column of matrix B, and add the products to get the second element of matrix C. And so on. Multiplying two two-by-two matrices this way takes eight multiplication steps. Multiplying any two N-by-N matrices with the standard algorithm takes N-cubed steps. Which is why applying this
method to pairs of larger matrices quickly becomes
unwieldy. If you take a matrix that is twice as big, then you have a computation time that is eight times longer. So you can imagine, if you double it a couple of times, then you take eight times more a couple of times, and you will very soon reach the limits of what a computer can do.
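To make that concrete, here is a minimal Python sketch (the function name and the multiplication counter are ours, purely for illustration) of the standard row-times-column algorithm; the count of scalar multiplications grows as N-cubed, so doubling N multiplies the work by eight.

# Standard (textbook) matrix multiplication, counting scalar multiplications.
def matmul_standard(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):           # rows of A
        for j in range(n):       # columns of B
            for k in range(n):   # pair up elements and accumulate
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

# Doubling the size multiplies the number of multiplications by eight.
for n in (2, 4, 8):
    A = [[1] * n for _ in range(n)]
    B = [[1] * n for _ in range(n)]
    _, mults = matmul_standard(A, B)
    print(n, mults)              # 2 -> 8, 4 -> 64, 8 -> 512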
Enter Volker Strassen, a German mathematician known for his work analyzing algorithms. In 1969, he discovered a new algorithm to multiply two-by-two matrices that requires only seven multiplication steps. Going from eight down to seven multiplications
may seem trivial, and the new addition steps look more complicated. But Strassen's algorithm offers dramatic
computational savings for larger matrices. That's because when multiplying large matrices, they can be broken down into smaller ones. For example, an eight-by-eight matrix can be reduced to a series of nested two-by-two matrices. So Strassen's savings, applied to these smaller matrix multiplications, propagate over and over. Applying Strassen's algorithm to an eight-by-eight matrix results in about a third fewer multiplication steps compared to the standard algorithm. For very large matrices, these savings vastly outweigh the computational cost of the extra additions.
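For reference, here is a short Python sketch of Strassen's scheme (these are the standard textbook formulas; the recursive structure is the nested two-by-two idea described above, and it assumes the matrix size is a power of two):

import numpy as np

def strassen(A, B):
    # Multiply square matrices (size a power of two) with Strassen's
    # seven-multiplication scheme, recursing on two-by-two blocks.
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # The seven products.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    # Reassemble the four blocks of the product with additions only.
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

A = np.random.randint(0, 10, (8, 8))
B = np.random.randint(0, 10, (8, 8))
assert np.array_equal(strassen(A, B), A @ B)   # 7^3 = 343 multiplications vs 8^3 = 512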
A year after Strassen invented his algorithm, IBM researcher Shmuel Winograd proved it was impossible to multiply two two-by-two matrices using six or fewer multiplications, thus also proving that Strassen's algorithm, with its seven multiplications, is the best solution. For half a century, the most
efficient method known for multiplying two matrices of any
reasonable size was to break them down and apply Strassen's algorithm. That was the case until October 2022, when a new algorithm was revealed that beat Strassen's, specifically for multiplying two four-by-four matrices whose elements are only zero or one. This new algorithm made it possible to multiply large matrices even faster by breaking them into four-by-four matrices instead of two-by-twos. So who, or what, was behind this breakthrough? This new algorithm was
discovered by Google's artificial intelligence research
lab, DeepMind. For more than a decade, DeepMind has garnered
attention for training AI systems to master a host of games, everything from Atari Pong to chess. Then, in 2016, DeepMind's AlphaGo achieved
what was considered impossible at the time: it defeated the top-ranked human Go player, Lee Sedol, in a best-of-five match. This victory shattered the limited notion of what's possible for computers to achieve. DeepMind then set its sights on a problem even more challenging than Go. I was very surprised that even for very small cases, we don't even know what's the optimal way of doing matrix multiplication. And at some point, we realized that this is actually a very good fit for machine learning techniques. To tackle matrix multiplication,
DeepMind started with an algorithm descended from AlphaGo
called AlphaTensor. AlphaTensor is built on a
reinforcement learning algorithm called AlphaZero. So what one needs to do is to go really beyond the AlphaZero
reinforcement learning algorithm and to tackle this huge search
space and to develop techniques to find these needles in a
very, very large haystack. AlphaTensor isn't the first
computer program to assist with mathematical research. In 1976, two mathematicians proved what's called the Four Color Theorem using a computer. The theorem states that you only need four colors to fill in any map so that no neighboring regions match. The pair verified their proof by checking all 1,936 required cases, which took more than 1,000 hours of computing time. Back then, the larger mathematical community was not prepared to cede logical reasoning to a machine. However, the field has since come a long way. AlphaTensor was trained with a technique called reinforcement learning, which is kind of like
playing a game. Reinforcement learning strategically penalizes
and rewards an AI system as it experiments with different ways
to achieve its given task, driving the program towards an
optimal solution. But what kind of game should AlphaTensor play
in search of more efficient matrix multiplication algorithms? This is where the term tensor in
AlphaTensor comes into play. A tensor is just an array of
numbers with any number of dimensions. Vectors are 1D tensors, and matrices are 2D tensors. The process of multiplying any two matrices of a given size can be described by a single unique 3D tensor. For example, when multiplying any two two-by-two matrices, we can build the corresponding 3D tensor. Each dimension of this cube represents one of the three matrices: A, B, and the product C. Each element in the cube can be one, zero, or negative one. The matrix product C is created by combining elements from matrices A and B, like this. And so on, until you have the full matrix multiplication tensor.
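As a sketch of how that cube can be built (assuming each matrix is flattened row by row into four entries, matching the A1 through A4 numbering used below), the following Python snippet places a one wherever a product of an entry of A and an entry of B contributes to an entry of C:

import numpy as np

n = 2                                     # two-by-two matrix multiplication
T = np.zeros((n * n, n * n, n * n), dtype=int)
for i in range(n):
    for j in range(n):
        for k in range(n):
            # A[i, k] * B[k, j] contributes to C[i, j]; flatten each index.
            T[i * n + k, k * n + j, i * n + j] = 1

print(T.shape)   # (4, 4, 4): one axis each for the entries of A, B, and C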
Now, you can use a process called tensor decomposition to break down this 3D tensor into building blocks, similar to taking apart a cube puzzle. One natural way to break tensors down is into what's called
rank-1 tensors, which are just products of vectors. The trick is that each rank-1 tensor here describes a multiplication step in a matrix multiplication algorithm. For example, this rank-1 tensor represents the first multiplication step in the standard algorithm: A1 times B1. The next rank-1 tensor represents A2 times B3. Adding these two rank-1 tensors yields the first element of the product, C1. Here are the next two rank-1
tensors, representing the multiplications, A1 times B2 and
A2 times B4, which form C2. Eventually, the entire standard
algorithm with its eight multiplication steps is
represented by decomposing the 3D tensor into eight rank-1 tensors. These all add back up into the original 3D tensor. But it's possible to decompose a
3D tensor in different ways. Strassen's seven multiplication steps for the same two-by-two matrix multiplication look like this. These rank-1 tensors are more complex. And this is still a full decomposition, but in fewer steps, which add back up to the original tensor. So the fewer rank-1 tensors you use to fully decompose a 3D tensor, the fewer multiplication steps are used in the tensor's corresponding matrix multiplication.
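To spell that out in code, here is a Python check of Strassen's decomposition written as three factor matrices U, V, and W (our own transcription of the textbook formulas into the flattened A1-A4, B1-B4, C1-C4 indexing); column r of U, V, and W defines the r-th rank-1 tensor, and the seven of them sum back to the multiplication tensor built earlier:

import numpy as np

# The two-by-two matrix multiplication tensor, built as before.
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[i * 2 + k, k * 2 + j, i * 2 + j] = 1

# Strassen's decomposition: column r says which entries of A and B are combined
# in multiplication r (U, V) and where that product is added in C (W).
U = np.array([[1, 0, 1, 0, 1, -1,  0],
              [0, 0, 0, 0, 1,  0,  1],
              [0, 1, 0, 0, 0,  1,  0],
              [1, 1, 0, 1, 0,  0, -1]])
V = np.array([[1, 1,  0, -1, 0, 1, 0],
              [0, 0,  1,  0, 0, 1, 0],
              [0, 0,  0,  1, 0, 0, 1],
              [1, 0, -1,  0, 1, 0, 1]])
W = np.array([[1,  0, 0, 1, -1, 0, 1],
              [0,  0, 1, 0,  1, 0, 0],
              [0,  1, 0, 1,  0, 0, 0],
              [1, -1, 1, 0,  0, 1, 0]])

# Summing the seven rank-1 tensors (outer products of matching columns)
# reconstructs the full tensor, so seven multiplications suffice.
assert np.array_equal(np.einsum('ir,jr,kr->ijk', U, V, W), T)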
DeepMind's construction of a single-player game for AlphaTensor to play and learn from was key. Find an algorithm for this matrix multiplication
that requires the fewest multiplication steps possible...
is a vague request. But it becomes a clearly defined
computer task once it's formulated as: decompose this 3D tensor using as few unique rank-1 tensors as possible. It's really hard to describe what the search space looks like. In the particular case of matrix multiplication, it's quite convenient to formulate it in that space, because then we can deploy our search techniques and our machine learning techniques in order to search in that very, very large but formalizable search space. AlphaTensor's play was simple. It was programmed to guess rank-1 tensors to subtract from
the original 3D tensor, to decompose it down to zero. The fewer rank-1 tensors it used, the more reward it got. Each rank-1 tensor that you remove from the 3D tensor has a cost of, let's say, one. And so we want to figure out what's the way of achieving the goal with the fewest penalties. And so that's what the system is trying to learn how to do. It's learning to estimate: when I'm in this kind of configuration, roughly how many penalties do I think I'm going to incur before I get to the goal?
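As a rough illustration only (a toy sketch of the idea, not DeepMind's actual training environment; the class and reward values here are ours), the game can be modeled like this: the state is the remaining tensor, a move subtracts one rank-1 tensor, every move costs one point, and an episode ends when the tensor hits zero.

import numpy as np

class ToyTensorGame:
    # State: the residual 3D tensor. Move: subtract one rank-1 tensor.
    # Every move costs one point; the game ends when the residual is all zeros.
    def __init__(self, target):
        self.state = target.copy()
        self.total_reward = 0

    def step(self, u, v, w):
        self.state -= np.einsum('i,j,k->ijk', u, v, w)   # subtract the outer product
        self.total_reward -= 1                           # penalty per move
        done = not self.state.any()
        return self.state, self.total_reward, done

# Play the standard algorithm's eight moves on the two-by-two multiplication tensor.
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[i * 2 + k, k * 2 + j, i * 2 + j] = 1

game = ToyTensorGame(T)
e = np.eye(4, dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            _, reward, done = game.step(e[i * 2 + k], e[k * 2 + j], e[i * 2 + j])
print(reward, done)   # -8 True; Strassen's decomposition would finish at -7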
But tensor decomposition is not an easy game to master. For even a three-by-three matrix multiplication with elements of only zero or one, the number of possible tensor decompositions exceeds the number of atoms in the universe. Still, over the course of its training, AlphaTensor started to home in on patterns to decompose
the tensor efficiently. Within minutes it rediscovered
Strassen's algorithm. Then the program went even further. It beat Strassen's algorithm for
multiplying two four-by-four matrices in modulo-2 arithmetic, where the elements are only zero or one, breaking the 50-year record. Instead of the standard algorithm's 64 multiplication steps or Strassen's 49, AlphaTensor's algorithm used only 47 multiplications. AlphaTensor also discovered thousands of other new fast algorithms, including ones for five-by-five matrices in modulo-2 arithmetic. So, will programs like AlphaTensor, churning away in server rooms and pulling new mathematical discoveries out of lines of code, make mathematicians obsolete? Ultimately, I think this will not replace mathematicians or anything like that. This, I think, provides a good tool that can help
mathematicians find new results and guide their intuition. That's exactly what happened just days after the AlphaTensor results were first published in the journal Nature. Two mathematicians in Austria, Manuel Kauers and Jakob Moosbauer, used AlphaTensor's 96-step five-by-five matrix multiplication algorithm as inspiration to push even further. And then Jakob suggested we just take the algorithm that the AlphaTensor people found as a starting point and see whether there's something in the neighborhood of this, and we could have an additional drop. And indeed, that was the case. So it was a very short computation for our new mechanism that was able to reduce the 96 to 95. And this we then published in an arXiv paper. The right way to look at this is as a fantastic example of a collaboration between a particular kind of technology and mathematicians. The true potential for human and artificial intelligence collaboration is a frontier that is only now being fully explored. I don't think people can be made irrelevant by any of this kind of work. I think it empowers people to do more.