How a Computer Broke a 50-Year Math Record

Captions
There is an enigmatic and powerful mathematical operation at work inside everything from computer graphics and neural networks to quantum physics. It's simple enough for high school students to grasp, yet so complex that even seasoned mathematicians haven't mastered it. This operation is called matrix multiplication. Matrix multiplication is a very fundamental operation in mathematics that appears in many computations in engineering and in physics. A matrix is a two-dimensional array of numbers on which you can perform operations like addition and multiplication. Researchers have long sought more efficient ways to multiply matrices together. So if you make that even just a little bit faster, larger problems come into reach, where for now we would say that's too big to be computable in reasonable time. However, actually finding faster matrix multiplication methods is a huge challenge. But thanks to a new tool, researchers have finally broken a long-standing matrix multiplication record, one that's more than 50 years old. What's their secret weapon?

Students of linear algebra are taught a method for multiplying matrices based on a centuries-old algorithm. It goes like this: multiply elements from the first row of matrix A and the first column of matrix B and add them to get the first element of matrix C. Then repeat for the first row of matrix A and the second column of matrix B, and add them for the second element in matrix C, and so on. Multiplying two two-by-two matrices this way takes eight multiplication steps. Multiplying any two N-by-N matrices with the standard algorithm takes N-cubed steps, which is why applying this method to pairs of larger matrices quickly becomes unwieldy. If you take a matrix that is twice as big, then the computation takes eight times longer. So you can imagine, if you double it a couple of times, then you take eight times more a couple of times, and you will very soon reach the limits of what a computer can do.

Enter Volker Strassen, a German mathematician known for his work analyzing algorithms. In 1969, he discovered a new algorithm to multiply two-by-two matrices that requires only seven multiplication steps. Going from eight down to seven multiplications may seem trivial, and the new addition steps look more complicated, but Strassen's algorithm offers dramatic computational savings for larger matrices. That's because when multiplying large matrices, they can be broken down into smaller ones. For example, an eight-by-eight matrix can reduce to a series of nested two-by-two matrices, so Strassen's savings, applied to these smaller matrix multiplications, propagate over and over. Applying Strassen's algorithm to an eight-by-eight matrix results in a third fewer multiplication steps compared to the standard algorithm. For very large matrices, these savings vastly outweigh the computational cost of the extra additions.

A year after Strassen invented his algorithm, IBM researcher Shmuel Winograd proved it was impossible to use six or fewer multiplications to multiply two-by-two matrices, thus also proving that Strassen's algorithm, with its seven multiplications, is the best solution. For half a century, the most efficient method known for multiplying two matrices of any reasonable size was to break them down and apply Strassen's algorithm.
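To make the step counts above concrete, here is a minimal sketch in Python (not code from the video; the function and variable names are illustrative) that multiplies two 2-by-2 matrices with the standard eight-multiplication method and with Strassen's seven-multiplication method, and checks that the results agree.

```python
# Minimal sketch: standard vs. Strassen multiplication of 2x2 matrices.
# Matrices are plain nested lists [[a11, a12], [a21, a22]].

def standard_2x2(A, B):
    """Standard algorithm: 8 multiplications, 4 additions."""
    return [
        [A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
        [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]],
    ]

def strassen_2x2(A, B):
    """Strassen's 1969 algorithm: 7 multiplications, 18 additions/subtractions."""
    m1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1])
    m2 = (A[1][0] + A[1][1]) * B[0][0]
    m3 = A[0][0] * (B[0][1] - B[1][1])
    m4 = A[1][1] * (B[1][0] - B[0][0])
    m5 = (A[0][0] + A[0][1]) * B[1][1]
    m6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1])
    m7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1])
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4,           m1 - m2 + m3 + m6],
    ]

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    assert standard_2x2(A, B) == strassen_2x2(A, B) == [[19, 22], [43, 50]]
    print("Both methods give", strassen_2x2(A, B))
```

When the entries of A and B are themselves blocks of a larger matrix, the same seven-product pattern can be applied recursively, which is exactly where the savings described above come from: 7 recursive multiplications per level of blocking instead of 8.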
That was until October 2022, when a new algorithm was revealed that beat Strassen's, specifically for multiplying two four-by-four matrices where the elements are only zero or one. This new algorithm made it possible to multiply large matrices even faster by breaking them into four-by-four matrices instead of two-by-twos. So who or what was behind this breakthrough?

This new algorithm was discovered by Google's artificial intelligence research lab, DeepMind. For more than a decade, DeepMind has garnered attention for training AI systems to master a host of games, everything from Atari Pong to chess. Then, in 2016, DeepMind's AlphaGo achieved what was considered impossible at the time: it defeated the top-ranked human Go player, Lee Sedol, in a best-of-five match. This victory shattered the limited notion of what's possible for computers to achieve. DeepMind then set its sights on a problem even more challenging than Go. I was very surprised that even for very small cases, we don't know the optimal way of doing matrix multiplication. And at some point, we realized that this is actually a very good fit for machine learning techniques.

To tackle matrix multiplication, DeepMind started with an algorithm descended from AlphaGo called AlphaTensor. AlphaTensor is built on a reinforcement learning algorithm called AlphaZero. So what one needs to do is to go really beyond the AlphaZero reinforcement learning algorithm, to tackle this huge search space, and to develop techniques to find these needles in a very, very large haystack.

AlphaTensor isn't the first computer program to assist with mathematical research. In 1976, two mathematicians proved what's called the Four Color Theorem using a computer. The theorem states that you only need four colors to fill in any map so no neighboring regions match. The pair verified their proof by processing all 1,936 required cases, which took more than 1,000 hours of computing time. Back then, the larger mathematical community was not prepared to cede logical reasoning to a machine. However, the field has since come a long way.

AlphaTensor was trained with a technique called reinforcement learning, which is kind of like playing a game. Reinforcement learning strategically penalizes and rewards an AI system as it experiments with different ways to achieve its given task, driving the program toward an optimal solution. But what kind of game should AlphaTensor play in search of more efficient matrix multiplication algorithms? This is where the term tensor in AlphaTensor comes into play. A tensor is just an array of numbers with any number of dimensions. Vectors are 1D tensors, and matrices are 2D tensors. The process of multiplying any two matrices of a given size can be described by a single unique 3D tensor. For example, when multiplying any two two-by-two matrices, we can build the corresponding 3D tensor. Each dimension of this cube represents one of the matrices, and each element in the cube can be one, zero, or negative one. The matrix product C is created by combining elements from matrices A and B, like this, and so on, until you have the full matrix multiplication tensor (a code sketch of this tensor and its decompositions appears after this explanation).

Now, you can use a process called tensor decomposition to break down this 3D tensor into building blocks, similar to taking apart a cube puzzle. One natural way to break tensors down is into what are called rank-1 tensors, which are just products of vectors. The trick is that each rank-1 tensor here describes a multiplication step in a matrix multiplication algorithm. For example, this rank-1 tensor represents the first multiplication step in the standard algorithm: A1 times B1. The next rank-1 tensor represents A2 times B3.
Adding these two rank-1 tensors yields the first element in the product, C1. Here are the next two rank-1 tensors, representing the multiplications A1 times B2 and A2 times B4, which form C2. Eventually, the entire standard algorithm with its eight multiplication steps is represented by decomposing the 3D tensor into eight rank-1 tensors. These all add back up into the original 3D tensor. But it's possible to decompose a 3D tensor in different ways. Strassen's seven multiplication steps for the same two-by-two matrix multiplication look like this. These rank-1 tensors are more complex, and this is still a full decomposition, in fewer steps, which adds back up to the original tensor. So the fewer rank-1 tensors you use to fully decompose a 3D tensor, the fewer multiplication steps are used in the tensor's corresponding matrix multiplication.

DeepMind's construction of a single-player game for AlphaTensor to play and learn from was key. "Find an algorithm for this matrix multiplication that requires the fewest multiplication steps possible" is a vague request. But it becomes a clearly defined computer task once it's formulated as: decompose this 3D tensor using as few unique rank-1 tensors as possible. It's really hard to describe what the search space looks like. In the particular case of matrix multiplication, it's quite convenient to formulate it in that space, because then we can deploy our search techniques and our machine learning techniques to search in that very, very large but formalizable search space.

AlphaTensor's play was simple. It was programmed to guess rank-1 tensors to subtract from the original 3D tensor, to decompose it down to zero. The fewer rank-1 tensors it used, the more reward it got. Each rank-1 tensor that you remove from the 3D tensor has a cost of, let's say, one. And so we want to figure out the way of achieving the goal with the fewest penalties. That's what the system is trying to learn how to do: it's learning to estimate, when I'm in this kind of configuration, roughly how many penalties am I going to incur before I get to the goal?

But tensor decomposition is not an easy game to master. For even a three-by-three matrix multiplication with only elements zero or one, the number of possible tensor decompositions exceeds the number of atoms in the universe. Still, over the course of its training, AlphaTensor started to home in on patterns to decompose the tensor efficiently. Within minutes, it rediscovered Strassen's algorithm. Then the program went even further: it beat Strassen's algorithm for multiplying two four-by-four matrices in modulo-2 arithmetic, where the elements are only zero or one, breaking the 50-year record. Instead of the standard algorithm's 64 multiplication steps or Strassen's 49, AlphaTensor's algorithm used only 47 multiplications. AlphaTensor also discovered thousands of other new fast algorithms, including ones for five-by-five matrices in modulo-2.

So, will programs like AlphaTensor, churning away in server rooms and pulling new mathematical discoveries out of lines of code, make mathematicians obsolete? Ultimately, I think this will not replace mathematicians or anything like that. I think this provides a good tool that can help mathematicians find new results and guide their intuition.
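As a companion to the decomposition discussion above, here is a minimal sketch in Python with NumPy (an illustration, not DeepMind's code; all names are made up) that builds the 4x4x4 tensor describing two-by-two matrix multiplication and verifies two full decompositions of it into rank-1 tensors: the eight-term one corresponding to the standard algorithm and the seven-term one corresponding to Strassen's. The number of rank-1 terms equals the number of multiplications in the corresponding algorithm.

```python
import numpy as np

def matmul_tensor(n=2):
    """T[i, j, k] = 1 when the product a_i * b_j contributes to c_k,
    with A, B, C flattened row by row into length n*n vectors a, b, c."""
    T = np.zeros((n * n, n * n, n * n), dtype=int)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # C[i][j] += A[i][k] * B[k][j]
                T[i * n + k, k * n + j, i * n + j] = 1
    return T

def rebuild(terms):
    """Sum of rank-1 tensors u (x) v (x) w for a list of (u, v, w) triples."""
    return sum(np.einsum("i,j,k->ijk", np.array(u), np.array(v), np.array(w))
               for (u, v, w) in terms)

T = matmul_tensor(2)

# Standard algorithm: one rank-1 term (all one-hot vectors) per product A[i][k]*B[k][j].
standard = []
for i in range(2):
    for j in range(2):
        for k in range(2):
            u = [0] * 4; u[2 * i + k] = 1   # picks out A[i][k]
            v = [0] * 4; v[2 * k + j] = 1   # picks out B[k][j]
            w = [0] * 4; w[2 * i + j] = 1   # contributes to C[i][j]
            standard.append((u, v, w))

# Strassen's algorithm: seven rank-1 terms with entries in {-1, 0, 1}.
# Entry order: (A11, A12, A21, A22), (B11, B12, B21, B22), (C11, C12, C21, C22).
strassen = [
    ([1, 0, 0, 1],  [1, 0, 0, 1],  [1, 0, 0, 1]),    # m1 = (A11+A22)(B11+B22)
    ([0, 0, 1, 1],  [1, 0, 0, 0],  [0, 0, 1, -1]),   # m2 = (A21+A22) B11
    ([1, 0, 0, 0],  [0, 1, 0, -1], [0, 1, 0, 1]),    # m3 = A11 (B12-B22)
    ([0, 0, 0, 1],  [-1, 0, 1, 0], [1, 0, 1, 0]),    # m4 = A22 (B21-B11)
    ([1, 1, 0, 0],  [0, 0, 0, 1],  [-1, 1, 0, 0]),   # m5 = (A11+A12) B22
    ([-1, 0, 1, 0], [1, 1, 0, 0],  [0, 0, 0, 1]),    # m6 = (A21-A11)(B11+B12)
    ([0, 1, 0, -1], [0, 0, 1, 1],  [1, 0, 0, 0]),    # m7 = (A12-A22)(B21+B22)
]

assert np.array_equal(rebuild(standard), T)   # 8 rank-1 tensors -> 8 multiplications
assert np.array_equal(rebuild(strassen), T)   # 7 rank-1 tensors -> 7 multiplications
print("Both decompositions rebuild the tensor:", len(standard), "vs", len(strassen), "terms")
```

The same kind of validity check, with the sums taken modulo 2, is what a decomposition like AlphaTensor's 47-multiplication algorithm for four-by-four matrices has to satisfy; the game described above amounts to finding a set of such terms that is as small as possible.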
That's exactly what happened just days after the AlphaTensor results were first published in the journal Nature. Two mathematicians in Austria, Manuel Kauers and Jakob Moosbauer, used AlphaTensor's 96-step five-by-five matrix multiplication algorithm as inspiration to push even further. Then Jakob suggested we just take the algorithm that the AlphaTensor people found as a starting point and see whether there's something in the neighborhood of it that could give an additional drop. And indeed, that was the case. It was a very short computation for our new mechanism, which was able to reduce the 96 to 95, and this we then published in an arXiv paper.

The right way to look at this is as a fantastic example of a collaboration between a particular kind of technology and mathematicians. The true potential for human and artificial intelligence collaboration is a frontier that is only now being fully explored. I don't think people can be made irrelevant by any of this kind of work. I think it empowers people to do more.
Info
Channel: Quanta Magazine
Views: 314,954
Keywords: science, quanta, quanta magazine, science video, educational video, biology
Id: fDAPJ7rvcUw
Length: 13min 0sec (780 seconds)
Published: Mon May 22 2023