ChatGPT vs. World's Hardest Exam

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Back in 2019 some AI researchers and  mathematicians came up with an ambitious   challenge, to create an AI that could win a  gold medal at the International Mathematics   Olympiad. It was an ambitious challenge because  winning a gold medal at this event puts you on   the world stage as having one of the best  mathematical minds. Previous winners of the   gold medal include Terence Tao and Maryan  Mirzakhani. The researchers called it the   IMO Grand Challenge and for an AI system  to pass they proposed the following rules. Each proof that the AI producers must  be checkable in 10 minutes which is   approximately the amount of time it takes  a human judge to judge a human solution. The AI only has as much time as  a human competitor which is four   and a half hours for each set of three  problems and the AI must be open source   released publicly and reproducible.  The AI cannot query the internet. As of recording this we have met chat GPT and  GPT-4 was released a couple of months ago. I'm   sure that we'll be continually amazed by what it's  able to do but no AI including ChatGPT has yet won   or even competed in the IMO competition. Bets are  on literally for if and when one might succeed. GPT-4 is paraded around as having already aced  many other exams including the SAT and the biology   Olympiad but it might never ace the IMO because  the thing is that ChatGPT is actually not very   good at math. It's not good at counting or keeping  track of multiple operations because after all it   is a language model that excels at predicting  the next word in a sentence. Math questions on   the SAT can be quite predictable and formulaic  and likely enough similar problems were included   in the training data set that it was able to  get by but the IMO problems are different and   they're designed to test true understanding and  creative problem solving. I'd like to understand   the likelihood of an AI passing this challenge  soon but in order to do that I would first need to   understand the solution to an IMO problem myself  on human terms then I would like to understand how   ChatGPT goes about solving it and then see if  there are any better AI competitors out there. Here's an IMO problem. This is the Olympics for  math so these problems are some of the hardest   out there and can require creative solutions.  This is problem six from the 2022 competition:   Let n be a positive integer. A Nordic square is an  n times n board containing all the integers from 1   to n squared so that each cell contains exactly  one number. Two different cells are considered   adjacent if they share a common side. Every cell  that is adjacent only to cells containing larger   numbers is called a valley. An uphill path is  a sequence of one or more cells such that the   first cell in the sequence is a valley. Each  subsequent cell in the sequence is adjacent to   the previous cell and the numbers written in the  cells in the sequence are an increasing order.   Find as a function of n the smallest possible  number of uphill paths in a Nordic Square. This is an example of a Nordic Square where n  is equal to two. Here one is our valley and we   could have six possible uphill paths; one to  two, then one to two to four which is counted   as a different uphill path, you could do one to  three or one to three to four and we also have to   include the path of length one which is just the  valley. We could have placed these numbers in the   squares in any the order it didn't have to be in  this pattern it could have gone one, two, three,   four, and in fact you should play around with it  now trying Nordic squares of different sizes and   different arrangements of the numbers to see what  kinds of patterns might give you this smallest   number of total uphill paths. If you want to try  it for yourself now is your time to pause. I don't   think it's possible for a human to look at this  and know the solution instantly you really do have   to play around and try a bunch of random things to  see if any patterns pop out. Here's an example of   a three by three Nordic Square which looks like  it might have the minimum paths for the size.   Again we only have one Valley and it is our one  but we could also have one two, one two six, one   two seven, one three, one three eight, one three  seven, one five, one five nine, one five six,   one four, one four eight, one four nine, giving a  total of thirteen path. You can easily find other   three by three arrangements that have more paths  than this and the crux of the solution comes down   from recognizing two things; one is that for the  minimum number of paths you want there to be only   one valley, like we have here our number one. And  you also need to recognize that for every pair of   adjacent numbers there needs to be only one path  back to the valley. Say here two and six are   adjacent numbers and from them there was only one  path back down to our valley which was like this. This essentially means you can have two  paths coming to a common endpoint but you   can never have this two paths converging  and then continuing on because that would   mean that from wherever this ends these  two adjacent numbers will have multiple   paths back down to the valley so this is not  allowed and we can call something like this,   our end point, a sink. With this we've  given ourselves the golden ticket,   if every two adjacent numbers only have one path  back to the valley then the minimum number of   paths in our Nordic Square would be equal to the  number of adjacent numbers. Counting the pairs   of adjacent numbers here we have one, two, in  each row and there's three rows so that would be   two times three, that's six adjacent  numbers that are adjacent horizontally. Plus   one, two, three, four, five, six, going vertically  so there'd be 12 pairs of adjacent numbers.   That's our number of minimum paths but we have to  remember to add one because our valley, our single   valley will also be its own single length path  giving 13 as the total minimum paths in this case. To extend that for any number of n we will always  have n minus 1 adjacent numbers on each row times   n rows and n minus 1 adjacent numbers on each  column times the number of columns so that's   2 n times n minus 1 paths plus one to account for  our valley. That's basically the answer here and   that will give you some points but more points  will come from new harder task of describing how   to arrange the numbers to get this minimum and  that part of the solution comes from describing   a tree-like structure like this. If you put the  numbers in increasing order in these trees you end   up with only one Valley and for every adjacent  pair of numbers only one path back to it. The   black numbers here represent numbers higher  than 153 so you fill in all these ones first   and then go back and fill in the black spaces  with the higher ones. These black numbers are   sinks and there might be multiple ways to end  at the sink but there'll never be a way out. Let's give the problem to ChatGPT and see what  it does. The 2022 problem I picked wouldn't be   in the GPT training data set. I'm using the latest  model GPT-4 and it gives me the wrong answer. It   says that the smallest number of paths is equal to  n. I asked it how many valleys there should be to   minimize the number of paths and it immediately  gives a square with three valleys. It says that   the minimum is n valleys which gives n paths and  if you remember recognizing that there must only   be one valley is key to getting the solution. I  asked it to make a Nordic square with only one   valley which it did but then it failed to count  more than one path and when I asked it to do it   for larger squares it gave constructions that for  each adjacent pair there were multiple paths back   to the valley and it just couldn't count  paths correctly even when I tried to point   out mistakes. I don't think it would be able to  score any points on this question at the moment. This recent paper from Microsoft analyzed the  abilities of GPT-4 and said there were sparks   of artificial general intelligence which  refers to AI that is broadly intelligent   at or above human level and not just good at  one specific task. They mentioned that ChatGPT   is trained on a massive amount of webtext  data and uses at its core a self-supervised   objective of predicting the next word in a  partial sentence. The paper also shows that   GPT-4 does not have the capacity required to  conduct mathematical research. It does manage   to solve one IMO question in here but in many  other problems it lacks critical reasoning and   the ability to examine each step of its  arguments. They say one reason for this   might be that in most of the math solutions  it's read on the internet, most just give the   correct answer but don't capture the thinking  process. It doesn't make guesses or backtrack   because it can't backtrack. When predicting  the next word it just needs to move forward   in a straight line. In essence it's bad at math  because it lacks the ability to play around. So ChatGPT fails for now but it's not the  only competitor. The people at OpenAI,   in fact some of the same people who came up with  the IMO Grand Challenge in the first place, work   on a different AI system. One that according to  this paper published in 2022 has already managed   to solve some IMO problems. It's not a language  model but instead is a proof-solving model that   speaks the language of formal math. Automated  proof solvers have existed for a little while   but this one is trained to iteratively search for  new proofs. It takes mathematical ideas that need   to be proved and breaks them down into smaller and  smaller statements that are easier to prove until   there's nothing left to prove. This is what the  formal math language looks like, it's not too easy   for a human to read but it's probably our best  bet when creating a system that is actually useful   to mathematicians. It uses something called the  lean theorem prover. The proofs that it creates   are machine checkable, following a series of  logical laws. They say that their model and search   procedure are capable of producing proofs that  chain multiple non-trivial reasoning steps and it   does require some creativity to invent lines like  this. Combining an AI that can speak formal math   and correctly prove statements with something like  ChatGPT to make it more user-friendly could really   make some waves in math and much more likely  to pass the IMO Grand Challenge than a language   model alone. And as I mentioned while ChatGPT is  still struggling with IMO problems it has already   passed a bunch of exams. There's a note here back  in the Microsoft paper saying that GPT-4 does not   rely on memorizing the exact problem statement  but on applying a general solution method. While   it is possible that GPT-4 memorizes the solution  template this is not necessarily a flaw as it is   also a common way of solving math problems for  humans. They're saying that GPT-4 has been able   to pass a bunch of exams by doing what many of us  do which is kind of memorizing the structure of   common problems. Looking through past papers and  solutions can often give you enough of what you   need to do well in an exam because the structures  are usually pretty much the same and if this is   what ChatGPT is so good at too then perhaps  this is the space in which exams will change   a bit. Perhaps for now exams will have to change  to be a little bit more like the IMO to reward   creative problem solving and necessitate having  to stop and play around with the problem. That   seems like it is still a uniquely human trait but  I'm not sure for how long that will be the case. Thanks for watching this video and  thanks to my Patreon supporters for   making it possible. A special shout out to  today's Patron Cat of the Day, Cathulhu. [Music]
Info
Channel: Tibees
Views: 266,720
Rating: undefined out of 5
Keywords: tibees, tibbees, tibee, tibes, tibbee, tibees asmr, tibees exam, exams, IMO, international mathematics olympiad, math olympiad, terence tao, chatgpt, AI, neural networks, math AI, imo problems
Id: Fi1e-B60cok
Channel Id: undefined
Length: 14min 2sec (842 seconds)
Published: Thu May 25 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.