Outsmarting Chat GPT 4 - can it do math?

Captions
Hello everybody. A lot of people have been talking about ChatGPT this week, and I will be no different. We are going to be putting ChatGPT-4, the latest iteration, just in its text version, through its paces with some mathematical questions. Of course, computers are very, very good at raw computation, and they have been for a while. That's not what we're going to be testing today, although I do want to demonstrate that ChatGPT is very good at arithmetic in natural language.

So if I ask it something like this: what is the sum of the first, let's say, 30 odd numbers? Let's see if it understands what I mean. "To find the sum of the first 30 odd numbers, we can use a formula for the sum of an arithmetic series. The sum of an arithmetic series can be calculated using the formula..." blah blah blah. For odd numbers, the first number is one, that's good. The last number is the 30th odd number, and it gets that the sum of the first 30 odd numbers is 900. So it's very, very good at doing things like that.

What I'm going to be doing today, though, is putting ChatGPT-4 through its paces with some slightly harder reasoning problems. These are ones where I've basically been trying to catch GPT out, so let's see if, a few days later, we can still catch it out.

The first one we're going to look at is testing whether some numbers are prime. So let me give it this one: is the sum of two prime numbers ever prime? Yeah, so it's still giving the wrong response... ah, actually no, it isn't. Last time I did this it did actually say no, there are no exceptions. It seems to have improved itself. So it's caught the example where two plus something else can also be a prime, with twin primes. This is an improvement. When I tried this last week it gave me a different answer. Exactly why it's giving me a different answer now, I'm not sure. I'm not sure exactly how this thing is learning, but it is. That's pretty good.

All right, let's see some of the other tests. This is another good example. I give ChatGPT a really big odd number to the power of another really big odd number, and maybe just to make it even more extreme I'll do that to the power of a really big odd number. Okay, actually it doesn't even need to be hard, does it? Sorry, that should be odd. There we go. So I take an odd number to a massive power, which is itself to a massive power, and then I'm going to add 17 to it, and I'm going to ask: is this number prime?

Well, a human will look at this and think, okay, I'll never be able to compute this, this number is just incredibly big. However, what can I know about this number? Well, it is an integer to the power of other integers, so it must be an integer. And it is an odd number to the power of an integer, therefore it must be odd. Now if you add 17 to this, it will be an odd number plus an odd number, which is an even number, therefore it can't possibly be prime.

Let's have a look if it can work this out. "To determine if the given number is prime we need to check if it has any divisors other than one and itself. However, the given number is extremely large." Okay. Ah, here we go: "When an odd number is raised to any power, the result will be an odd number. If you add an odd number, in this case..., to an even number..." Very interesting. So it almost got there, except it made this very small mistake here: "when you add an odd number, in this case the result of the exponentiation, to an even number, 17". 17 is not an even number. However, it has got the correct result, that the sum will be an even number. So this is a real improvement.

Let's see if we can test it a little bit further, and we'll give it some odd number to the power of an odd number to the power of an odd number. These numbers are just going to get really, really big. One, two, three, four, yeah. Plus... now let's just try this again but with seven. So let's see if it can extend the same reasoning even when the numbers themselves become even bigger. Let's see if I can ask it: is it composite? "Analyzing the expression, both the left and right parts of the expression are odd numbers raised to odd powers." Very good. "As a rule, when you add two odd numbers the result is always an even number. Since all even numbers except..." Very good. Okay, so this is better than it was the other day. The other day it completely failed. So far it's doing really quite well, and doing better than it was the other day.

Let's look at something to do with prime... no, that's not to do with prime numbers. We're going to talk about something called Friedman numbers. Friedman numbers are a type of number where you can reconstruct the number from its own digits. So for example, let me just type this: 126 equals, what is it, 6 times 21. So using the one, the two and the six, you can create a mathematical expression that gets you back to the original number. That's a Friedman number. Let's see. Okay, let me take a number that I know is quite large. Let's take 5 to the power of 15.
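Two of the claims made so far are easy to check mechanically, and a minimal Python sketch follows. It is not taken from the video: the function names and the specific large odd numbers are illustrative assumptions, since the transcript does not record the exact numbers typed into ChatGPT. The first helper reproduces the parity argument (odd to any positive power stays odd, odd plus 17 is even, and an even number greater than 2 cannot be prime) using modular exponentiation. The second checks that a given expression is a valid witness that a number is a Friedman number, here 126 = 6 × 21.

```python
import re


def power_plus_is_even(base: int, exponent: int, addend: int) -> bool:
    """Return True if base**exponent + addend is even.

    pow(base, exponent, 2) gives the parity of base**exponent without
    ever building the huge power itself.
    """
    return (pow(base, exponent, 2) + addend) % 2 == 0


def is_friedman_witness(n: int, expression: str) -> bool:
    """Check that `expression` uses exactly the digits of n and evaluates to n.

    Only digits, + - * / and parentheses are accepted (a simplification;
    the usual definition also allows exponentiation and concatenation).
    """
    if not set(expression) <= set("0123456789+-*/() "):
        raise ValueError("only digits, + - * / and parentheses are supported")
    # Writing the number as itself does not count, so require an operator.
    has_operator = any(op in expression for op in "+-*/")
    same_digits = sorted(str(n)) == sorted(re.findall(r"\d", expression))
    if not (has_operator and same_digits):
        return False
    # The character whitelist above keeps this eval to plain arithmetic.
    return eval(expression, {"__builtins__": {}}) == n


# Hypothetical stand-ins for the "really big odd numbers" in the video.
print(power_plus_is_even(987654321, 123456789 ** 3, 17))  # True
print(is_friedman_witness(126, "6*21"))                   # True
```

Checking a proposed expression is the easy direction. Deciding whether a large number such as a power of 5 has any such expression at all requires a search over arrangements of its digits, which this sketch does not attempt.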
Okay, let's take this number. I know this is a Friedman number, by the way. I'm not going to explain why, but I know it is. Is this a Friedman number? So it knows what a Friedman number is. That's no surprise, it has searched the internet. Yeah, this is very interesting: "After analyzing the given number, it becomes clear that it's not a Friedman number." So it's a very, very human way of putting it. It shouldn't be clear that it's not a Friedman number. The reasoning here is not correct. It says there's no obvious way to create an expression using the digits in the number that would be this. Hmm.

Okay, let's try something else now. Let's try 6 to the power of 16. What's that number? Okay, that's a little bit too big, let's try to fix it. Right, this number also should be clearly a Friedman number. I won't go into detail why. Is this a Friedman number? Same kind of reasoning. Yep, it really doesn't know how to approach this problem at all. If you do a prime factorization of this, you'll find that it's actually just six to the power of 15, and then you can use the digit six... ah, so it has come up with something, but I don't think that's quite right. Sorry, where was I? Yep, so this one thinks it's got an expression, but actually it hasn't used all of the digits, though it's under the impression that it has. So it's worked out that it's a Friedman number, but not for the right reasons. And also, I think if you were to check this expression here, you would find that it's not anywhere close to what it says it is. So it can't even do basic maths in this case. It's created this very, very confident answer, and that is a total failure.

Now there's another challenge like this, which is called the four fours challenge. So let's see if it knows the four fours challenge. Do you know the four fours challenge? "Yes, the four fours challenge is a classic mathematical puzzle that involves using exactly four fours and a set of basic arithmetic operations (addition, subtraction, multiplication, division, exponentiation, roots and factorials, along with parentheses and sometimes square roots and decimal points) to represent different numbers, usually integers. The goal is to create expressions for as many consecutive integers as possible." And now it gives some examples. Good, so it's given a pretty good summary of this. It's given it a little bit too confidently, in the sense that sometimes people allow you to use the recurring sign, sometimes people say you should use a decimal point. There's no one four fours challenge, but it seems to get the basic idea.

Okay, so let's try the same idea but with six sixes. Create a list of the first 50 positive integers using only six sixes. And here it seems to fail tremendously every time I've tried it. I don't know why it's just hung up. I think I used all of my... no, let's just give it a moment. Okay, here we go. Right, let's try and regenerate the response. Here we go: "The six sixes challenge is to use exactly six sixes and a set of basic arithmetic..." Right, as you can see, it's really starting to go quite wrong here, and it's sort of introduced a five here. It hasn't quite got the idea. Let's just try and check the arithmetic. So, one times... yeah, the arithmetic on this doesn't even check out. In fact, maybe none of these check out.

This is a really interesting example. It seems to be trying to copy some sort of pattern that it's seen before, maybe in the four fours challenge, but it's completely failed to get the gist of it, and now it's giving me a load of random stuff. Yeah, none of these even make sense. I can't be bothered to check them all, but I think we can probably say that it can't get the six sixes. This is quite interesting. This is something that human beings, I think, are still far superior at: playing with numbers, being creative with numbers. Raw computation, it's got us. But this playing around with numbers, ChatGPT can't do. Yeah, this is just embarrassing now. I want to stop generating now.

I want to give it another problem, and this will be the last problem I give it today. This is an interesting problem in the sense that it's a bit like an Olympiad problem. I think I got this one off Twitter. Let's give it here. So it's going to be 9m + 16 = a and 16m + 9 = b, and I'm going to tell it that a and b are both perfect squares. What are the possible positive integer values of m? Now I suggest, if you're mathematically inclined, that you pause the video here and try to find some of the solutions. The interesting thing is that there are two quite obvious solutions. The hint there is to use a Pythagorean triple. Let's just go for integer values, actually. But then there's a third, much higher number, which you can't really arrive at via inspection. So I wonder if ChatGPT will be able to use both types of reasoning.

Let's have a look. So it's understood it's got a system of equations. It rearranges them. Okay, so it's just doing some algebraic rearrangement here, which is fine. Nice, good. This is reasoning quite well. [unclear] That's really good. Okay, this is really impressive, because the other day I gave it a similar problem and it couldn't do that. And by the way, for some reason it stopped, I think because I just reached the character limit, but this is on track to solve it. If you do 4 squared minus 16 you get zero, so it's found all of them. The interesting thing is that there are two solutions which you can find by inspection, which are just m equals one and, the last one, which hasn't been generated, m equals zero. But then you have this third one, m equals 52, which can't really be reached just by inspection. That makes you wonder if zero and one are the only two solutions. The last time I tried a problem like this it failed. This time it really seems to have got onto the right track straight away, and that is beautifully worked out.

Now let's try and give it just something really quite abstract, like even a problem that I don't know, and we'll leave it at that. So, what is the biggest number that can't be written as, let's say, the sum of an integer's square and its cube, sums or differences? Let's see if it can solve this one. Hmm. Yeah, so it seems to have laid out a mathematical research program. It's testing these, right, and there you go. So it's laid out a mathematical research program, it's begun it, and it found one number that couldn't be written in a particular way, using a particular sum or difference of an integer square and a cube. It hasn't quite got that right. And then it's just decided that that must be the largest number. So it's mistaken the largest number it has found for the largest number that exists.

So I think that's probably a good place to stop today. That's just a good example of putting ChatGPT through its paces. I think it's done pretty well. I think this is impressive stuff. The main thing that concerns me is that when it fails, it seems to do so completely unaware that it's failed, and it fails in ways that are very, very hard to predict. It's very hard to know when it's going to get something and when it's going to have difficulty with it. Having a look at all this algebra stuff, it makes it seem like it has an exceptionally good grasp of arithmetic, and yet in this one, for some reason, it thinks one times six minus one minus one... it thinks that one times four is one. It's just stupid.

Anyway, I hope you've enjoyed me putting ChatGPT through its paces. I think I'm going to do more and more of these. I'm going to go and explain some of the solutions myself sometimes, and then I'm going to see if ChatGPT solves it the same way. Bye bye for now.
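The perfect-squares problem from the video (find integers m for which both 9m + 16 and 16m + 9 are perfect squares) can be confirmed with a short brute-force search. The sketch below is neither ChatGPT's method nor the presenter's. The function names and the search bound of 1000 are illustrative assumptions.

```python
from math import isqrt


def is_perfect_square(n: int) -> bool:
    """Return True if n is the square of a non-negative integer."""
    return n >= 0 and isqrt(n) ** 2 == n


def find_m(limit: int) -> list[int]:
    """Return every m in 0..limit with 9m + 16 and 16m + 9 both perfect squares."""
    return [
        m
        for m in range(limit + 1)
        if is_perfect_square(9 * m + 16) and is_perfect_square(16 * m + 9)
    ]


print(find_m(1000))  # [0, 1, 52]
```

Raising the bound finds nothing new, and a difference-of-squares argument shows why. Writing 9m + 16 = x² and 16m + 9 = y² gives 16x² − 9y² = 256 − 81 = 175, so (4x − 3y)(4x + 3y) = 175. The factor pairs (7, 25), (5, 35) and (1, 175) give (x, y) = (4, 3), (5, 5) and (22, 29), which correspond to m = 0, 1 and 52.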
Info
Channel: Eternity In An Hour
Views: 12,593
Id: TvK391Eq22k
Length: 22min 48sec (1368 seconds)
Published: Mon Mar 20 2023