Ch(e)at GPT? - Computerphile

Video Statistics and Information

Captions
I thought we'd talk about ChatGPT - no one's been talking about that, right? It's not been mentioned. I think it is equal parts valuable and overhyped, and that's the best kind of AI, right? I'm not going to talk about how it's trained today - well, I've done a great video on how it's trained, and you've done a video before on GPT-3, which this is broadly based off. Oh, that's bad... no, no, no... oh, that's really bad. Philip Moriarty has also done a video on the Sixty Symbols channel about whether it's possible to cheat with ChatGPT, and so I suppose what I'm interested in talking about is: is it possible to detect cheaters using ChatGPT, and how easy is that to do?

I've been looking at this really interesting paper by John Kirchenbauer and colleagues at the University of Maryland on the question: if you were trying to change the output of a large language model like ChatGPT - or any of the others - in some subtle way that allowed it to be detected as AI-generated rather than human-generated, how might that look? And I think that's a really interesting question. Not "does this system exist?" or "can it exist?" - I just think, if it were to exist, what would be a good way of designing it?

I suppose you might think the most obvious way would just be to try another neural network, right? That's what we always do. So why don't we train a large language model to detect the output of large language models? And the answer is: firstly, that's very inefficient and difficult to do, but also someone will release a slightly differently trained large language model, or refine the language model to some other slightly different task, or even the same task, or train it for a bit longer - the output will be subtly different, and then it won't work. And another problem is that for some tasks there is only one answer. So if I say to an AI, "write me a piece of code to iterate over a list", it may do that, and a student would probably give me the exact same answer, in which case labelling that student's answer as AI-like would be incorrect. So it's not obvious how you would solve this problem by just training another network.

So this paper comes at it in an entirely different way. What we want to try and do is subtly change the output to avoid certain words, and by doing so we can detect that that's happened with a high probability. So we can say: there is very little chance that the absence of those words is down to chance alone - this was generated using an AI. And then there are other questions you've got to ask, like: how do you do this in a way that doesn't make the text unreadable, or just not very good? So that's what this paper is about, and that's why it's really, really interesting.

Why don't we first recap, very briefly, what a large language model that's trained on next-word prediction - which is to say quite a lot of them - actually does. The answer is, you give it a prompt. You can type your prompt yourself if you actually go on the website, but there are also prompts that happen automatically as part of APIs, and there's an extra prompt that goes in before you even start typing, to tell it who it is - for example, "You are a large language model trained by OpenAI", I think, is the text that I keep seeing over and over again and am getting a bit bored of. So a large language model takes some text - this is going to be T0, T1, T2, T3, T4; that's your prompt - and then we want to know: what is the next token in the sentence? We can then iterate this process.
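To make that loop concrete, here is a minimal sketch in Python. The `model` object and its `next_token_probs` method are hypothetical stand-ins for whatever interface a real language model exposes; the point is only that each step produces a distribution over candidate next words, we sample one, and we feed it back in.

```python
import random

def generate(model, prompt_tokens, n_new_tokens):
    """Autoregressive next-token generation (sketch; `model` is a hypothetical stand-in)."""
    tokens = list(prompt_tokens)                   # T0, T1, ...: the prompt
    for _ in range(n_new_tokens):
        # the model reports how likely every candidate next word is
        probs = model.next_token_probs(tokens)     # e.g. {"the": 0.21, "a": 0.17, ...}
        words, weights = zip(*probs.items())
        # pick the next word with chance proportional to its likelihood
        next_token = random.choices(words, weights=weights, k=1)[0]
        tokens.append(next_token)                  # feed it back in and go again
    return tokens
```

The sampling step at the end of that loop is where the watermarking idea discussed below gets its hook in.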
So we can say to the large language model, "what's the next word in the sentence?", it will do it, and then we can say, "okay, now given T0 - I didn't write a 0 there, there you go - given T0, 1, 2, 3, 4, and now the T5 you just gave me, what's T6?", and we can repeat this process to generate very long streams of text. And this works really nicely, as we've seen.

Is there any way we can inject ourselves into this process to influence what it does? Well, the way this generates text is that it doesn't actually tell you what the next word is; it essentially tells you how likely each word is. So you might have a sentence that says something like "I ran up the...", and then for every word it could produce it tells you how likely that word is to come next. A few examples: "hill", "slope" I guess, "stairs", and then, perhaps slightly out there, "football", which doesn't make any sense as a sentence. The likelihood of "hill" I would say is fairly high, "slope" maybe slightly less likely - these should be the same width - "stairs" maybe a bit less, and "football" just a teeny likelihood, because no one has ever written that sentence, because it's stupid. The procedure for using the language model would then generate one of these words based on their likelihoods. Now, there are different strategies for doing this, but you could imagine picking them with a chance that's related to how likely they are: so you're this likely to choose "hill", a little bit less likely to choose "slope", and it's almost impossible that you choose "football" - and then there are variations on this theme.

How could we change the network to show that its output was generated by an AI? Well, the first thing we could do is just start injecting words like "football" where they don't belong. So we could write "I ran up the football", and everyone goes, "that was definitely generated by an AI" - but it's also not usable in any sense, so we don't want to do that. What we're going to do instead is generate a red list of words that we mustn't use at any given time step. We take the previous word in the sentence that was generated - "the" - and we use that to seed a random number generator, which splits all of the words into red or green. So let's say "hill" is red and "slope" is green - I could actually use red and green pens for this; can you imagine? It's never going to catch on. So this is red, "hill" is red... ah, I've just drawn a green circle around it as well - you're going to leave that in the edit, aren't you? All right, "hill" is green, and we'll say no more about it; "slope" is red, "stairs" is red, and "football" is green. It's a terrible example because "hill" is already the most likely - there could be many hundreds of words we could use in that sentence; this is just four examples I thought up out of a very long list of possible words - and we're not discounting any of them completely. What we're going to do is reduce the chance that we pick "slope" and "stairs", or, equivalently, increase the chance that we pick these green words. What that does is make "football" slightly more likely, but luckily we also made "hill" slightly more likely, and then the process is unchanged: all we've done is dissuade it from using the red words and promote the idea of using these green words, and then we choose a random word like before. So maybe we choose "hill", and we write "hill" in.
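Here is a minimal sketch of that red/green step, loosely following the scheme Kirchenbauer and colleagues describe. The function names, the 50% split and the `GREEN_BOOST` value are illustrative choices rather than numbers from the paper, and a real implementation would apply the bias to the model's logits inside the sampling loop.

```python
import hashlib
import math
import random

GREEN_FRACTION = 0.5   # roughly half the vocabulary is green at each step
GREEN_BOOST = 2.0      # illustrative bonus added to green words' log-probabilities

def seed_from(prev_token):
    # a stable hash of the previous word (Python's built-in hash() is salted per process)
    return int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)

def green_list(prev_token, vocabulary):
    """Re-partition the vocabulary into green/red, seeded by the previous word."""
    rng = random.Random(seed_from(prev_token))
    words = sorted(vocabulary)                 # fixed order so the split is reproducible
    rng.shuffle(words)
    cutoff = int(len(words) * GREEN_FRACTION)
    return set(words[:cutoff])                 # green words; everything else is red

def watermarked_probs(probs, prev_token):
    """Nudge the next-word distribution towards green words without banning red ones."""
    # `probs` is assumed to cover the whole vocabulary, as a real model's softmax output does
    green = green_list(prev_token, probs)
    logits = {w: math.log(p) + (GREEN_BOOST if w in green else 0.0)
              for w, p in probs.items()}
    total = sum(math.exp(v) for v in logits.values())
    return {w: math.exp(v) / total for w, v in logits.items()}
```

So something like `watermarked_probs({"hill": 0.6, "slope": 0.25, "stairs": 0.1, "football": 0.05}, "the")` slightly boosts whichever of those words land on the green list for "the", and the sampling step afterwards stays exactly the same as before.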
Then we do the next token in the sentence: we put in "hill", we calculate a hash, we seed our random number generator, and that re-partitions our vocabulary into a new set of red and green. I was going to write "and" after this - so "I ran up the hill and then something happened" - but "and" has now unfortunately been put on the red list, so the chance of picking "and" has been reduced; maybe "then" has been lifted up, so maybe "then" is a green word, and so on. So we're subtly influencing which words get picked.

Now, if you do this, then when you're verifying the text later you can come back and say: given that the previous word was "the", what was the red and green list? We can recompute it by seeding the same random number generator, and we can see that "hill" is green. Given now that "hill" is the next word, we can see that "then" is green; but then we see that the next word we produced - "we" - is a red word, and then "went" maybe is a green word, and we can count the number of red and green words that we got. Given that we're randomly splitting the words into two, there's roughly a 50% chance that any given word gets partitioned into red or green; but if we're subtly dissuading the network from producing red words, there are going to be a lot more greens in here than reds. And so, over a paragraph of text, or even a sentence or two, if you've got mostly green, that is extremely strong evidence that this was run through an AI that had this running on it, where we subtly influence the output.

So do you already store things like that random number seed somewhere, or how do you get at that? Well, what we wouldn't want to do is split the vocabulary into red and green once and for all and then stick with it, because then a really important word - like "football", which does get used in sentences - might end up red, and we'd be dissuaded from using it, and then you couldn't ask your AI about football at all. Which I don't, but you know, you could. This is why this seeding of the random function makes so much sense: it produces a new red/green list every time you move to the next word in a sentence, and so it's only stored transiently, very quickly. It doesn't take long to hash this, it doesn't take long to seed the random number generator, and it doesn't take long to determine whether a word is in the red or green list, so this adds a negligible amount of runtime to the already pretty significant computational cost of running the language model at all. But when we come to actually analyse the results, we can look at each word and say: okay, what was the red and green list at that time - is this word red or green? - count up the number of reds and greens, and then perform a statistical test that says: if this were ordinary text, where it was 50/50 red and green, but we've actually got this many greens, what are the chances that this happened by chance? And the chance is vanishingly low, even for short sentences.
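The verification side might then look roughly like this - a sketch reusing the hypothetical `green_list` function from above, and assuming the detector knows the same vocabulary and seeding scheme the generator used:

```python
import math

def detect(tokens, vocabulary):
    """Count green words and test against the 50/50 split expected of ordinary text."""
    green_count = 0
    total = 0
    for prev, word in zip(tokens, tokens[1:]):
        green = green_list(prev, vocabulary)   # same seeding as at generation time
        total += 1
        if word in green:
            green_count += 1
    if total == 0:
        return 0, 0, 0.0
    # one-proportion z-score: how far above the expected 50% green are we?
    z = (green_count - 0.5 * total) / math.sqrt(0.25 * total)
    return green_count, total, z
```

The z-score just asks how far above the expected 50% green count the text sits; for watermarked text of any reasonable length it lands far outside what a fair 50/50 split would plausibly produce, which is the "vanishingly low" chance being described.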
Now, it's not quite as simple as this, which is why there are about 25 pages in this paper and not one. The problem is that sometimes the next word is incredibly obvious. The example from the paper is: when the first word is "Barack", very likely the second word is going to be "Obama". If you happen to pick a red word at that point - "Obama" got classified as red - you might not use it, and then you've got "Barack something-else", and it's just not going to read well as a sentence. So the key is that you dissuade it from using these words, but you don't completely discount them; this isn't a scheme where you're forbidden from using any word on the red list - their chances are just subtly reduced. That means in a situation where you had word A, word B and word C, all with equal likelihood, then if one of them is on the red list it will probably not get picked, because its likelihood comes down and the other two stay high. But in a situation like here, where "hill" is already very likely, even if "hill" got red-listed it will probably still be picked. What they call this is high- and low-entropy situations: where the next word is so obvious that you can't safely change it without really ruining the output, you don't - you still dissuade it from using the red words, but that has no effect when it's overwhelmingly likely to pick one specific word. In a situation where choosing between "then" and "and" didn't really make any difference to the quality of the sentence we produced, then yes, you can dissuade it, and that makes a subtle difference to the number of greens and reds that we see in the output.

Now, the paper goes into a lot more detail - for example, it's possible to seed the pseudo-random number generation off the previous few words instead of just one - and there are various attacks you could launch on this. There was an interesting one on Twitter where someone had ChatGPT produce an essay with an emoji between every word. That means the emojis are all going to generate the same hash, which gives you the exact same green and red split every time, in which case you've got yourself a problem, because you can then delete the emojis and the red/green assignments the detector recomputes are essentially random. So there are lots of interesting attacks on this, but in principle I think it could work, which I think is quite interesting.
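As a rough sketch of that multi-word seeding variation, the hashing step from the earlier example could take the last few words instead of one; the window size `h` here is an arbitrary illustrative choice, not a value from the paper.

```python
import hashlib
import random

def seed_from_context(prev_tokens, h=3):
    """Seed the red/green split from the last h generated words (illustrative)."""
    context = " ".join(prev_tokens[-h:])
    return int(hashlib.sha256(context.encode()).hexdigest(), 16)

def green_list_from_context(prev_tokens, vocabulary, h=3, green_fraction=0.5):
    rng = random.Random(seed_from_context(prev_tokens, h))
    words = sorted(vocabulary)
    rng.shuffle(words)
    return set(words[:int(len(words) * green_fraction)])
```

Swapping this in for the single-word seeding in the earlier sketch gives the multi-word variant; the trade-offs between the two are discussed in the paper.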
Then there are the implementation issues: you've got to convince people like OpenAI, who run these models, to implement a system like this, and you've got to convince everyone else to do it too, because if there's an equally good language model that doesn't do this, people will use that one. But the situation we're in at the moment is that there are only a few language models really capable of doing what GPT-3 and ChatGPT can do, in which case, if those companies got on board, you might have a system like this that would let you make use of these models - which are genuinely useful for having a dialogue with, chatting to and learning things from - without people passing their output off as homework, at which point you're not really learning anything at all.

In the long term, I think what we'll probably end up doing is not worrying quite so much about whether an essay was generated this way; we'll be asking different things of students - maybe working with the AI, or whatever the AI looks like in five months' time, depending on how fast it's going - and you might see it as a more collaborative thing, just a tool that we use. But at the moment we're in that slightly odd position between it becoming a tool we use and know how to use, and it messing around with all our exams. That's the place we're in.

Suppose you wanted to cheat: could you generate an essay with a system like this running and then change sufficient words? The answer is that you would need to change a lot of words, because you've got to go from almost no red words to about 50% red words - and you might as well write it yourself. Well, yeah - at that point it is starting to become easier just to learn the material and write the essay, so, you know, you have to make that choice.

Oh, that's really bad - the very first line is nonsense... Any time you're more likely to get approval by deceiving the person you're talking to, that's better... and this is a thing that actually did happen.
Info
Channel: Computerphile
Views: 523,083
Keywords: computers, computerphile, computer, science
Id: XZJc1p6RE78
Length: 13min 51sec (831 seconds)
Published: Thu Feb 16 2023