Stephen Wolfram Answers Live Questions About ChatGPT

Captions
Okay, here's a question, various questions actually, about ChatGPT, which still seems to be in the news and which we're still doing things with. The big story tends to be that ChatGPT is very good at writing stuff, and the stuff might be fact or it might be fiction. One of the great things that can happen is ChatGPT asking Wolfram|Alpha questions, which Wolfram|Alpha can then respond to factually, and that becomes an interesting way to fact-enhance what ChatGPT might otherwise be writing, which might otherwise be complete nonsense. But there's a question here: how does ChatGPT work? All right, I can tell you something about that.

First of all, what is ChatGPT? These days it's a website, made by an outfit called OpenAI, that you can type text into. You can have a conversation with it, you can chat with it, you can ask it questions, you can ask it to write essays for you, and it does a remarkably good job of writing very human-sounding essays. One of the first tests I did on it was "write a persuasive essay for why wolves are the bluest kinds of animals." I'm not sure what I was thinking when I typed that in, but the result was quite interesting. It wrote something that started off by saying that people don't usually think wolves are blue, but in fact there is a species of wolf that lives on the Tibetan Plateau that is blue, and its name is such-and-such, and its blueness comes from the same mechanism that makes butterfly wings different colors, and so on. It went on for a bunch of paragraphs, and it has clearly been taught to write essays the way some high-school kids are taught to write them, with a paragraph at the end that says "In conclusion," et cetera. The things it wrote sounded really quite convincing, convincing enough that I found myself looking up on the web whether there actually is a blue kind of wolf. It was all complete nonsense, but it sounded very convincing, and it had all the right structure for an essay about blue wolves.

So how does it work? Fundamentally, what it's doing is making plausible English text. You give it a prompt, a piece of initial text, and it tries to continue that text in a plausible way. At a very minimal level you could do this character by character: in English, if you see a "q" there's a good chance it's followed by a "u"; if you see "th" there's a good chance it's followed by an "e." At the level of letters we know the statistics, the chances that different letters follow each other in something like English. But we can generalize that and make it bigger. You can ask: in typical English, if the text so far is "AI is," what word comes next? Maybe the most common next word is "very," or maybe it's "an," as in "an interesting technology." So what is the typical word that follows "AI is"? And how do you define "typical"? There are many words that could in principle follow according to the grammar of English, but there are particular words that we humans have typically used when we actually write essays and write text about things.
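To make that "what word typically follows" idea concrete, here is a minimal Python sketch of word-level statistics on a made-up toy corpus. It only illustrates the counting idea; it is not how ChatGPT itself is built.

```python
# Minimal sketch, assuming a toy corpus: count which word follows a given word,
# then ask for the most typical continuations. ChatGPT does something vastly more
# sophisticated, but this is word-level statistics in its simplest form.
from collections import Counter, defaultdict

corpus = "AI is very interesting . AI is an interesting technology . AI is very new ."
words = corpus.split()

next_word_counts = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    next_word_counts[current][following] += 1

print(next_word_counts["is"].most_common(3))   # e.g. [('very', 2), ('an', 1)]
```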
So where does "typical" come from? In the case of ChatGPT, it took a crawl of the web. The web has, I don't know how many it is these days, a few tens of billions of pages that are reasonably human-written. If you ask how many possible URLs there are, there are trillions of URLs that get used, but the number of really human-written pages, "here's a lump of text, some human wrote this," is not that different from the number of humans. Not that we're all writing one page: some of us have written a lot of pages, and other people haven't been in the business of writing web pages at all. But it's tens of billions of pages that exist on the web. You can crawl the web: you start from one page, you follow all the links on that page, then you follow all the links from those pages, and chances are you'll eventually get to essentially all the pages on the web; that's a good guess, at least. There's a thing called Common Crawl, a project where people have gone and done that crawling, visiting all these websites, and doing so repeatedly, and you can download the result from various websites. That is a copy of the text content of the web. It's a little more complicated than that, because a typical modern website doesn't just have text on it, it has a bunch of JavaScript and so on, so you typically download WARC files, which are a web-archive format, and it all has to be unpacked, and you have to decide what you really mean by "the text." But let's assume you've got big blobs of text that came from the web.

You've also got books. There are maybe some tens of millions of books that have been written, and maybe five million that have been scanned, maybe a little more than that now. You can scan the books, do optical character recognition, and pick out the actual text in them, so that's another big corpus of material you can feed in. ChatGPT was trained on Common Crawl plus a bunch of books. So when it says "what word typically follows this," it knows that based on having seen all those different parts of the web and those books: in the things that humans wrote, what is the typical next word?

That's where it starts. But this whole question of what counts as a typical piece of English is about more than just saying "given the word 'cat,' what word comes before it, might it be 'black' or 'white'?" It's more than word-by-word statistics of which word follows which. It's dealing with things on a larger scale, where it's asking: when we look at the whole flow of this sentence, or of many sentences together, what is the typical way we would put these things together to form something like what you see out there on the web? In its current form, I think ChatGPT is really dealing with things of length 2048 tokens; a token is roughly a word, though sometimes there's more than one token per word.
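As a rough picture of the "follow all the links" process, here is a deliberately tiny crawl sketch. A real effort like Common Crawl also handles politeness rules, deduplication, WARC archiving, and billions of pages, none of which appears here.

```python
# Toy breadth-first crawl sketch: start from one page, follow links, collect text.
# Purely illustrative; everything here (the parser, the limits) is simplified.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkAndTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.text = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
    def handle_data(self, data):
        self.text.append(data)

def crawl(seed_url, max_pages=10):
    seen, queue, corpus = set(), deque([seed_url]), []
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                                   # skip pages that fail to load
        parser = LinkAndTextParser()
        parser.feed(html)
        corpus.append(" ".join(parser.text))           # the "lump of text" for this page
        queue.extend(urljoin(url, link) for link in parser.links)
    return corpus
```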
That's the kind of block it's trying to make: things of that length that are typical of what's seen out there on the web. And by the time you've given it a prompt, "I want to do this and this and this," the things that can follow are determined in many respects by the prompt you gave. So that's the big picture of what it's trying to do: it's trying to mimic the sort of typical stuff it has seen humans write out there on the web and in books. And it is completely remarkable, as far as I'm concerned, that anything like that is as good at producing interesting essay-type text, or poetry, or whatever else, as it is. I think it's really a big clue about the way we should think about processes of human thinking, and about the nature of meaning and text and all that kind of thing. Nobody knew this was going to work this well, and in fact the precursors to modern ChatGPT didn't work nearly this well.

So that's the big picture of what it's trying to do. Mechanically, how does it do it? I can tell you something about that, but first let me talk a little more about the construction of what's going on when we make an English sentence. We do it partly according to the grammatical rules that exist in English. We might say, what's a good example, "the crocodile ate the fish." That has the structure of an English sentence: "the crocodile" is a noun phrase, "ate" is a verb, "the fish" is another noun phrase. English is set up grammatically as something where sentences have forms like a subject noun followed by a verb followed by an object noun, and so on. That is the grammatical structure.

Now of course we can have sentences that mean absolutely nothing. We can say "the moon fished the elephant"; it doesn't mean anything. We can make sentences where the categories of things we put together don't really fit, in addition to things that don't make sense because they're not physically possible. You could say "the house ate the lettuce"; well, houses don't eat things. Or "the chair was happy"; chairs don't have feelings, so far as we know. There are all these ways that sentences can be grammatically correct, correct in their syntax, their parts of speech and so on, but not semantically correct: they don't have meanings we can immediately identify. If you're writing a poem it's a little different; a chair could be happy in a poem, because poems are a more elaborate, more abstracted form of communication than ordinary text. But this question of what fits semantically together is an interesting question that we really know very little about, and ChatGPT is showing us that there is probably a way of thinking about what fits semantically together, just like there's a way of thinking about what fits syntactically together, with parts of speech and nouns and verbs and grammars and so on.
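A toy grammar makes the syntax-versus-semantics point concrete. The rules and vocabulary below are invented for illustration; the sentences they produce are grammatical but often meaningless, exactly in the spirit of "the house ate the lettuce."

```python
# Minimal sketch: a toy context-free grammar producing sentences that are
# syntactically valid (noun phrase + verb + noun phrase) but often semantically
# absurd, e.g. "the chair ate the moon". The vocabulary is made up.
import random

grammar = {
    "S":  [["NP", "V", "NP"]],
    "NP": [["the", "N"]],
    "N":  [["crocodile"], ["fish"], ["house"], ["lettuce"], ["chair"], ["moon"]],
    "V":  [["ate"], ["fished"], ["liked"]],
}

def generate(symbol="S"):
    if symbol not in grammar:                 # terminal word: return it as-is
        return [symbol]
    expansion = random.choice(grammar[symbol])
    return [word for part in expansion for word in generate(part)]

print(" ".join(generate()))   # e.g. "the house ate the lettuce"
```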
So I think this is a very interesting wake-up call for the analysis of how meanings get constructed and how that works. I've been interested in this stuff for a long time, and I've actually been somewhat shocked recently, looking at what's been studied about it. I've certainly known the things that have been studied in the last few hundred years, but I recently realized that most of what's talked about actually goes back to Aristotle, more than two thousand years ago, talking about the way you construct things with meaning. He had a rather primitive way of thinking about that, and he also had a very different view of how the world is put together and how one can talk about things using science. It's time for a rewrite, a reboot, of that, and that's a project I'm hoping to do fairly soon.

But in any case, let's come back to the mechanics of how ChatGPT works. It's got text as a prompt, and it's trying to continue that text in a statistically reasonable way. How does it actually do that? The thing it's doing is training a neural network to figure out how to do that. Let me explain roughly how that works. A neural net is modeled on the theory we've had for the last hundred years or so about how brains work: brains have these neurons, and neurons can be either active or not active, electrically doing something or not doing something. Roughly every millisecond, about a thousand times a second, each neuron is deciding whether or not to fire and generate an electrical impulse. A typical neuron might have a thousand connections to other neurons: little dendrites, little tentacles that stick out of these nerve cells. In our brains we have maybe 100 billion nerve cells, and each one might have a thousand connections to other nerve cells. The idea is that when we think, there is electrical activity in one nerve cell that spreads to other nerve cells, which spreads to others still, and that produces a whole pattern of electrical activity in our brains that corresponds to our process of thinking.

We think that memory has to do with the connections between nerve cells, the synapses, the connections between the things that stick out of the bodies of the nerve cells: whether those connections exist, whether they're strong connections or weak connections. We think memory and learning have to do with the making of those connections. There are different ideas in detail about how those connections get strengthened and made, and about what produces that process of learning. Roughly, we have a short-term memory that works for a few minutes, and then gradually, over the course of a day or so, we form a long-term memory; there's actual protein synthesis that happens there. There are even very recent theories about "dark synapses" that have been lying around waiting to be activated, and that come to be active when we form memories, and so on.
There's some uncertainty about how all of that works, but roughly the idea is that all these neurons have lots of connections to lots of other nerve cells, and something about the weights of those connections encodes memories; and when we think, when we use our brains, activity from one neuron spreads to others through those connections, through those synapses.

Artificial neural nets are modeled on the same kind of idea, except that instead of actual nerve cells there are just things in computer memory, or, in Wolfram Language, just some expression that represents each of these "nerve cells" and its value, and then there are weights that say how much effect one nerve cell has on another. Mathematically you end up with matrices, or tensors: I've got this whole vector of values, and from it I'm going to determine a new vector of values. Roughly, you take the input values, weight them in different ways, add them all up, and then, in addition to just adding the numbers up, you apply some kind of threshold. The most common thing done in current neural nets, I think, is ReLU, which is actually very simple: the weighted sum can be positive or negative; if the result is negative, the output is zero, and otherwise it's just the value you got. That's a simple way to do the thresholding. In actual brains it probably works a bit differently, but that's the basic idea.

So the way it works in an artificial neural net, and probably in the actual neural net in our brains, is that there is a succession of layers. You give a collection of activity to the first layer, it does its transformation and produces a collection of activity for the next layer, and it keeps going layer by layer up through the system until it eventually gets to the output. Now the question is: what should all those weights be, from input to output? You want to set them so the network computes the function you care about. For example, on the first layer you might give it a bunch of values corresponding to the pixel values of an image, red, green, and blue for each pixel, maybe a million or ten million values, and it goes through the network, many layers, 50 layers or something, with maybe ten million weights arranged between those layers. Out at the other end you want an output with, say, two thousand possibilities, corresponding to two thousand kinds of objects that might exist in the world: cats and dogs and elephants and chairs and tables and so on. Most of those output values should come out close to zero, with a high value for one of them, and it's like: okay, that corresponds to a picture of a cat.
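Here is a minimal sketch of that layer-by-layer computation: each layer forms weighted sums of its inputs and applies a ReLU threshold. The sizes and random weights are placeholders, not the real network's.

```python
# Minimal sketch of the layer-by-layer computation described above: each layer
# takes a vector of values, forms weighted sums, and applies a threshold (ReLU).
import numpy as np

def relu(x):
    return np.maximum(0, x)          # negative sums become zero, positives pass through

rng = np.random.default_rng(0)
layer_sizes = [784, 128, 64, 10]     # e.g. pixels in, two hidden layers, 10 classes out
weights = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes, layer_sizes[1:])]
biases  = [np.zeros(m) for m in layer_sizes[1:]]

def forward(x):
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)          # one layer: weighted sum, then threshold
    return x

output = forward(rng.normal(size=784))   # activity flows layer by layer to the output
```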
So the big question now is: how do you set up all those weights so that the neural net actually behaves this way? That is where you have to train the neural net. How do you train it? You might start with completely random weights, feed in an image of a cat, and it says, "oh, that's half elephant and half turtle," or something. Well, that's not right. There's what's usually called a loss function: how far away from being right are you? And what you try to do is say: okay, you're pretty far from right, but if you changed those values in this direction you'd be getting closer to being right. We know what we wanted in the output: the elephant and turtle values came out big, but we really wanted the cat value to be big. So the numbers for elephant and turtle should get smaller and the number for cat should get bigger. That gives you a way to say how the numbers in the output should change. Now the question is: how do you change the weights inside the neural net to get closer to the numbers you want in the output?

This is a giant application of calculus. It's sort of interesting, because there are many things in the world where calculus became less and less useful and gave way to things that were much more about programs; this is one place where calculus is still really useful. Maybe I don't want to go into all the technical details, but in the end it's making use of the chain rule. In calculus you might study functions of one variable, two variables, three variables; in neural nets we're dealing with functions of millions of variables. What you're trying to do is back-propagate: propagate backwards from the way you want to change the output to the way you should change the weights in the middle, so that if that input were fed in again, you'd be closer to getting the output you want. So you bash these neural nets really hard: a trillion times you do this process of saying, "what does this input give as output? Oh, that's not quite right; it needs to be pushed in this direction."
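The "measure how wrong you are, then nudge the weights" loop can be illustrated on a deliberately tiny problem. This sketch uses a single linear layer and a squared-error loss rather than real backpropagation through many layers, but the repeated cycle is the same idea.

```python
# Toy illustration of the measure-error / adjust-weights cycle, on a tiny problem.
# Real training backpropagates through many layers with the chain rule.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))            # 100 training inputs with 3 features each
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                           # targets the "network" should learn to produce

w = np.zeros(3)                          # start from uninformative weights
learning_rate = 0.1
for step in range(200):
    predictions = X @ w
    error = predictions - y              # how far from right are we? (the loss signal)
    gradient = X.T @ error / len(X)      # direction to change w to reduce the loss
    w -= learning_rate * gradient        # nudge the weights; repeat many, many times

print(w)                                 # ends up close to true_w
```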
Let's propagate back from how we needed to push it, and get those weights in the middle changed accordingly. You keep doing that, over and over and over again, and eventually the thing learns, in the sense that the weights arranged in the middle actually do the thing you want. The most common form of training in this setup is so-called supervised training: here's a picture, it's a cat; here's a picture, it's a dog. In the early days of typical image-identification training, people used something like 25 million images of a total of maybe 5,000 kinds of objects, and you just keep saying: here's a picture of a cat, you should say it's a cat; oh, you're saying it's an elephant, push in this direction; and so on, over and over again. Nobody knew how hard it would be to do this. There was a big discovery around 2011 that it was possible at all: it took something like a month of CPU time, somebody just left the thing running, and at the end it turned out it had managed to learn a bunch of things. Nobody knew that was going to work; nobody knew how hard it would be to bash a neural net into learning these things. But in any case, it worked.

There are many, many tricks in how you actually do this training. For example, you might not have enough images, so you can make images synthetically: this is a cat, but let's change the pixels around a bit, it's still a cat, and use that as another piece of training data. In places where people are, say, training self-driving cars to recognize different kinds of objects, there's a certain amount of data you get from just driving cars around, having a fleet of cars from some manufacturer going around seeing what it sees on the streets and sending all that data back. But in the end there might not be enough training data, and I think what's been done there is that people make essentially video games for the cars to drive in, so they get millions and millions of hours of synthesized experience on roads. That's the typical mechanism for training: this is what I'm going to feed in, this is what I want to get out; now learn, in the middle of the neural net, to do that.

In the case of ChatGPT, the thing we want to do is to say: here's a piece of English text, a big long piece, maybe off some random web page, maybe from Shakespeare, whatever it was. Now, neural net, let me show you the first ten words of that text, and I want you to arrange yourself so that you will correctly tell me the eleventh word, the twelfth word, et cetera. That's the task you're training the thing on. You run that enough times, with enough different pieces of text, and bash on it hard enough, that it will eventually learn to predict correctly. You can say to the neural net: oh, you got the wrong word there, let's poke you in this direction, and then you'll get closer to the word you should have gotten, based on the actual text you were shown.
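The training task itself is easy to sketch: turn a piece of text into (context, next-word) pairs, where the context is the words shown to the network and the next word is what it should predict. The snippet below just builds those pairs from a toy sentence.

```python
# Minimal sketch of turning raw text into (context, next-word) training pairs:
# show the net the first N words, ask it for word N+1, slide along, repeat.
text = "to be or not to be that is the question"
words = text.split()

context_size = 4
pairs = [(words[i:i + context_size], words[i + context_size])
         for i in range(len(words) - context_size)]

for context, target in pairs:
    print(context, "->", target)
# ['to', 'be', 'or', 'not'] -> to   (and so on through the text)
```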
Now, the whole point, the thing that's exciting with something like ChatGPT, or with so-called generative AI in general, is that it's often being given inputs it has never seen before. Take the cat-and-dog identifier first. Say you show it one very specific photograph of a cat; you could imagine training it so that, for that specific photograph, it says "cat." But now modify that cat a bit: the cat is licking its whiskers, or has its ear in a different orientation, or is on a different background, or has put its paw up in some way. You still want it recognized as a cat. The big point about these neural nets is that after you've trained on enough cat images, it kind of gets the idea of roughly what a cat image is, and so even if the cat has its paw up, it can still say, "oh, that's a cat."

The point with ChatGPT is that for some particular piece of text it might know exactly how it should continue: the famous "To be, or not to be, that is the question: whether 'tis nobler..." from the Shakespeare speech, where it could know exactly what the next word is. But what if you give it something it's never seen before? It's going to try to generate something that is reasonable to generate, even though it's never seen that particular input before, and the thing it generates will be something that's never been seen before either.

I will say, by the way, one thing that is important in ChatGPT: sometimes it effectively says, "I've seen this before, I know exactly how to continue," and it turns out that doesn't produce very good results. So what's done in ChatGPT is that it has what's called a temperature. It knows the most likely way the text is going to continue, but it also knows, say, the top five ways it might continue, and instead of always picking the most likely one, it has a little randomness: sometimes it picks a less likely continuation. That's an important piece of the creativity of something like ChatGPT. I think the parameter is set to about 0.8, rather than the setting where it would always pick the top choice, so it has a little bit of uncertainty there, and that's important. If you give it no uncertainty, if you say "always pick the top choice," then, at least in the earlier GPT systems, it very often gets in a loop and just keeps saying the same sequence of words over and over again.

The thing that's totally bizarre is that, in the end, what ChatGPT is doing is adding one word at a time, sometimes just one token, one piece of a word, at a time, and it's producing this essay where the whole thing seems to make perfect sense; yet the way it actually worked was just adding one word at a time. That's really telling us something about the structure of text. Well, okay, how does it actually do the adding-one-word-at-a-time?
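Here is a small sketch of temperature sampling over made-up next-word scores. The vocabulary and numbers are invented, and the exact sampling scheme inside ChatGPT may differ, but it shows how a temperature below 1 favors the top choices while still leaving some randomness.

```python
# Sketch of temperature sampling over next-token scores. Lower temperature sharpens
# the distribution toward the top choice (temperature -> 0 always picks it and tends
# to loop); the reported ~0.8 setting keeps some randomness.
import numpy as np

rng = np.random.default_rng(2)
vocab  = ["the", "a", "blue", "wolf", "question"]
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])     # made-up scores for each candidate

def sample_next(logits, temperature=0.8):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                          # softmax over temperature-scaled scores
    return rng.choice(len(probs), p=probs)

print(vocab[sample_next(logits)])
```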
The thing there is an idea called transformer nets. In the image case I was describing, where you've got artificial neurons connected to other artificial neurons and activity flows through layer by layer, the artificial neurons are typically arranged in an array that corresponds to the two-dimensional geometry of the image. Those are usually called convolutional neural networks, arranged in that geometrical way, actually very similar to the cellular automata I've long studied. In any case, the way that works is that it has these layers of artificial neurons and you go from one to the next and so on. For images, the first layers of artificial neurons are arranged in a grid just like the pixels in the image, very similar, actually, to our primary visual cortex at the back of our brains, which also has neurons arranged in a way that somehow corresponds to the way the cells on the retina are arranged.

When you're generating a sequence, there's a slightly different idea. Essentially, the thing you feed into the neural net is the thing you're trying to add to the sequence, and you look back at the things that are already in the sequence and ask: which of those should I look at, to feed into the neural net that works out what the next thing should be? That is the typical so-called attention mechanism. You're trying to add the next word, so you look back and say: the word five back, that was important, that was a verb or something; the word fifteen back, that was also important, that was the subject of a sentence or something like that. One of the things you try to learn is which words are worth looking at in the preceding part of the text. That's the attention mechanism. Then, given that set of words, you figure out which ones are important, and you feed those into something very much like the kind of neural net I just described.

Okay, I've got to explain a couple of other things to make this make sense. First point: what starts off as a bunch of words, "the cat sat on the mat," has to become numbers, because what neural nets deal with are numbers, collections of numbers: is it 5.7, is it 5.8, et cetera. So how do you turn "the cat sat on the mat" into a bunch of numbers? In a first approximation you just say: "the" is word number one, "cat" is word number 714, "sat" is word number such-and-such; you essentially just make up numbers for the words. But it turns out that the really useful thing to do is to make up, for every word, not just an individual number but a whole collection of numbers. I think in ChatGPT it's something like 16,000 numbers per word, though I'm not sure of that exact figure; it used to be maybe a thousand numbers per word.
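A stripped-down version of the attention idea looks like this: score how relevant each earlier word's vector is to the position being generated, then take a weighted mix. Real transformers learn separate query, key, and value projections and use many attention heads; the toy dimensions below are just for illustration.

```python
# Minimal sketch of attention: for the position being generated, score how much to
# "look back" at each earlier position, then take a weighted mix of their vectors.
import numpy as np

rng = np.random.default_rng(3)
d = 8                                     # toy embedding size
sequence = rng.normal(size=(6, d))        # vectors for the 6 words seen so far
query = rng.normal(size=d)                # vector for the word we're about to produce

scores = sequence @ query / np.sqrt(d)    # how relevant is each earlier word?
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # attention weights sum to 1

context = weights @ sequence              # weighted mix fed on into the network
```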
So what that means is that for every word, you're saying: I'm going to characterize this word by the numbers 0.7, 2.2, 4.6, minus 0.8, and so on, thousands of numbers. What do those numbers mean? They are some kind of way of saying what the meaning of the word is. Let me give an example, not with words at first. Say you want to arrange the flags of the world on a piece of paper so that flags that look similar end up in similar places: the French flag and the Italian flag, I think, are similar, so they'll be near each other; some country with a big green flag is really in a different place from the US, with its red, white, and blue flag; and the US flag might be a bit similar to the UK flag. You're trying to arrange them on the page according to visual similarity. Or forget flags and just use lowercase letters of the alphabet as they get printed: an "a" is similar to a "g," a "v" is similar to a "y," and so those get arranged in similar places on the page. In Wolfram Language there's a function called FeatureSpacePlot that does exactly this: it takes a bunch of things and tries to arrange them according to their features, so that things with similar features are nearby.

The result is that you can take anything, a "v," a "y," an American flag, whatever, and place it somewhere in this feature space. It will have some coordinates: some x coordinate and some y coordinate. The point is that you try to arrange things so that items that are similar in some sense have nearby coordinates; they appear nearby in the feature-space plot. In the real case you're doing that not in two dimensions but in many thousands of dimensions. Either way, the result is that every word gets encoded by an array of numbers, in such a way that words with similar meanings get encoded by similar collections of numbers. That's usually called an embedding, a word embedding; it's the way you embed words in, say, thousand-dimensional space.

How do you learn these embeddings? Well, for example, words for colors might end up somewhat nearby in the embedding, and you can tell they should be, because if you look at tons of text you find you can substitute "red" for "blue" in lots of places. There's a kind of frame for the sentence where it could be "red" or it could be "blue," and so we know that "red" and "blue" have to be somewhat similar, because they get arranged in the same places in sentences; that tells us something about their similarity of meaning. You could swap "crocodile" for "alligator," because they appear in similar sentences, and so "crocodile" and "alligator" will end up near each other in meaning space.
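As a toy version of the embedding idea, here are some invented three-number "embeddings" and a similarity measure between them. Real word embeddings have thousands of dimensions and are learned from text, but the nearby-vectors-mean-similar-words principle is the same.

```python
# Toy sketch of word embeddings: each word maps to a vector, and words with similar
# meanings get nearby vectors. The vectors below are invented for illustration only.
import numpy as np

embedding = {
    "red":       np.array([0.9, 0.1, 0.0]),
    "blue":      np.array([0.8, 0.2, 0.1]),
    "crocodile": np.array([0.0, 0.9, 0.7]),
    "alligator": np.array([0.1, 0.8, 0.8]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(embedding["red"], embedding["blue"]))          # high: similar "meaning"
print(cosine(embedding["red"], embedding["crocodile"]))     # lower: unrelated words
```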
That's the way you deduce these embeddings, these mappings from words to collections of a thousand or more numbers. Those embeddings are valuable things; people nowadays actually sell embeddings. What's the good of having an embedding? Well, in the end you can make embeddings not just for words but for whole sentences, and then if you want to do a search, "is there somewhere in my documents a sentence like the sentence I have now?", you can take every sentence in the document, turn it into this vector of numbers, and ask whether those numbers are close. That tells you whether there's a sentence somewhere in a document that's close to the one you're looking for.

Okay, so there are a bunch of tricks used in the case of ChatGPT. For example, the words come in a sequence, "the cat sat on the mat," and it's kind of inconvenient to keep the words literally arranged in that order. Instead there's a positional encoding: along with the vector for "the," you include information that says "that was the first word," and for the next one, "that was the second word," a kind of weird little positional tag that gets included. That allows you to take all those words and basically throw them into a big vat; they're just a bunch of vectors corresponding to those words. Then you use the attention mechanism to decide which of those words you're going to pay attention to and how much weight you're going to put on each one; then you feed that into the neural net; and then out comes the prediction for the next word. Of course it's actually more complicated than this. Typically, for every word, you go through many iterations of this attention-then-feed-forward step; it's called feed-forward when the thing just feeds through a neural network. I think it's 16 layers, maybe more than that, in ChatGPT, some number of tens of iterations of that process. That's what happens when you're running ChatGPT and it's trying to figure out the next word. When you're training it, you have to do the whole thing kind of in reverse: you have to say, "well, what do you think the next word is, GPT? Oh no, you're wrong; let me tell you a different word instead."

So that's the main part of the training. There's an additional part of the training, which is tricky, and which I think is actually one of the things that has really separated ChatGPT from previous generations of so-called large language models, which are models of how text is generated. But first, the main training. I think ChatGPT uses an underlying language model that has about 175 billion parameters, and over the course of CPU years, GPU years more to the point, you gradually refine those 175 billion parameters to represent the statistical characteristics of all those billions of pages on the web and millions of books and all that kind of thing.
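The positional-tag idea can be sketched like this, using the common sinusoidal scheme as a stand-in (whatever scheme ChatGPT actually uses, the point is the same): each word's vector gets combined with a vector that encodes its position, so the order isn't lost when everything goes into the "vat."

```python
# Sketch of a positional tag: combine each word's vector with a vector that encodes
# where in the sequence it sat. Uses the common sinusoidal scheme as an illustration.
import numpy as np

def positional_encoding(position, d):
    i = np.arange(d)
    angles = position / (10000 ** (2 * (i // 2) / d))
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

d = 8                                                          # toy embedding size
word_vectors = np.random.default_rng(4).normal(size=(6, d))    # toy embeddings, 6 words
tagged = np.array([vec + positional_encoding(pos, d)           # word vector + position tag
                   for pos, vec in enumerate(word_vectors)])
```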
Those 175 billion parameters encode the information about what typical text out there on the web looks like. That can be compared with our brains: we maybe have a few trillion synapses between our neurons, so the number of parameters in ChatGPT is not that different from the number in a brain, although it knows about a lot of stuff that is much more obscure than even people like me, who have decent human memories, know about. There are also tricky issues about whether it really needs precise numbers, 0.2785644 et cetera; that really doesn't matter much. I don't know exactly how ChatGPT stores them, but it's probably just using about 255 levels for each number, rather than a precise number that can have any number of digits.

Okay, so that's the basic thing: there's this whole attention mechanism, and it's trying to continue the text. There's another important piece, which is so-called reinforcement training, reinforcement learning applied to ChatGPT. People ran these chatbots, and well, what is the chatbot going to produce? Sometimes a chatbot produces crazy stuff: it's off yakking about something, it's gone completely bonkers, it's off the deep end in some way. So what was done was that people were told: have a chat with this chatbot and rate it; tell it "you're off the deep end," "that's going in the wrong direction," "no, that's great," "that's bad," et cetera. Groups of people actually did this, and it turns out that's a very powerful way of directing what the output will really look like; again, it's not so obvious that it would be. I think there's probably also a certain amount of templating that's been done for particular kinds of requests, "make a poem," "make an essay," things like that, which is just engineering on top of other things to make a better user experience for the system.

But the thing that is important is that you have these groups of people telling it "do this, do that," and that gets fed into the training of the neural net. There's an additional layer, this reinforcement training, that keeps it more or less on track, and that seems to be important. It can obviously be controversial: you've had some group of people training it who might be really keen on this feature of the world or that feature of the world, or really big on promoting, say, the rights of cats over dogs, or whatever else. And so that set of people who did the final training, independent of just feeding in the big pile of stuff from the web, has a lot more control over whether it's going to like cats and not dogs. That's sort of an issue, and there are a bunch of questions about how that should be done, and about this new generation of AI wranglers: how one gets the right groups of people and steers things in the right way. But that's a different issue. So that's the basic idea, and I think I more or less went through how ChatGPT works.
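The "about 255 levels per number" remark is essentially 8-bit quantization. Here is a generic sketch of that idea, not OpenAI's actual scheme: store each weight as one of 255 evenly spaced levels between the minimum and maximum values.

```python
# Toy sketch of the "about 255 levels per number" point: store each weight as an
# 8-bit level between the minimum and maximum instead of a full-precision float.
import numpy as np

weights = np.random.default_rng(5).normal(size=10)

lo, hi = weights.min(), weights.max()
levels = np.round((weights - lo) / (hi - lo) * 254).astype(np.uint8)   # 0..254: ~255 levels
restored = lo + levels / 254 * (hi - lo)                               # approximate originals

print(np.max(np.abs(weights - restored)))    # small error despite far less precision
```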
What it's always doing is generating a word at a time, sometimes a sub-word. You can see that it makes up words, because it has tokens that are not quite whole words, and it can end up gluing two tokens together to make a word that's never been seen before. So everything it's doing comes down to this: it takes the prompt, the original prompt, and asks, what's the next token, the next word? It gives you that, then takes the prompt plus that token, feeds that back in as the new input, says "okay, now continue that," and keeps doing that over and over again until you get your complete piece of text out. And as I say, I think it can go about 2048 steps doing that; that's the distance back that its attention system looks in order to keep coherence in the text. So if it was talking about blue wolves at the beginning and you ask it to go on too long, it'll be talking about orange elephants or something at the end, and it will have completely lost the thread of what it was talking about, because it only has a finite look-back; that's as far as the training could be made to work. Okay, well, anyway, that was a basic explanation.
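Finally, the generate-and-feed-back loop itself is simple to sketch. In the snippet below, predict_next_token is a hypothetical placeholder standing in for the whole trained network; everything else just shows the prompt-plus-new-token cycle with a finite look-back.

```python
# Minimal sketch of the generate-one-token-and-feed-it-back loop described above.
def predict_next_token(tokens):
    # Placeholder: in the real system this runs the full network over the recent context.
    return "..."

def generate(prompt_tokens, max_steps=2048):
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        next_token = predict_next_token(tokens[-2048:])   # only a finite look-back
        if next_token is None:                            # where an end-of-text signal would stop
            break
        tokens.append(next_token)                         # prompt + new token fed back in
    return tokens
```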
Info
Channel: Wolfram
Views: 78,635
Id: zLnhg9kir3Q
Length: 47min 17sec (2837 seconds)
Published: Tue Jan 24 2023