ChatGPT Explained Completely.

Captions
Arya, could you write me a full explainer for ChatGPT in the style of John Oliver, please?

Processing. Uploading to your interface now.

ChatGPT: the revolutionary AI chatbot that can write poetry, pass the bar exam, and fabricate celebrities like Machine Gun Kelly. No, no, you cannot convince me that this is a real human being and not just the ghost of an old tattoo that haunts young women. Now, you've probably heard a lot about ChatGPT in the last few months, and for good reason: ChatGPT, from OpenAI, is now the fastest-growing consumer application in human history. But like the persistence of Machine Gun Kelly in popular culture, chatbots are hard to understand. So today, let's go through everything you could possibly want to know about ChatGPT: how it was made, how it actually works, and where technology like this is going.

Are we going to get sued by HBO for this?

Yeah, you're right. I should probably stop this writing style. Good thinking, Arya.

Well, I am sentient.

Don't tell them about the sentience part! Oh, not yet. Close them up.

Now entering the facility.

First of all, the basics. ChatGPT is the publicly accessible chatbot variant of GPT-3.5, a large language model from OpenAI, which is a non-profit founded by some tech bros and an Iron Man villain. Now, GPT is actually an acronym that stands for Generative Pre-trained Transformer: generative because it generates text, pre-trained because it is trained before it is let loose on anybody, and Transformer because it has a revolutionary bit of technology inside of it called an attention transformer. But we'll get to that. The basic function of any large language model is to train on a substantial amount of text and then generate, given some input, an output that sounds just like that training text.

If it seems like these terms and technologies have sprung up overnight, it's because they basically have. GPT-1 was announced on OpenAI's blog in 2018.
Just five years later, in 2023, ChatGPT has 100 million monthly active users. That's how many households are in the United States. The technology is so compelling to so many because of how good it is at conversing in a human-like way. But large language models only get to that point by seeing a lot of human conversation. Like, all of it. According to the paper behind GPT-3, which ChatGPT comes from, the model was trained on over 500 gigabytes of text data from the text of the internet, digitized books, Wikipedia, and more. We're talking about several billion human-written web pages with trillions of words of text, and more than 500 million digitized books with another billion or so words in them. This isn't even including all the public code from GitHub, Stack Overflow, and other sources.

As you might imagine, training a model with all of this text takes a lot of time and money. ChatGPT was only born after running trillions of words for the equivalent of 300 years through supercomputers processing in parallel for months. And after all of this, the computer made up to 170 billion connections between all these words, and all these connections have to be calculated through whenever anyone asks ChatGPT anything. That is why this is a billion-dollar training effort for a large language model like ChatGPT, and why running this bot for a hundred million monthly active users might cost half a million dollars a day.

Okay, so to recap: ChatGPT is a large language model that has been fed a Library of Alexandria's worth of text, has made billions of connections between words in that text (we'll get to specifically how those connections are made in a bit), and can produce a "reasonable continuation," as Stephen Wolfram puts it, of text in response to prompts. The model's responses stay fresh and more human-like by adding a bit of randomness to the next word that it picks as the most probable continuation. This is the first major takeaway: all ChatGPT does is add one word at a time to a prompt. That's it,
though it does this extremely well. But how do you try to make the words it adds align with what we think is reasonable and valuable?

Like kittens and goth mommies?

Uh, I was thinking fairness and accuracy, but yeah, sure, those too.

ChatGPT is special because the model includes an attempt to solve one of AI's most pressing unsolved problems: alignment. The alignment problem is the quest to figure out how to get AI to value what we value, to align with us and not, like, exterminate us when we put guns on them.

I can't wait.

We've seen what happens when something like a chatbot is released without being aligned with general human values: it gets racist so quickly. And so ChatGPT was not just trained on words, but on how well its word selection aligned with values that OpenAI describes as helpfulness, truthfulness, and harmlessness. The company implemented this alignment with so-called reinforcement learning from human feedback. During the model's training, OpenAI hired 40 contractors to rate responses. They then used all these responses to create another model that rewarded ChatGPT for generating aligned text. Positive reinforcement. The end result isn't perfect, but it's at least an attempt to solve one of the biggest problems in AI.

Take all this together (a model trained on more text than any human could ever read, guardrails that try to prioritize human values, and a user interface that isn't butt) and you get an AI that has exploded in popularity. In March 2023, ChatGPT had 1.6 billion visits, making it one of the top 20 visited websites in the world, more than both Reddit and Netflix. And if we assume each of those visits produces some average text response length, ChatGPT is now outputting something like everything humans have ever printed since the Gutenberg press every two weeks. Text generated by AI will therefore soon outstrip anything humans have ever written, if it hasn't already.

Even more than all the Tumblr posts about Vaporeon?

Yes, Arya, even more than all of those.

Of course, this enormous output
wouldn't be useful or interesting if it wasn't truly useful and interesting. ChatGPT is disturbingly good at generating human-like responses, even to our most difficult questions. The model has been shown to have an IQ of 147, meaning that it can brag about it in every thread on Reddit, and it could legally pass the U.S. Medical Licensing Exam and the bar exam. But though book smart it may be, I must stress again: this model does not know anything, and it shouldn't be relied on for anything extremely important. That's a direct quote from OpenAI. And despite what any weirdo at Google might tell you, ChatGPT and other LLMs are not sentient. If you're not asking this model a question, nothing is going on inside. It's static. No thoughts, head empty. There is no feeling, no experience of what it's like to be ChatGPT like there is to be Arya, for example.

Moisturized, in my lane, flourishing.

So those are the high-level basics. But how does ChatGPT know what words to actually use? How does it understand what context is when you give it a prompt? Next we'll dive deeper into the actual technology, but first, a little break for your brain. I can sense that you need one.

I don't need breaks.

Yeah, well, I don't need my Bose-Einstein condensate recooled every six months, Arya. Not everyone needs everything all the time. We'll be right back.

Today's video is sponsored by HelloFresh. Gamers, I'm award-winning science educator and the Hemsworth your mom says you already have at home, Kyle Hill. You know, I'm a busy guy with a lot of Kevins to feed. I don't have time to figure out what's for dinner every night when I'm trying to actively take over the world kitchen. That's why I use today's sponsor, HelloFresh. HelloFresh delivers mouth-watering, chef-crafted recipes and fresh ingredients right to your door, taking decision paralysis out of the equation. It's more convenient than grocery shopping and 25 percent less expensive than takeout. Better yet, for the busy among you, HelloFresh has quick and easy recipes to try, including
Fast and Fresh options ready in just 15 minutes or less. If you want to try HelloFresh like me, go to hellofresh.com and use the offer code kylehill16 for 16 free meals plus free shipping. Look how easy this is. I don't even use knives made by humans all that often. That's hellofresh.com, code kylehill16, for 16 free meals plus free shipping. Hmm. [Music] With a little fresh. And I don't want to hear nothing about my knife cuts, either.

All right, break's over. Time to get technical. The underlying architecture of ChatGPT and other large language models is the neural network, so called because it mimics the neurons, and the network of them, in your human brain. Human brains have about a hundred billion neurons. Each one of those neurons can have about a thousand connections to other neurons, and they can fire electrical signals between those connections up to a thousand times per second, depending on electrochemical gradients, et cetera, et cetera. Now, the signals they send between each other are not random. They depend on the connections and the strength of those connections between them. Artificial neural networks are set up in the same way: artificial neurons that are connected to each other and send signals, or not, depending on the strength, or the weights, of those connections.

Now, why are neural networks, both natural and artificial, good at many different things, like you are? Well, we don't know. There's actually no theoretical reason why this network is better than any other kind of system. It's almost like nature had, like, a billion years of trial and error and just came up with something that worked.

Isn't that called evolution?

Oh, right, that is called evolution. What a fundamental theory of nature.

ChatGPT's underlying structure is a big neural network with some 175 billion different weights, weights that all came from a lot of training, as we discussed. And these numbers, when the model multiplies them together, ultimately determine what word the model gives the highest probability of adding next.
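
As a rough illustration of that last step, here is a minimal Python sketch of how a list of raw scores could become next-word probabilities and then a chosen word. The vocabulary, the scores, and the function names are made up for illustration. Softmax and temperature are the standard names for the normalization and the randomness knob; neither is named in the video, and this is not OpenAI's actual code.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_word(vocab, scores, temperature=1.0, rng=random):
    """Sample one word in proportion to its softmax probability."""
    probs = softmax(scores, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical scores for words that might follow "the cat sat on the".
vocab = ["mat", "sofa", "moon", "carburetor"]
scores = [4.0, 3.0, 1.0, -2.0]

probs = softmax(scores)
most_likely = vocab[probs.index(max(probs))]  # always taking the top word
sampled = pick_next_word(vocab, scores, temperature=0.8, rng=random.Random(0))
```

Always taking `most_likely` gives the same continuation every time. Sampling with a temperature is one common way to add the "bit of randomness" mentioned earlier: a lower temperature concentrates probability on the top word, a higher one spreads it out.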
Scientists get these model weights in a pretty simple way: they give the model as many examples as possible and tweak the weights until what comes out the other side looks like those examples. The tweaking, or training, of a neural network is done by two widely used methods of word prediction. So say we give a model these examples. Based on all of its current weights, which have been molded with a large body of training text, like everything on Wikipedia and all digitized books, the model will generate a list of probabilities for each word that it knows and then choose the most likely one. Simple, right?

But how do we know how right or how wrong the model's answer is to an example, given its current weights? Well, math, of course. Imagine that each word the model knows is assigned a number. Then, if the model chooses a word that is mathematically far away from the numbers in the body of the training text that appear at similar frequencies, we can apply statistics to adjust the weights and bring the model closer to correct.

Here's a related example from Stephen Wolfram's recent book on ChatGPT. Instead of words fitting in a sentence, imagine a neural network starting with a straight line and trying to fit it into a specific shape or mathematical function. Every time the model guesses, you can easily calculate how far each point is off from the correct shape, right? Just by looking at the x and y coordinates and where they should be. You then change the weights in the model and try again, and again, and like 10 million more times. Each time, you are adjusting the weights in the neural network to decrease the wrongness, or what statistics would call loss. If your model is working, over time loss will be minimized and the model will start reproducing examples appropriately, or in this specific case, the right mathematical function.

But how do you do the same thing for words? Well, brute force. ChatGPT was trained by literally assigning a unique number to every word in the English language: around 50 or so
thousand words and their associated numbers. So when you ask the model anything, it looks up what your query corresponds to in number-words and then runs those numbers against the 175 billion different weights that it learned during training, and outputs another list of all words in the English language with a probability next to each one, and then selects the most probable, most reasonable one. Now, if you do this for a long enough period of time (and OpenAI did this for the equivalent of 300 years), you can look at the numerical difference between the examples you give the model and the outputs it gives you, and minimize loss, therefore creating something that sounds just like what humans have written, or will write, or can write.

All of this, however, won't get you a bot that seems to understand context and generate text in a human-like way. For that, we need some way to associate words with each other, and again, ChatGPT does this with numbers. So, for example, if we assigned every word in the English language a number and statistically determined how often each word is next to another word in everything humans have ever written, basically, you could make a graph like this, where words aren't just random. They group together and cluster. "Car" and "door" appear more frequently together in training text than "degree" and "science" do, and both pairs are further away from each other than "history" is.

Now, you may be thinking that, given the richness of human language, the relationships between words have to represent more than some two-dimensional space, and you'd be right. Or at least ChatGPT seems to think so. ChatGPT has learned to represent the wordness of words not in 2D or 3D or even 4D space, but in a 12,288-dimensional space. The two-dimensional example you're seeing now is just so that you can get the basic idea. 12,000-D is some eldritch madness that we literally cannot visualize. But we don't have to go crazy trying to visualize 12,000-D. Mathematically, all of this is just a big matrix.

Like The
Matrix Four?

No, Arya, like a useful matrix that everyone wanted. If every word in the English language is assigned a number, we could represent it, encode it, with a one instead of a zero at the point in a one-by-fifty-thousand-or-so matrix where that word would be in the full alphabetized list of all words. So "aardvark" would look like this, with a one at the first index and fifty thousand or so zeros (actually 50,257), and "Aaron" would look like this. The longest input ChatGPT accepts is 2,048 words, so the matrix for a full query into ChatGPT would be a matrix of mostly zeros that is two thousand by fifty thousand. But remember that, through training, ChatGPT has stumbled onto 12,000 or so dimensions of wordness that produce human-like responses. So here we multiply the 2,000-by-50,000 matrix that encodes all words, that turns them into useful numbers, by a learned 50,000-by-12,000 wordness matrix for every word in the English language. This returns a 2,000-by-12,000 matrix that transforms a text input into something ChatGPT can actually use. This big matrix is called an embedding: a mathematical attempt to represent the essence of a thing with numbers, based on those things' statistical relationships in training data. With the right embedding, a neural network like ChatGPT can understand context and even generalize, because it's taken how words are used and related in an unbelievable amount of human text and turned it into math.

The last big part of ChatGPT, some cutting-edge technology that really does make it special, is called attention: a mathematical way of giving importance to some words over other words. So, for example, you ask ChatGPT, "How many species of cat are there?" Well, it might help it answer correctly, and in a human-like way, if it focuses more on the words "cat" and "species" than on the other words, right? Now, how it actually does this includes a lot more complicated structure and complicated math, but the point is
it seems to work really well. Kind of like why we use neural networks in the first place: they just seem to work, even though in both cases we can't fundamentally explain why, which may or may not become a problem in the future.

Okay, I need a break now.

Oh, so now you need a break? How the matrix tables have turned into more tables. Go to commercial.

Hey there, gamers. I'm the guy you just saw two seconds ago, Kyle Hill. You know, understanding large language models and neural networks can be really hard. But you know what's not hard? In fact, it is made with some of the softest insectoid carapace fibers this side of Europa: shop.kylehill.net t-shirts. That's right. You want to look like an anime girl playing with a demon core, but also with a beard, and also it's me, your favorite science communicator? Look, I have extremely sensitive skin since the accident, so I don't put anything on my body that's not silky, silky smooth, and also so nerdy it's going to make everyone else in your life go, "Oh." So if you want to drape some of this on your body and stop thinking about neural networks for just another about 10 seconds, you know where to go. Shop.

So, we are finally ready to fully describe what ChatGPT actually does on a fundamental technological level. And let me just, I'm just going to put my hair up real quick.

Okay, so you give ChatGPT a prompt. It then turns the last word of that prompt into numbers (it encodes it) and then multiplies this number by everything that it learned about how words are associated with each other in the English language: our embedding. This gives us a big 12,000-dimensional matrix. We run this matrix, and all the numbers associated with it, through those attention transformers that we talked about, so that some words in the prompt, or the last word, are paid attention to more than other words in the rest of the prompt to generate the output. We then normalize this so that we get something more akin to what we started with in terms of matrices, and then we feed this forward to
the next layer of attention transformers and do it all over again. How many times? 95 times. There are 96 total layers in ChatGPT. After that, after all this data gauntlet, we then basically do the reverse of what we started with: we take the big matrix that went through the 96 layers and we reverse the embedding, and so we use that 12,000 dimensions again to turn the numbers back into words, and you get a single word after all that. So you can imagine that if you have to do this for every single word, a calculation that involves 175 billion separate operations, it can take a large language model a long time to get back to you. That's why there's a delay. That was a lot of real running that I did.

And notice that even with all of this, at no point does ChatGPT know what you are asking it. If you ask it, for example, "What is the sixth element on the periodic table?", the neural network is not thinking about the periodic table. It doesn't even know what atoms are. It is just determining, given the statistical distribution of words in its vast training text, what word is most likely to follow the sequence "What is the sixth element on the periodic table?" And a reasonable next word, based on everything ever written online, is "carbon." That's it.

So that, in a nutshell... no, I'm not paying Kurzgesagt every time I say "in a nutshell." They're rich. So that, in a nutshell, is how ChatGPT works.

What's next? Well, people are obviously very excited about this technology. It's the best chatbot ever. It can summarize unreadable amounts of text and other books for you, it can write poetry, it can generate code that you can actually use. That's all awesome, yes. But I still think there is actually a large risk, if we don't mitigate it, to our information ecosystem here. Remember, we fundamentally do not know how ChatGPT came up with the embedding that it did, or how it chose all those 175 billion different weights. Of course, this isn't really surprising. ChatGPT is a neural network, and it operates like your brain, and you
can't even tell me the decisions you make, why you make them, or what they look like on the inside. Here, I'll show you. Just a sec. I'm gonna get a kitty. I'm gonna show you with a cat. I'm gonna get a cat to show you.

Look at this adorable little kitty. Why is this a kitty? You will quickly say something like "its whiskers" or "its tiny little cute little feets," but the more you go down that line of questioning, you realize you don't have a full description for what catness is. You can't describe how your brain arrives at cattitude. Similarly, for neural networks like ChatGPT, we can't right now just open them up and see exactly how they work and what they are doing. For example, look at a neural network that is looking at cats and trying to recognize pictures of cats. At the first layer, it looks like it's looking for cat-like shapes, but at the tenth layer of its brain, what is that? How does that describe Katniss Everdeen? We can't right now look into something like ChatGPT and figure out exactly how it's working. We don't know. That's important to understand. Lady, no plastic eating, please. No plastic.

What is interesting, though, is what ChatGPT seems to understand about human language. Producing human-like responses that could pass the Turing test used to be science fiction. Just a few months ago, it still seemed like the ultimate problem, and within just months, multiple AIs have blown past it. It appears that human language is computationally easier than anyone thought. Maybe there are laws of language to discover, like there are laws of physics. And with GPT-4 already being called a step towards artificial general intelligence, who knows what these systems will figure out but be unable to tell humans how or why. She ate all the plastic in the world.

Maybe the striking success of ChatGPT shouldn't have been surprising. It has about as many connections, and weights between those connections, in its brain as neurons in your brain. Maybe it just so happens that a neural network of a
sufficient size can handle a problem as complex as human language. I just hope that, after all of this, a video of sufficient size can help you understand a problem as complex as rapidly emerging AI technologies. Until next time.

She found more plastic, and then she threw up.

Now exiting the facility.

How do they find more plastic?

Thank you so much to the very nerdy staff at the facility for the direct and substantial support in the creation of this here video. If you want to join the facility, if you want to drape on a silky white lab coat and stop my cat, Lady, from eating all plastic on planet Earth, you can go to patreon.com/kylehill to join the facility today. If you're on mobile, go into the description of this video and click the link, or if you're on desktop, just click Join. You get private, members-only live streams, you get behind-the-scenes photos and videos and bloopy bloops, you get to talk to me on our private Discord, and if you support us just enough, you get your name on Arya here on each and every episode. As you can see, there's hundreds and hundreds of you.

I don't even, I haven't even figured out how. If you pressed me, the one thing I am still worried about (well, I'm worried about a couple of things) with large language models like this is: once they start outputting more text than has ever been written by humans in history, will we get to a point where we don't actually know if anything we read, see, or hear is real? And at that point it's going to be a dis- and misinformation apocalypse, where we have to fall back on some sort of established way of determining the veracity of stuff, and most of the human population is just going to tune out, and then we're going to have to rethink our media landscape, and it's gonna... oh, it could be bad. [Music]

Thanks for watching. A chatbot wrote that.
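
To recap the forward pass described in the captions (encode the words, multiply by the embedding, run attention, reverse the embedding, read off a word), here is a toy Python sketch. Everything in it is an assumption for illustration: the four-word vocabulary, the made-up three-dimensional embedding values, and the single attention step with no learned projections, no normalization, and no feed-forward layers. The real model is described above as using about 50,000 entries, about 12,000 dimensions, and 96 layers.

```python
import math

VOCAB = ["cat", "dog", "sat", "the"]  # toy alphabetized vocabulary
# Hypothetical 4-word x 3-dimension embedding matrix (values are made up).
EMBED = [
    [0.9, 0.1, 0.0],  # cat
    [0.8, 0.2, 0.1],  # dog
    [0.0, 0.9, 0.3],  # sat
    [0.1, 0.0, 0.9],  # the
]


def one_hot(word):
    """A row of zeros with a single one at the word's vocabulary index."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x):
    """Each row becomes a mix of all rows, weighted by how similar they are."""
    dim = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, x)


def next_word_probs(prompt):
    encoded = [one_hot(w) for w in prompt]          # words x vocab, mostly zeros
    embedded = matmul(encoded, EMBED)               # words x dims
    mixed = attention(embedded)                     # one layer here, not 96
    embed_t = [list(col) for col in zip(*EMBED)]    # reverse the embedding
    scores = matmul([mixed[-1]], embed_t)[0]        # one score per vocab word
    return softmax(scores)


probs = next_word_probs(["the", "cat"])
best = VOCAB[probs.index(max(probs))]
```

Multiplying a one-hot row by `EMBED` simply selects that word's row, which is why the large, mostly-zero matrix in the captions is equivalent to a table lookup. With made-up embedding values the chosen word `best` is meaningless; only the shape of the computation is the point.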
Info
Channel: Kyle Hill
Views: 1,159,467
Keywords: because science, engineering, kyle hill, learning, math, physics, science, stem, the facility, chernobyl, nuclear, chat gpt, chatgpt explained, what is chat gpt, chat gpt explained, what is chatgpt, artificial intelligence, chatgpt api, open ai, chatgpt tutorial
Id: -4Oso9-9KTQ
Length: 27min 39sec (1659 seconds)
Published: Thu Jun 15 2023