What Can Huge Neural Networks do?

Video Statistics and Information

Captions
Welcome everyone to an exhibition of GPT-J, a 6-billion-parameter transformer model from a collaboration between Ben Wang (kingoflolz on GitHub), Aran Komatsuzaki, and EleutherAI. There is so much that we can do with just this one model that sometimes it may be confusing what a transformer model is actually doing, or what things like few-shot learning even mean. So for the very first example we'll take a very high-level approach through what's going on, and then we'll run through a variety of examples that I think you'll find extremely impressive, and rather shocking that it all runs from the same model. For this high-level approach I'll be obscuring away various bits of code just to make the order of operations clearer, but I will put links in the description for everything you could possibly need, depending on how deep you want to go toward understanding how these models work, actually running the model, as well as a text-based version of the run-through of all of these examples.

We'll start with the context: "Transformer models are quickly advancing the field of artificial intelligence." Now, neural networks do not work on string data; they work on array data. So the first step that we're going to take here is to convert this string into an array, and we do that with what's called a tokenizer. We can come down here and say tokens = tokenizer.encode(context), where the tokenizer has already been loaded and this string is the starting context. Then we can print out these tokens and see the array representation of this string. The next step in our pipeline is to pad this input vector. This is because the input layer to our neural network is of fixed size; in this case, a 2048-token sequence can be input into the network, but here we have much less than that, so we pad the front of that vector with zeros.
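The encode-then-front-pad step described above can be sketched in plain Python. Note this uses a toy stand-in vocabulary of my own invention, not GPT-J's real BPE tokenizer, and a shortened sequence length so the output is readable:

```python
# Toy stand-in for the real GPT-J tokenizer (hypothetical vocabulary),
# just to illustrate encode -> front-pad. The real model uses a learned
# BPE vocabulary and a 2048-token context window.
VOCAB = {"transformer": 5, "models": 9, "are": 2, "quickly": 7,
         "advancing": 4, "ai": 3}

def encode(text):
    # Map each whitespace-separated word to its token id.
    return [VOCAB[w] for w in text.lower().split()]

def pad_front(tokens, seq_len=2048, pad_id=0):
    # The input layer is fixed-size, so shorter prompts get zeros
    # prepended, as shown in the video.
    return [pad_id] * (seq_len - len(tokens)) + tokens

tokens = encode("Transformer models are quickly advancing AI")
padded = pad_front(tokens, seq_len=16)
print(padded)  # starts with zeros, ends with [5, 9, 2, 7, 4, 3]
```

The choice of front-padding (rather than back-padding) matches what the video shows: the real token values sit at the end of the vector.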
If we output the padded tokens, we can see the vector starts with a bunch of zeros and then ends with our actual token values. The next step is essentially to pass everything through the model. There are some parameters here; the first one you'd probably want to look at is the generation length, which specifies how many tokens you want the model to actually generate. We also have temperature, top-p, and top-k, all parameters you can tune to adjust the variability of outputs, but for now we'll keep them at the defaults, and we will get our actual output. We get various information back from the output variable, but the actual output tokens we're after can be found here. These are indeed all of our tokens, but they're in a slightly strange shape, simply due to how they're generated. For now we'll store them in this output samples variable and check the shape, which is 1 x 128 x 1. What we really want is to reference these tokens in sequential form, and since this is a NumPy array, we can use some NumPy magic, and boom, we have our actual sequence of tokens. Since this is an array of arrays (because it's a batch, even if it's a batch of one), we can iterate and then de-tokenize the sequence, and what we get is the continued generative output.

We can then shorten this a little, add some color, and see our full output: in yellow, the original input, and in cyan, the generated output. Feel free to pause the video and give that a read; it's pretty exceptional. I seriously doubt anybody would be able to tag this as being written by an AI unless they already knew. So not only does this model appear to have a grasp of the English language and grammar, it also appears to have a decent grasp of the subject of deep learning, giving a pretty darn good summarization of what it is. Now, we can certainly quibble about how much of this might be memorization, but I invite you to hold those quibbles for now, because we're really just getting started on the subject matter this model knows and the things it can produce.
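That reshaping step, going from the 1 x 128 x 1 output down to a flat token sequence, can be shown without NumPy at all. The exact indexing the video uses isn't shown on screen, so treat this pure-Python version (and the toy token values) as an illustration of the idea rather than the literal code:

```python
# The generator returns tokens shaped (1, 128, 1): a batch of one,
# n steps, one token id per step. Here is a tiny stand-in with the
# same nesting, shape (1, 3, 1):
output_samples = [[[50], [256], [11]]]

for sample in output_samples:                # iterate the batch
    sequence = [step[0] for step in sample]  # drop the trailing axis
    print(sequence)  # [50, 256, 11] -- now ready to de-tokenize
```

With a real NumPy array, something like `output_samples[:, :, 0]` accomplishes the same flattening in one step.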
For example, one of the types of data this model was trained on was GitHub and Stack Exchange, which of course includes programming. We can try to get the model to write a regular expression for us by adding a comment in the code suggesting that the thing coming next is a regular expression that will find the dollar amounts. As we can see when the model finally finishes, it did indeed output the rest of the code for us, and it produces what I think is mimicking Stack Exchange, where someone is likely asking a question and then there's maybe an answer and some more information. We can copy the code part, paste it into an editor, run it, and see that it did indeed parse out the dollar amounts. The string formatting included an extra dollar sign, so we could fix that either by removing the backslash-dollar-sign from the regular expression or from the string that we printed; either way, it worked pretty well. One other thing to note: since there is some degree of variability, we can run this exact same input prompt again and get a different response. In this case, let's grab the regular expression again, copy it over to the editor, and print it out. It's a different output from the model, but the same thing happened: it worked, and it gave us a valid regular expression.

Okay, so regular expressions with Python code work pretty well. Very cool, but there's still a whole lot more here. Let's try another programming task: OpenCV. Using a very similar method, we'll import cv2, specify a file name, and then nothing else; we'll just slap in a comment that says, hey, we'd like to open the image and find the edges. Again we get some output, and it continues, actually writing more code than we initially cared about, at least for now. Maybe you're interested in that other code and doing those things next.
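Back to the regular-expression example for a second: the video doesn't show GPT-J's literal output, but a valid completion for a "find the dollars" comment might look something like this (the exact pattern here is my guess, not the model's):

```python
import re

# A regular expression that will find the dollar amounts in a string.
dollar_re = re.compile(r"\$\d+(?:\.\d{2})?")

text = "The jacket costs $59.99, the socks $4, and shipping is free."
matches = dollar_re.findall(text)
print(matches)  # ['$59.99', '$4']
```

Note the backslash before the dollar sign: `$` is a special character (end-of-string anchor) in regex syntax, which is exactly the kind of detail that made the extra dollar sign in the video's output easy to fix on either the pattern side or the print side.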
For now, we're just going to take the edge-detection part only and copy and paste that into the editor. In this case, the code saved the output as an image rather than doing something like cv2.imshow, but that's fine; we can pull those up, and here's the edge photo and the original full-color photo. It indeed found the edges using Canny edge detection, and again, all we did was import cv2, specify a file name, and say, "here's what we want to do; how do I do that?"

Okay, how about something with a few more lines of context and a few more lines required to complete the objective? In this case, we'll start with some common TensorFlow and Keras imports, then load in some dataset that we have, and then again use a comment to suggest that the following code should be a three-layer convnet for 64 x 64 imagery with five classes. I tried to pick sizes and class counts that are a little more rare; something like 28 x 28 and 10 classes is going to be super common, for example, and 64 x 64 might be common, I don't know, but five classes is pretty rare. I tried looking for image datasets with five classes and couldn't really find many, so hopefully that's good. What we get in return is the entire code for a convolutional neural network, including training and testing. Again, we'll copy the code and run it, and sure enough, it's fully functional in training; it looks like it's ready for testing data too. What happens, though, if we change our mind and actually want a two-layer convolutional neural network and seven classes? Again, I think seven classes is fairly rare, and for the model to get this right it would really need to understand where one might actually make a change, i.e., the final dense layer, to account for a specific class count. In this case, we get a completely different output, including even the styling of how the network code is written and how the network is built, but again we get completely valid code. It is indeed a two-layer convnet, and this one even specifies the 64 x 64 shape.
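As an aside on the OpenCV example above: the generated code used cv2.Canny, but the core idea of edge detection can be illustrated with no dependencies at all. Here's a tiny Sobel gradient-magnitude sketch in plain Python; the 3x3 kernels are the standard Sobel ones, while the threshold value and toy image are arbitrary choices of mine, not anything from the video:

```python
def sobel_edges(img, thresh=4):
    # img: 2D list of grayscale ints. Returns a same-sized 0/1 edge
    # map (borders left as 0), using the classic 3x3 Sobel kernels.
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                edges[y][x] = 1
    return edges

# Synthetic image: dark left half, bright right half -> vertical edge.
img = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edge_map = sobel_edges(img)
```

Real Canny adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of this gradient step, which is why `cv2.Canny(img, low, high)` takes two thresholds.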
The previous code was also correct, and arguably better: it had a bit more dynamic handling of the input image shape rather than hard-coding it. But this is what we asked for, too. Again, we can copy and paste this into an editor and run it, and indeed it also works: the model is actually training, and it's totally valid code to the spec we asked for.

While this is really cool, it's not just Python code that it can do. For example, we can open up with an html and body tag and then make a comment about an upcoming button that will run a function when clicked. GPT-J goes ahead and adds some paragraph text saying that if you click the button, your browser will be taken over. Okie doke, no Skynet vibes or anything like that. It indeed adds some code for a button, which does come with an onclick handler, and GPT-J also ends up closing off the body and html tags, so this is actually a complete HTML file. We can copy, paste, and view it for ourselves. Pretty uneventful, since the takeover function doesn't actually exist, but so far GPT-J has done everything we've asked of it, so how about we add another comment saying here's a takeover function that will send an alert stating the browser now belongs to us. On the first run there's a bunch of stuff here that looks like valid code in some way, but I'm not sure it's doing what we actually want, and I really want and expect something simpler, so we'll just run it again, and this looks much, much better. Copy and paste that, refresh the page, and let's check it out: beautiful, it does exactly what we wanted.

While these are extremely impressive examples and we could keep going with programming, this model is not a programming model. It's really more like a general-purpose language model that just so happens to also be able to write code alongside all of the other things it does. GPT-J was trained on a dataset called The Pile, and here you can see a treemap of the contents of that dataset.
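Going back to the convnet prompts for a moment: you don't need TensorFlow to see why the final dense layer is where the class count lives. This traces activation shapes through a two-layer convnet of the kind the video requests. The specific layer recipe (3x3 "valid" convolutions, 2x2 max pooling, 64 filters) is my own assumption for illustration, not GPT-J's actual output:

```python
def conv3x3(size):
    # A 'valid' 3x3 convolution shrinks each spatial side by 2.
    return size - 2

def maxpool2(size):
    # 2x2 max pooling halves each side (floor division).
    return size // 2

size, filters = 64, 64          # 64x64 input imagery, 64 filters
for _ in range(2):              # two conv + pool blocks
    size = maxpool2(conv3x3(size))

flat = size * size * filters    # Flatten() before the classifier head
print(size, flat)  # 14 12544
```

Everything above is independent of the number of classes; changing five classes to seven touches only the head, e.g. Dense(7, activation="softmax"), which is exactly the change the model needed to "understand" to satisfy the modified prompt.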
The treemap is colored by whether the data is academic, from the internet, prose, dialogue, or something else entirely, and within those categories we can see the actual sources of the data. So Stack Exchange and GitHub, the sources that contain programming information, aren't even the majority of what this model has learned from. This model also has knowledge of things like medicine, law, mathematics, general conversation, and much more. Scrolling down a bit, we can see a table of how the dataset for this model was weighted and for how many epochs it was trained. A massive area of research is available just in balancing a dataset like they've done here and getting results like these.

So what else can we do? We can ask generic questions like "who invented calculus?" and just see what the response continues with from there, but we can also structure these prompts in such a way as to encourage the model to follow our structure. For example, we can use "Q:" and "A:", as in this case, where we ask about the voltage of a standard US home outlet. I also limited the model's response to just up to the next newline, and we can see that the answer is indeed correct. We can also let the model continue for longer: our original question of when the US revolution began is answered, with that date probably referring to the Revolutionary War's beginning, but then GPT-J went ahead and just made up more Q's and A's. As they progress, they get a little stranger, I think, and less based in any sort of reality. Generally, the first answer you get back is indeed a correct answer, but then it kind of gets weird. I suspect Maggie's Diners are in many places, and I don't think Hawaii has a hockey team; maybe Alaska doesn't either, but I don't think Hawaii has a professional one. And Utah's state name actually originates from an Apache Indian word. Still, the question-and-answer structure continues to be mimicked by the model, like it's trying to stick with that structure for us.
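The "Q:" / "A:" structured prompt, with the model's reply truncated at the first newline, can be sketched like this. The generate function here is a hypothetical stand-in for the real GPT-J call, with a canned continuation so the example is self-contained:

```python
def generate(prompt):
    # Stand-in for the real model call. A real completion keeps going
    # past the answer, inventing further Q's and A's, as in the video.
    return " 120 volts\nQ: a made-up follow-up question?\nA: ..."

prompt = "Q: What is the voltage of a standard US home outlet?\nA:"
completion = generate(prompt)

# Keep only the model's text up to the next newline, so we get just
# the answer and not the invented follow-up Q&A pairs.
answer = completion.split("\n")[0].strip()
print(answer)  # 120 volts
```

Dropping the newline truncation is exactly what lets the model ramble on with its own increasingly strange question-and-answer pairs.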
Continuing along, we can see that the Q-and-A structure happens again, where our first answer is correct-ish. I hope no one is actually changing their oil every 3,000 miles, but Big Oil Change would like you to believe that; I suppose if you wanted to drain every last drop of oil, it would take about an hour. And the final answer, about not filling your gas tank after an oil change, is a bit weird. I don't think I've ever heard that, and I don't think it has any basis, so that's kind of curious to see; I think GPT-J made that up.

The concept of the structured prompt can be quite interesting. For example, we can check out what happens when we mimic a sort of chat log between a human and a bot; in effect, we're creating a chatbot from an otherwise purely generative language model. We start off with whatever the human said, leave a space for the bot's response, and collect up to the newline. We get a reply, and the bot just says hello back, but we can continue this chat log by asking what the bot is up to, prompting another response. This can go on for as long as we like, and the discussion is completely contextualized, since the entire previous conversation is passed through the model every single time, so the bot should stay relatively on topic. We can also remove the next-line-only bit and let the bot generate more of the script for us. It's no longer much of a chatbot, but, interestingly, the bot decides to close off the chat quite quickly, since we did say we were done anyway. And taking a shorter section of the script, before the goodbyes, without any line limit, spawns a different behavior: translations. We can check these and see that some are decent, but it's definitely not the best. I think we've seen before that the further we let GPT-J walk on its own, the sillier it can get, and I think this is especially true as soon as the structure goes away or gets changed by the model.
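The chatbot-from-a-generative-model idea described above amounts to a small loop: append the human turn plus a "Bot:" cue, generate, keep only up to the newline, and fold the reply back into the running transcript. Here is a sketch of that loop, where generate is again a hypothetical stand-in (with canned replies) for the real GPT-J call:

```python
def generate(prompt):
    # Stand-in for the model: look at the most recent "Human:" turn
    # and return a canned continuation, newline-terminated like the
    # real model's free-form output would be.
    canned = {"Hello": " Hello!",
              "What are you up to?": " Not much, just chatting."}
    last_human = prompt.split("Human:")[-1].split("\n")[0].strip()
    return canned.get(last_human, " ...") + "\nHuman: etc."

transcript = ""

def chat(user_message):
    global transcript
    # The ENTIRE previous conversation is passed in every time,
    # which is what keeps the bot contextualized and on topic.
    transcript += f"Human: {user_message}\nBot:"
    reply = generate(transcript).split("\n")[0].strip()
    transcript += f" {reply}\n"
    return reply

print(chat("Hello"))                # Hello!
print(chat("What are you up to?"))  # Not much, just chatting.
```

Because the full transcript is re-sent each turn, a long conversation eventually runs into the model's 2048-token context window, which is one practical limit of this simple approach.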
We can adjust those parameters we referenced earlier to try to get a handle on this if we want, but it's going to be very specific to exactly what you're trying to do. We can also be more specific ourselves and actually ask the model to perform a translation for us, and usually, when we directly request a translation, it does a much better job. Here we've done a Spanish translation very successfully, and we can also try something like German, which indeed also worked. Again, being kind of a next-line prediction or translation, these tend to work quite well. But we can also let GPT-J do a few translations and just let it pick what it wants to translate to next, and, kind of as expected, it randomly chooses its own translations and does okay, but it also gets pretty quickly sidetracked.

Despite getting sidetracked, GPT-J can generate long-form responses quite well when the structure doesn't change, or isn't suggested to change. When you're asking new, unique questions or making new, unique translations, it's very difficult for the model to figure out what should come next: in Q-and-A, for example, it could be literally any question and answer, and for translation, it could be any language to translate to. If you just want an article, though, that's totally doable, and it's similar to the much larger code blocks we saw earlier, like the neural network in TensorFlow Keras. I won't hang on this whole thing long enough for you to read it, but if you want, you can pause the video and give it a read. It's fairly well written. The arguments aren't really the best I've ever seen, so it's kind of comical and whimsical to argue for AI regulating humans because it can hang out in your closet, give you flowers on your birthday, and control your thermostat, but regardless of whether the argument could sway you, it can make the argument in long-form text.

Other interesting examples are lists.
Here, for example, we list three similar-ish books, at least ones likely to be enjoyed by the same person, and we get a list of other similar books. There are some repetitions here, but it's about in line with what you'd expect with this starting book list. We can even take this a step further and just start naming some stuff, without even saying "X list": GPT-J understands that these are television shows and picks a couple more that are similar to the ones we've already listed. Not only does GPT-J understand multiple spoken languages and programming languages, it also understands the emoji language.

This isn't a general intelligence, but the model is generally intelligent. It knows a lot about a lot of stuff, and I encourage you to play with this model if you're able to, and think about how you might use its abilities. In the end, it's just a generative model, but you've seen that with a tiny bit of logic added, it could be a chatbot or a question answerer; it can write code for you; it can recommend books and movies; and it can honestly do probably a ton of stuff that we don't even know about yet, because so few people have had the opportunity to play with these gigantic large language models. So with that in mind, a huge thank you to everyone involved in making GPT-J a thing. I have spent way too much time simply playing with this model and being absolutely blown away by what it can do. It's truly incredible. It has been very hard for me to make and close a video like this, because I don't think I'll ever feel like I've even remotely done this model justice in showing you all the things it can do while still keeping the video under 10 hours. Hopefully this was enough to pique your interest and show you some of the many capabilities of GPT-J. If you can't run it locally, there is also a web-based prompt that you can check out to play with GPT-J; obviously you won't be able to directly add logic and such, but you can still tinker.

In closing, I will share a paper that someone from the GPT-J channel in the EleutherAI Discord shared with me.
It uses a 7-billion-parameter language model; GPT-J is 6 billion, so 7 billion is not too far off. What they've done is take a large language model and also train an image captioning model, keeping the language model frozen; the inputs are encoded image data from the image model along with text structured however they like. Essentially, they're mixing few-shot natural language prompting with this encoded image data, and the results are few-shot-learned captioning styles, which is crazy. I think the fact that something like this is even remotely possible should dispel anyone's belief that all these models do is memorize and compress. There's no doubt that a lot of memorization and compression is going on, but there is something else going on too.

If you'd like to learn more about the goings-on of neural networks at their most basic components, there's a book you may have heard of once or twice on my channel called Neural Networks from Scratch; you can learn more at nnfs.io. That's all for now. Thanks for watching, and until next time!
Info
Channel: sentdex
Views: 128,232
Rating: 4.9732056 out of 5
Keywords: python, programming
Id: _z86t7LerrQ
Length: 19min 58sec (1198 seconds)
Published: Fri Aug 06 2021