Natural Language Processing with TensorFlow 2 - Beginner's Course

Captions
Welcome, freeCodeCamp campers, to a practical introduction to natural language processing with TensorFlow 2. I am your host, Dr. Phil Tabor. In 2012 I got my PhD in experimental condensed matter physics and went to work for Intel Corporation as a back-end dry etch process engineer. I left there in 2015 to pursue my own interests and have been studying artificial intelligence and deep learning ever since. If you're unfamiliar with natural language processing, it is the application of deep neural networks to text processing. It allows us to do things such as text generation (you may have heard the hubbub in recent months over the OpenAI GPT-2 algorithm that allowed them to produce fake news), as well as sentiment classification, and something more mathematical: representing strings of characters, words, as mathematical constructs that let us determine relationships between those words. But more on that in the videos. It would be most helpful if you have some background in deep learning, if you know something about deep neural networks, but it's not really required; we're going to walk through everything in the tutorial, so you'll be able to go from start to finish without any prior knowledge, although of course some background would help. If you would like to see more deep learning, reinforcement learning, and natural language processing content, check me out here on YouTube at Machine Learning with Phil. I hope to see you there, and I really hope you enjoy the video. Let's get to it.

In this tutorial you are going to learn how to do word embeddings with TensorFlow 2.0. If you don't know what that means, don't worry, I'm going to explain what it is and why it's important as we go along. Before we begin with our imports, a couple of housekeeping items. First of all, I am basically working through the TensorFlow tutorial from their website, so I'm going to link that in the description. I'm not claiming this code is my own, although I do some cleaning up at the end to kind of make it my own, but in general it's not really my code. So we start with our imports as usual. We need io to handle dumping the word embeddings to a file so we can visualize them later, we need matplotlib to handle plotting, and we need tensorflow as tf. Just a word on versions: this is TensorFlow 2.1.0-rc1, release candidate one, which as far as I'm aware is the latest build. TensorFlow 2.0 throws some really weird warnings and 2.1 seems to deal with that, so I've upgraded. If you're running TensorFlow 2.0 and you get funny warnings, you still get functional code and learning, but that is why you want to update to the newest version. Of course we need Keras to handle pretty much everything, we also need the layers module for our embedding and dense layers, and we're going to use TensorFlow Datasets; I'm not going to have you download your own data set, we're going to use the IMDB movie review data set for this particular tutorial, so that is an additional dependency.
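For reference, here is roughly what that import block looks like. This is a sketch based on the TensorFlow word-embeddings tutorial the video follows; TensorFlow Datasets is installed separately (pip install tensorflow-datasets).

import io                        # for dumping the learned embeddings to .tsv files later
import matplotlib.pyplot as plt  # for plotting the training curves
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds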
Now that we've handled our imports, let's talk a little bit about what word embeddings are. How can you represent a word for a machine, and more importantly, instead of a string of characters, how can you represent a collection of words, a bag of words if you will? You have a number of options. One way is to take the entire set of all the words you have in, say, your movie reviews, find all the unique words, and that becomes your dictionary. You can then represent each word as a one-hot encoding: if you have, let's say, 10,000 words, then each word gets a vector with 10,000 elements, which are predominantly zeros except for the one corresponding to that particular word. The problem with this encoding is that while it does work, it is incredibly inefficient, because it is sparse; the majority of the data is zero, with only one important bit in the whole thing.

Another option is integer encoding. You rank-order the words (you could do it in alphabetical order, the order doesn't really matter), assign a number to each unique word, and then every time that word appears in a review you put that integer into an array. You end up with a set of variable-length arrays, where the length of each array corresponds to the number of words in the review and the members of the array correspond to the words that appear within that review. This works and is far more efficient, but it's still not quite ideal, because it doesn't tell you anything about the relationships between the words. If you think of the word "king", it has a number of connotations: a king is a man, so there's some relationship between a king and a man; a king has power, control over a domain, a kingdom, so there's the connotation of owning land and having control over that land; a king may have a queen, a prince, a princess. All these kinds of relationships between words are not incorporated into an integer encoding of our dictionary. The reason is that the integer encoding of our dictionary forms a basis in some higher-dimensional space, but all those vectors are orthogonal: if we take their dot product, they are essentially at right angles to each other in that higher-dimensional space, so their dot product is zero. There's no projection of one word onto another, no overlap in meaning between the words, at least in this higher-dimensional space.

Word embeddings fix this problem by keeping the integer encoding but then doing a transformation to a totally different space. We introduce a new space, a vector of some arbitrary length, and that length is a hyperparameter of your model, much like the number of neurons in a dense layer is a hyperparameter. We'll just say it's eight, so the word "king" then has eight floating-point elements that describe its relationship to all the other vectors in that space. That allows you to take dot products between two arbitrary words in your dictionary and get non-zero components, which in practical terms means you get a sort of semantic relationship between words that emerges as a consequence of training your model.

The way it works in practice is that we're going to have a whole bunch of reviews from the IMDB data set, each with a classification as a good or bad review. For instance, for the Star Wars: The Last Jedi movie (I don't think it's actually in there), my review would be that it was terrible, awful, no good, totally ruined Luke's character, and I'm not alone in that. So if you had a huge number of reviews for The Last Jedi, you would see a strong correlation of words such as "horrible", "bad", "wooden characters", "Mary Sue", things like that.
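To make the orthogonality point concrete, here is a toy illustration that is not from the video: the five-word vocabulary and the four-dimensional embedding are made up. One-hot vectors of distinct words always have a dot product of zero, while embedding vectors generally do not, and those components are exactly what training adjusts.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

one_hot_king  = np.array([0, 0, 1, 0, 0], dtype=np.float32)
one_hot_queen = np.array([0, 0, 0, 1, 0], dtype=np.float32)
print(np.dot(one_hot_king, one_hot_queen))       # 0.0, no shared meaning at all

embedding = layers.Embedding(input_dim=5, output_dim=4)   # 5-word vocab, 4-dim space
king, queen = embedding(tf.constant([2, 3])).numpy()      # look up two word vectors
print(np.dot(king, queen))                       # generally non-zero, and trainable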
The model would then take those words, run them through the embedding layer, and try to come up with a prediction for whether that is a good or bad review, match it up to the training label, and then do backpropagation to vary the weights in that embedding layer, say eight elements per word. By training over the data set multiple times you refine these weights such that you are able to predict whether a review is positive or negative about a particular movie, but it also shows you the relationships between the words, because the model learns the correlations between words within reviews that give them either a positive or negative context. So that is word embeddings in a nutshell, and we're going to go ahead and get started coding that.

The first thing we're going to have is an embedding layer, and this is just for illustration purposes: the layer is an Embedding with a thousand words and five elements per embedding. Then we'll say result equals the embedding layer applied to tf.constant of one, two, three, and print result.numpy(). Actually, let's print result.numpy().shape; I think that should work. Let's head to the terminal and execute this and see precisely what we get. Okay, so what's important here is that you get an array of three elements, because we did the tf.constant of 1, 2, and 3, and each has five elements, because we have broken the integers into components in that five-element space. It has shape three by five, which you would expect: you're passing in three elements, and each of those three integers corresponds to a word in an embedding layer of five elements. That's relatively clear, so let's go back to the code editor and see what else we can build with this.

Let's comment out all that stuff, because we don't need it anymore, and get to the business of actually loading our data set and doing interesting things with it. We want to use the dataset load function, so we'll say train_data, test_data, and info equal tfds.load of 'imdb_reviews/subwords8k'. Then we define a split, which is tfds.Split.TRAIN and tfds.Split.TEST, and we have a couple of other parameters: with_info=True, which incorporates information about the data set, and as_supervised=True. as_supervised tells the data set loader that we want to get back information in the form of (data, label) tuples, so we have the labels for training our data. Now we're going to need an encoder, so we'll say encoder equals info.features['text'].encoder, and let's find out what words we have in our dictionary from this: print encoder.subwords, the first twenty elements. Save that, head back to the terminal, and print it out. It's hard to see, let me move my face over for a moment, but you can see that we get a list of words: "the_" (the underscore corresponds to a space), commas, periods, "a_", "and_", "of_", a whole bunch of words with underscores that indicate they are followed by spaces. So this is the makings of a dictionary. Let's head back to the code editor and continue building on this; we no longer need that print statement.
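Here is roughly what the script looks like at this point, a sketch following the official tutorial; the variable names are my guess at what is on screen.

# The illustration embedding layer: a 1,000-word vocabulary embedded in 5 dimensions.
embedding_layer = layers.Embedding(1000, 5)
result = embedding_layer(tf.constant([1, 2, 3]))
print(result.numpy().shape)                      # (3, 5)

# Load the IMDB reviews, pre-encoded with an 8k-subword vocabulary.
(train_data, test_data), info = tfds.load(
    'imdb_reviews/subwords8k',
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True)

encoder = info.features['text'].encoder
print(encoder.subwords[:20])                     # the first 20 entries of the dictionary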
Now the next problem we have to deal with is that these reviews are all different lengths, so when we load them up into a matrix they're going to have different lengths, and that is problematic. The way we deal with that is by adding padding: we find the length of the longest review, and for every review that is shorter than that we append a bunch of zeros to the end of our bag of words, the list of integers. Zero isn't a word; it doesn't correspond to anything, because the words start with one, the rank-ordinal numbers start with one. We insert zeros because they don't correspond to anything and won't hurt the training of our model. So we need something called padded_shapes, and that has the shape of a list with None for the review dimension and an empty tuple for the label.

Now that we have our padded shapes, we're ready to get our training and test batches. Since we are good data scientists we want to do a shuffle; we're going to use a batch size of 10 and the padded shapes we just defined. Let's clean that up and copy it, because the test batches are pretty much identical, it's just test_data shuffled with the same sizes, so we don't have to make any other changes. Scroll down so you can see. Okay, that gives us our data.

What we need next, after the data, is an actual model, so let's define one. As is typical for Keras it is a Sequential model, and that takes a list of layers. The first layer is an embedding layer, and that takes encoder.vocab_size, which is given to us by the encoder object from our data set's info; it's just the size of our dictionary. We also want to define an embedding_dim, the number of dimensions for our embedding layer, which we'll set to something like 16 to start. Let's add another layer, GlobalAveragePooling1D, and then finally a dense layer with one output and a sigmoid activation. If this seems mysterious: this layer outputs the probability that the review is positive, which is why it's a sigmoid. Now we want to compile our model with the Adam optimizer, a binary cross-entropy loss, and accuracy metrics, metrics=['accuracy']. That's our model, and that is all we need for that.

Now we are ready to think about training it. What we want to do is train and dump the history of our training into an object we'll call history: history equals model.fit, and we pass train_batches, 10 epochs, validation_data of test_batches, and something like 20 validation steps. Let's scroll down a little bit so you can see it.
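Put together, this step looks roughly like the following sketch, again following the official tutorial and assuming the train_data, test_data, and encoder objects from the previous snippet.

BATCH_SIZE = 10
padded_shapes = ([None], ())          # pad each review to the longest in its batch; labels are scalars

train_batches = train_data.shuffle(1000).padded_batch(
    BATCH_SIZE, padded_shapes=padded_shapes)
test_batches = test_data.shuffle(1000).padded_batch(
    BATCH_SIZE, padded_shapes=padded_shapes)

embedding_dim = 16
model = keras.Sequential([
    layers.Embedding(encoder.vocab_size, embedding_dim),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')])   # probability that the review is positive

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_batches, epochs=10,
                    validation_data=test_batches, validation_steps=20)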
Once it's done training, let's plot it, so we may as well write that now. We want to convert our history to a dictionary, which is history.history, get the accuracy by taking the 'accuracy' key, get the validation accuracy (using the correct syntax, of course, 'val_accuracy'), and the number of epochs is just the range from 1 to the length of the accuracy plus one. Then we make a figure, nice and large, 12 by 9, plot the epochs versus the training accuracy with 'bo' (blue dots) and the label 'Training accuracy', and plot the validation accuracy using just a blue line, 'b', with the label 'Validation accuracy'. Then an x label of 'Epochs', a y label of 'Accuracy', and let's add a title while we're at it: training and validation accuracy. Scroll down a little bit; we'll include a legend (I'm having an extraordinarily difficult time typing tonight) with location 'lower right', a y-limit of (0.5, 1), which should be a tuple, and plt.show(). All right, let's head to the terminal, run this, and see what the plot looks like.

And we are back; let me move my ugly mug over so we can see a little bit more, and let us run the software and see what we get. Okay, it has started training, and it takes around 10 to 11 seconds per epoch, so I'm going to sit here and twiddle my thumbs for a minute and fast-forward the video while we wait. Of course, once it finished running I realized I have a typo, and that is typical: in line 46 I spelled out "plot" instead of "plt". But that's all right; let's take a look at the data we get in the terminal anyway. You can see that the validation accuracy is around 92.5 percent, pretty good, and the training accuracy is around 93.8, so a little bit of overtraining. I have run this a bunch of times and you tend to get a little bit more overtraining; I'm kind of surprised that this final run for YouTube actually shows a little bit less. Either way, there is some evidence of overtraining, but 90-plus percent accuracy for such a simple model isn't entirely hateful. So I'm going to go back and correct that typo, right there in line 46, make sure nothing else looks wonky (looking at my cheat sheet, everything looks fine), and then run it again and show you the plot.

All right, it has finished, and you can see that this time the validation accuracy was around 89.5 percent, whereas the training accuracy was 93.85, so it is a little bit overtrained in this particular run, and there is significant run-to-run variation, as you might expect. Let's take a look at the plot. I've stuck my ugly mug right here in the middle, but you can see that the training accuracy goes up over time, as we would expect, and the validation accuracy generally does the same but kind of tops out about halfway through the number of epochs. So this is clearly working, and it's actually pretty cool: with such a simple model we can get some decent review, or sentiment as it were, classification.
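The plotting block built in this step, with the plot-versus-plt typo already fixed, looks roughly like this; 'accuracy' and 'val_accuracy' are the key names Keras uses when you compile with metrics=['accuracy'].

history_dict = history.history
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12, 9))
plt.plot(epochs, acc, 'bo', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and validation accuracy')
plt.legend(loc='lower right')
plt.ylim((0.5, 1))
plt.show()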
But we can do one more neat thing, and that is to actually visualize the relationships between the words that our embedding learns, so let's head back to the code editor and write some code to tackle that task. Before we do that, I want to clean up the code first. I will leave in all that commented stuff, but let's define a few functions: a function to get our data, a function to get our model, a function to plot data, and a function to retrieve our embeddings, and I'll fill in the parameters as we go along. Let's take the data-loading code, cut it, paste it into the get-data function, and of course use proper indentation, because Python is a little bit particular about that; make sure everything lines up nicely.

Then of course we have to return the stuff we're interested in. We want to return train_data and test_data; in fact, I take that back, we don't actually want to return our data, we want to return our batches, so return train_batches and test_batches, and we'll also need our encoder for visualizing the relationships between words, so let's return that too. Now let's handle the get_model function: grab the model code, actually all of it, and make embedding_dim a parameter of the function. You'll notice the model needs the encoder, so we also have to pass in the encoder as well as the embedding_dim, and at the bottom of the function we return the model. Pretty straightforward. Then the plot-data function: grab all the plotting code and indent it; it needs a history, and that looks like all we need, because epochs, accuracy, and validation accuracy are defined inside it.

Then we have to write our retrieve_embeddings function, but first let's handle all the other stuff. We'll say train_batches, test_batches, encoder equal get_data; in fact, let's rename that to get_batch_data to be more specific. This is kind of pedantic, but you always want to be as descriptive as possible with your naming conventions, so people can read the code and know precisely what it does without having to guess. If I just say get_data it isn't necessarily clear that I'm getting batches out of that data; I could be getting single instances, it could return a generator, it's a little ambiguous, so changing the function name to get_batch_data is the appropriate thing to do. Then we say model equals get_model and pass it the encoder, the history line will work as intended, and then we call our function to plot the history, which should work as intended as well.

Now we're ready to tackle the retrieve_embeddings function, which is relatively straightforward. We want to pass in the model and the encoder; the purpose of this function is to take our embeddings and dump them to TSV files that we can load into a visualizer in the browser, to visualize a principal component analysis of our word encodings. So we need files to write to, and we need to enumerate over the subwords in our encoder and write the metadata as well as the vectors. out_vectors is io.open of 'vecs.tsv' in write mode with an encoding of utf-8, and out_metadata is similar, 'meta.tsv', write mode, utf-8. Then we iterate over our encoder's subwords and get the vectors we want to dump to the vector file, along with the metadata: the vector is weights[num + 1], and we have the +1 there because, remember, the words start from 1; index 0 is for our padding and doesn't correspond to a word. We write the word plus a newline to the metadata file, and for the vectors we write a tab-delimited string of each x in the vector plus a newline at the end, and then we close our files. Then we just scroll down and call our function: retrieve_embeddings(model, encoder). So, assuming I haven't made any typos, this should actually work, so I'm going to head back to the terminal and try it again.
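A sketch of the retrieve_embeddings function as described, with the weights lookup that gets added a couple of minutes later already folded in; vecs.tsv and meta.tsv are the two files the Embedding Projector expects.

def retrieve_embeddings(model, encoder):
    # The embedding matrix lives in the first layer: shape (vocab_size, embedding_dim).
    weights = model.layers[0].get_weights()[0]
    out_vectors = io.open('vecs.tsv', 'w', encoding='utf-8')
    out_metadata = io.open('meta.tsv', 'w', encoding='utf-8')
    for num, word in enumerate(encoder.subwords):
        vec = weights[num + 1]          # +1 because index 0 is the padding token
        out_metadata.write(word + '\n')
        out_vectors.write('\t'.join([str(x) for x in vec]) + '\n')
    out_vectors.close()
    out_metadata.close()

retrieve_embeddings(model, encoder)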
All right, moment of truth: it is training, so I didn't make any mistakes up to that point; one second and we'll see if it actually makes it through the plot. But really quick, let me move my face out of the way: if you run this with TensorFlow 2.0 you will get this "out of range: end of sequence" warning here, and if you do a Google search for it you'll find a thread about it on GitHub, where basically someone says it is fixed in 2.1.0-rc1, the version of TensorFlow I am running. However, I still get the warning on the first run; in version 2.0.0 I got the warning on every epoch, so it clutters up the terminal output, but it still runs nonetheless and gets comparable accuracy. It doesn't seem to affect model performance, but it makes for an ugly YouTube video and gives me uneasy feelings, so I went ahead and updated to the latest release candidate of 2.1.0, and you can see that it works relatively well. One second and we'll see the plot again.

And of course I made a mistake again: it's plot_history, not plot_data, so let's fix that, and let's change the function name to plot_history because that is more precise, and we will try it again. All right, it has finished, and you can see that the story is much the same, a little bit of overtraining on the training data. The plot is totally consistent with what we got last time: an increasing training accuracy and a leveling-off of validation accuracy. So let's check out how these word embeddings look in the browser. But first, of course, I made a mistake: weights is not defined, and that is because I didn't define it, so let's go back to the code editor and do that. What we want is weights equals model.layers[0].get_weights()[0]. This gives us the actual weights from our model: the zeroth layer is the embedding layer, we get its weights, and we take the zeroth element of that. I'm going to head back to the terminal, and I'm actually going to get rid of the plot here, because we know it works and I'm sick of seeing it, so we'll just do the model fitting and retrieve the embeddings. One of the downsides of doing code live is that I make all kinds of silly mistakes while talking and typing, but that's life. See you in a minute.

All right, that finished running; let's head to the browser and take a look. Can I zoom in? A little bit. To get this view, you go to "Load" over here on the left side (you can't really see my cursor), load your vector and metadata files, and then click on the label mode here. Let's take a look: right here on the left side you see "annexed", "ceded", and "Ottoman", and it makes sense for those to be pretty close together; if you annex something, someone else has to cede it. Let's move around a little bit and see what else we can find. This looks like a good one: "waterways", "navigable", "human", "rainfall", "petroleum", "earthquake", so there are some pretty good relationships here between words that all make sense. If you scroll over here, what's interesting is you see "Estonia", "Herzegovina" (sorry for mispronouncing that), "Slovakia", "Cyprus", a bunch of country names.
So it seems to learn that there are relationships between different geographic regions, in this case countries. There we see "ceded" and "annexed" and "Ottoman" again, and you can even see "conquered" in here next to "annexed" and "ceded", plus "deposed", "archbishop", "bishop", "assassinated". Oh, you can't see that, my face is there; now that I've moved, you can see "surrendered", "conquered", "Spain" (Spain was conquered for a time by the Moors), "archbishop", "deposed", "surrendered", "assassinated", "invaded". You can see all kinds of cool stuff here. I've seen other words like "beautiful" and "wonderful" together, and other groupings, so if you play around with this you'll see all sorts of interesting relationships between words. This is just the visual representation of what the word embeddings look like: a reduced-dimensional view of their higher-dimensional space.

I hope that has been helpful. I thought this was a really cool project: just a few dozen lines of code and you get a really neat result, a higher-dimensional space that gives you mathematical relationships between words and does a pretty good job of learning them. What's interesting is I wonder how well this generalizes to other stuff. If we fed it, say, Twitter tweets, could we get the sentiment out of that? I'm not entirely sure; that's something we would have to play around with, but it seems like you would be able to, so long as there is significant overlap between the dictionary of words we have from the IMDB reviews and the dictionary of words from the Twitter feeds we scrape. That would be an interesting application, finding toxic Twitter comments and the like.

Just a reminder, my new course is on sale for $9.99 for the next five days; there will be one more sale over the last several days of the year, but there will be a gap of several days in between. This channel is totally supported by ad revenue as well as my course sales, so if you want to support the cause, click the link in the pinned comment or description, and if not, hey, go ahead and share this, because that is totally free and I like that just as well. Leave a comment down below, hit the subscribe button if you haven't already, hit the bell icon to get notified when I release new content, and I will see you in the next video.

In this tutorial you are going to learn how to do sentiment classification with TensorFlow 2.0. Let's get started. Before we begin, a couple of notes. First, it would be very helpful if you have already seen my previous video on doing word embeddings in TensorFlow 2.0, because we're going to be borrowing heavily from the concepts I presented in that video. If not, it's not a huge deal; I'll show you everything we need to do as we go along, it'll just make more sense with that background. The second point is that I am working through the official TensorFlow tutorials; this isn't my code. I did have to fix a couple of bugs, so I guess that makes it mine to some extent, but nonetheless I did not write it, I'm just presenting it for your consumption in video format. All that said, let's get to coding our sentiment analysis software. As usual we begin with our imports: we need TensorFlow Datasets to handle the data from the IMDB library, and of course TensorFlow itself to handle TensorFlow-type operations. The first thing we want to do is load our data set and get our training and testing data from it, as well as our encoder, which I explained in the previous video.
So let's start there: dataset and info equal tfds.load of 'imdb_reviews/subwords8k' (it would help if I spelled it correctly). Just a word: these are a bunch of reviews from the IMDB data set, each with an associated classification of either positive or negative. We pass with_info=True and as_supervised=True, and tab that over. Next we need our training and testing data sets: train_dataset equals dataset['train'] and test_dataset equals dataset['test'], and finally we need our encoder, info.features['text'].encoder. Good grief, I cannot type tonight at all. If you don't know what an encoder is, the basic idea is that it is a sort of reduced-dimensional representation of a set of words: you take a word and it associates it with an N-dimensional vector whose components are non-perpendicular to the other words in your dictionary. That means you can express words in terms of each other, whereas if you set each word in your dictionary to be a basis vector, they are all orthogonal and there's no relationship between something like "king" and "queen". With the word embedding representation there is a non-zero component of one vector along another, so you have some relationship between words that allows you to parse the meaning of your string of text. I give a better explanation in my previous video, so check that out for your own education.

We're going to need a couple of global variables: a buffer size of 10,000, a batch size for training, and some padded shapes, and this is for padding. When you have strings of words they can be different lengths, so you basically have to pad to the length of the longest review, and the shape is the review dimension by empty. The next thing we need is our actual data set: we're going to shuffle it, because we are good data scientists, and get a padded batch from it in the shape defined by the variable above, and the test data set is very similar. Good grief; I'm using Vim as my new text editor, part of my New Year's resolution, and it is a little tricky if you've never used it before, I'm still getting used to it. Let's yank that, there we go, and then go back into insert mode: test_dataset equals test_dataset.padded_batch with the batch size and padded shapes. All right, that is good.
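Here is roughly what this second script looks like so far, a sketch following the official text-classification tutorial, with names mirroring what is typed in the video.

import tensorflow as tf
import tensorflow_datasets as tfds

dataset, info = tfds.load('imdb_reviews/subwords8k',
                          with_info=True, as_supervised=True)
train_dataset = dataset['train']
test_dataset = dataset['test']
encoder = info.features['text'].encoder

BUFFER_SIZE = 10000
BATCH_SIZE = 64
padded_shapes = ([None], ())

train_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(
    BATCH_SIZE, padded_shapes=padded_shapes)
test_dataset = test_dataset.padded_batch(
    BATCH_SIZE, padded_shapes=padded_shapes)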
The next thing we need is our model. The model is going to be a sequential Keras model with a bidirectional layer as well as a couple of dense layers, and we'll use a binary cross-entropy loss with an Adam optimizer and a learning rate of 1 by 10 to the minus 4. So we say tf.keras.layers.Embedding with encoder.vocab_size and 64, then tf.keras.layers.Bidirectional wrapping tf.keras.layers.LSTM(64), close the parentheses, then Dense with 64 units and a ReLU activation (if I could ever learn to type properly, that would be very helpful), and another dense layer with one output, and this output gets a sigmoid activation. What this represents is the probability of the review being positive or negative: the final output of the model is a floating-point number between 0 and 1, the probability of it being a positive review. We're going to pass in a couple of dummy reviews, just some softball stuff, to see how well it does, but before that we have to compile our model: a binary cross-entropy loss, optimizer equals tf.keras.optimizers.Adam with a learning rate of 1 by 10 to the minus 4, and metrics=['accuracy']. Then we want the history, which is just model.fit; this is really for plotting purposes, but I'm not going to do any plotting, you get the idea that the accuracy goes up over time and the loss goes down. We pass train_dataset and just do a few epochs; you can do more, but for the purposes of the video I'll do five, since I'll do five for the next model, then validation_data equals test_dataset and 30 validation steps.

Next we need to consider a couple of functions: one to pad the vectors we pass in to whatever size, and a second to actually generate a prediction. Let's define those functions. Just to be clear, this is for the sample text we're going to pass in, because remember the reviews are all of varying lengths, and for the sake of, I guess you could say, continuity of inputs to your model (not a really technical phrase), you have to pass the same length of vector to your model. We dealt with this problem for training, and we have to deal with the same problem for the sample text we pass in, because we don't have an automated TensorFlow function to handle it for us. We're going to pad with zeros, because those don't have any meaning in our dictionary, and return the vector after extending it. If you're not familiar with this idiom in Python, you can multiply a quantity like a string or a list by a number to basically repeat it: if you multiply the letter "a" by 10, it gives you ten a's, and you can do that with list elements as well. Pretty cool stuff, a neat little-known feature of Python, I think. So here we pad with zeros, as many as the target size minus the length of our vector, and extend the vector with those zeros.

Next we need a sample_predict function, and the reason we can't just do model.predict is that we have the issue of dealing with the padding. encoded_sample_pred_text equals encoder.encode of the text; remember, the encoder is what goes from the string representation to the higher-dimensional representation that allows you to make correlations between words. If we want to pad, we call pad_to_size on the encoded sample text with 64, which is our batch size, or our max length, sorry. Then encoded_sample_pred_text is tf.cast to float32, and predictions equals model.predict of tf.expand_dims of the encoded sample text with a 0 batch dimension, and we return the predictions. So now we have a model that will be trained once we run the code; let's come up with a couple of dummy, very basic reviews to see how it scores them. We'll say sample_text equals "This movie was awesome. The acting was incredible. Highly recommend." (spelling sample_text correctly, of course), and then we get our predictions: predictions equals sample_predict of sample_text with pad=True, and we multiply by 100 so we get it as a percentage. Can I scroll down? Not quite; that is a feature, not a bug, I am sure. You can write in whatever positive review you want. Then we print "probability this is a positive review" with the predictions.
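A sketch of the model, the training call, and the two helper functions just described; it assumes the datasets and encoder from the previous snippet, and the sample review wording is paraphrased.

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')])  # probability the review is positive

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=5,
                    validation_data=test_dataset, validation_steps=30)

def pad_to_size(vec, size):
    zeros = [0] * (size - len(vec))     # 0 is the padding token; it carries no meaning
    vec.extend(zeros)
    return vec

def sample_predict(sample_pred_text, pad):
    encoded_sample_pred_text = encoder.encode(sample_pred_text)
    if pad:
        encoded_sample_pred_text = pad_to_size(encoded_sample_pred_text, 64)
    encoded_sample_pred_text = tf.cast(encoded_sample_pred_text, tf.float32)
    # Add a batch dimension before predicting.
    return model.predict(tf.expand_dims(encoded_sample_pred_text, 0))

sample_text = ('This movie was awesome. The acting was incredible. Highly recommend.')
predictions = sample_predict(sample_text, pad=True) * 100
print('Probability this is a positive review: {:.2f}%'.format(predictions[0][0]))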
Now, when I coded this up the first time I had it executing twice, once with pad=False and once with pad=True, to see the delta in the predictions, and surprise, surprise, it's more accurate when you give it padded reviews. But in this case I'm going to change it up on the fly and do a different set of sample text: give it a negative review and see how it does. "This movie was so-so. The acting was mediocre. Kind of recommend." I don't know if "so-so", which is kind of vernacular, is in the database, so we'll see. Then predictions equals sample_predict of the sample text with pad=True, times 100, and we can yank the print line and paste it. Okay, we're going to save this, go back to the terminal, execute it, and see how it does, and then we'll come back and write a slightly more complicated model to see whether adding complexity improves the accuracy of our predictions. So let us write-quit; if you've never used Vim, you press :wq, write quit, when you're not in insert mode. Then we go to the terminal and see how well it does.

All right, here we are in the terminal; let's give it a shot and see how many typos I made. Ooh, interesting: it says to check that the data set name is spelled correctly, which probably means I misspelled the name of the data set. Let me scroll up a little bit: it's imdb_reviews, and yeah, right there, I misspelled it. Not a problem: back into tf_sentiment.py, fix it, write-quit, and give it another shot. Now I've misspelled Dense. Can you see that? Not quite; it says here, let me move myself over, "has no attribute", so let's fix that; it's in line 24, insert the missing s, write-quit, and try again. There, now it is training for five epochs; I'm going to let this ride and show you the results when it is done.

Really quick, you can see that it gives this funny error; let me move my face out of the way. I keep seeing this in the TensorFlow 2 tutorials, and as far as I can tell it is related to the version of TensorFlow; this isn't something I'm doing or you're doing, there is an open issue on GitHub. Previously it would show that error on every epoch of training; after updating to, I think, TensorFlow 2.1 it only does it after the first one, so I guess you gain a little bit there, but it's definitely an issue with TensorFlow, so I'm not too worried about it. Let's let this train.

All right, it has finished running, and I have teleported to the top right so you can see the accuracy. The accuracy starts out low and ends up around 93.9 percent, not too shabby for just five epochs on a very simple model, and likewise the loss starts relatively high and ends relatively low. What's most interesting is that we get a 79.8 percent probability that our first review was positive, which it is, so an 80 percent probability of it being correct is pretty good, and only a 41.93 percent probability of the second being positive. Now, that was a bit of a lukewarm review, I said it was so-so, so a 40 percent probability of it being positive is pretty reasonable in my estimation. Now let's see if we can make a more complex model and get better results, so let's go back to the code and type that up.
So here we are; let's scroll down and make our new model (you have to make sure you're in insert mode, of course). model equals tf.keras.Sequential, and of course you need an embedding layer to start, encoder.vocab_size and 64. Let me move my mug like so, and add our next layer, which is keras.layers.Bidirectional wrapping an LSTM(64) with return_sequences=True. And I am way too far over, past 80 characters; well, we're just going to have to live with it, it's just going to be bad code, not up to the PEP 8 standards, but whatever. Then another Bidirectional LSTM(32), keras.layers.Dense with 64 units and a ReLU activation, and to prevent overfitting we're going to add in a little bit of dropout, just 0.5, so 50 percent, and add our final classification layer with a sigmoid activation. Let me double-check here; looks like I forgot a parenthesis, there we go, good grief, delete that line. Then compile our new model: loss equals binary cross-entropy, optimizer equals Adam with the same learning rate (we don't want to change too many things at once, that wouldn't be scientific), and accuracy metrics. history equals model.fit with train_dataset, epochs equal 5, validation_data equals test_dataset, and 30 validation steps. Then we scroll up, copy all of the prediction code (visual mode, yank), and come down and paste it.

Aha, I'm detecting a problem here: I need to modify my sample_predict function. Let's pass in a model, call it model_ just to be safe, because I'm declaring one model and then another, and I want to make sure these scoping issues are not going to bite me in the rear end. So model_=model here, and likewise here, and we'll come up and modify the function as well, just to be pedantic. I'm very tired, so this is probably unnecessary, but we want to make sure we aren't getting any funny scoping issues and that the model is doing precisely what we expect. Let's write-quit and try running it. Oh, actually, I take it back: I want to get rid of the fitting for the first model, because we've already run it. You know what, now that I'm thinking about it, let's just comment this out, and then we don't even need the model_=model there, but I'm going to leave it. All right, let's try it again and see what we get; remember, we had an 80 percent and a 41 or 42 percent probability of the reviews being positive, so let's see what we get with the new model.

So I must have mistyped something; let's take a look. Right there: it's validation_data, not validation data set. All right, try it again; it is training, and I will let this run and show you the results when it finishes. Of course, after running it I realize I made a mistake in the declaration of the sample_predict function, typical, typical: "unexpected keyword argument", because it's model_. Let's just get rid of it, since we no longer need it, and get rid of this too. This is one of those situations in which a Jupyter notebook would be helpful, but whatever, I will stick to Vim and the terminal and .py files because I'm old. Let's try this again, and I'll edit all of this out and we will meet up when it finishes. I've done it again; it's not my day, folks, not my day. Let's find that, there, delete, once again. All right, I finally fixed all the errors, it is done training, and we have our results: probability this is a positive review, 86 percent, a pretty good improvement over 80 percent. What's even better is that the probability of the second, lukewarm, so-so review being positive has fallen from 41 or 42 percent down to 22 percent, almost cut in half.
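Putting that second model together, it looks roughly like this: the same loss, optimizer, and training call as before, with the stacked bidirectional LSTMs and dropout added. The sample_predict helper from the earlier sketch is simply reused (the model_ argument added and then removed in the video is left out).

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),               # 50% dropout to limit overfitting
    tf.keras.layers.Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=5,
                    validation_data=test_dataset, validation_steps=30)

# Score the two dummy reviews again with the bigger model.
print(sample_predict(sample_text, pad=True) * 100)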
So that's a pretty good improvement with a somewhat more complicated model, at the expense of slightly longer training: 87 seconds per epoch as opposed to 47 seconds, so something like six minutes as opposed to three, not too bad. Anyway, what we've done here is load a series of IMDB reviews, use them to train a model to do sentiment prediction by looking at correlations between the words and the labels for positive or negative sentiment, and then ask the model to predict the sentiment of an obviously positive and a somewhat lukewarm review, and we get pretty good results in a very short amount of time. That is the power of TensorFlow 2.0. I thank you for watching. Any questions or comments, leave them down below; I try to answer all of them, less so now that I have more subscribers and more views, it gets a little overwhelming, but I will do my best. Speaking of which, hit the subscribe button and the notification bell, because I know only 14 percent of you are getting my notifications, and I look forward to seeing you in the next video.

"Where he sees your head my lovely we sleep her with my hate or for me think that we give his cruel he cries said your honor's ear I shall grow moss." No, I haven't just had a stroke, don't call 911; I've just written a basic artificial intelligence to generate Shakespearean text. Now we finally get to address the question: which is better at writing Shakespearean sonnets, a billion monkey-hours or a poorly trained AI? Let's get started. Before we begin with our imports, a couple of administrative notes. The first is that this is an official TensorFlow tutorial; I have not written this code myself, and in fact it is quite well written, the first tutorial I haven't had to make any corrections or adjustments to, so I will leave a link in the description for those who want to go into it in more detail on their own time. Let's get started with our imports. The first import is os, which handles some OS-level stuff; we want tensorflow as tf, of course, and numpy as np. Notably, we are not importing TensorFlow Datasets, because this is not using an official TensorFlow data set; rather it uses text data (I believe Andrej Karpathy gets the credit for this), basically a text file of a Shakespearean work. Which one, I don't know; it doesn't state in the tutorial, and I am not well-read enough to identify it from the first several characters. I suppose if I printed more of it to the terminal I could figure it out based on who's in it, but it's not really all that important. What is important is that we have to download it using the built-in tf.keras.utils, which has its own function to get a file. It's just a simple text file called shakespeare.txt, and it lives at storage.googleapis.com. So let's get an idea of what we're working with: we open it in read-binary mode with an encoding of utf-8, print the length of the text, "Length of text: {} characters" formatted with len(text), and print the first 250 characters. All right, let's head to the terminal and test this out: python tf_text_gen.py. "Object has no attribute 'decode'", so I have messed something up, most likely in the line text equals open(path_to_file).
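That opening block looks roughly like this. The full download URL shown here is the one used in the official text-generation tutorial (only the host is read out in the video), and the missing .read() call that gets fixed a moment later is already included.

import os
import numpy as np
import tensorflow as tf

# Download Karpathy's Shakespeare text file via the Keras utility.
path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
print('Length of text: {} characters'.format(len(text)))
print(text[:250])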
That's right, I forgot the read method, so insert .read() before the .decode(). There we go, let's try that. Perfect: now we see that we do indeed have some text, and it has 1,115,394 characters, so a fairly lengthy work, several hundred thousand words at least. You see it begins with "First Citizen". This is important, because we're going to refer back to this text a few different times in the tutorial, so keep in mind that the first word is "First". Very simple. And hey, if you know what play or sonnet this is, leave a comment down below, because you're more well-read than I am and I would be interested to know. But let's proceed with the tutorial.

Head back to our file, and the first thing we want to do is comment those prints out, because we don't want to print that to the terminal every single time we run the code. The first thing we have to handle is vectorizing our text. If you haven't seen my other two tutorials on natural language processing in TensorFlow: we have to go from a text-based representation to an integer representation (a truly integer representation, not floating-point) in order to pass this data into the deep neural network. So let's start with that. Our vocabulary is going to be sorted(set(text)), so we sort the unique characters, and we'll print "{} unique characters" formatted with len(vocab). An important thing to keep in mind is that we are starting with mere characters: we are not starting with any conception of a word, so the model is going to go from knowing nothing about language at all to understanding the concept of words, as well as line breaks and a little bit about grammar. You saw from the introduction that it's not so great, probably better than the monkeys typing away, but it starts from complete scratch and gets to something that kind of approximates language.

So we have sorted our vocabulary; now we have to go from the character space to the integer representation. We'll say char2idx (where "char" is just "character") is a dictionary of the unique characters and their integer encoding, built by enumerating the vocab. We need idx2char, the inverse operation, which is a numpy array of the vocab. Then we have something called text_as_int, which is a numpy array of a list comprehension, char2idx of each character in the text: we take all the characters in the text, look up their integer representation, and stick them into a numpy array. Now let's print this stuff out to see what our vocabulary looks like, and we'll make it pretty: for char and a blank in zip(char2idx, range(20)), because we only need the first 20 elements, not the whole dictionary, print "{:4s}: {:3d}" formatted with the repr of the character and char2idx of the character, and at the end print a newline. We should do this too: print "{} characters mapped to int {}" formatted with the repr of the first 13 characters of the text and text_as_int[:13]. Tab that over, write it, and run it. "Unexpected EOF while parsing": that means I have forgotten a parenthesis, which is here. Perfect, now we can write-quit and give it a shot.
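The vectorization step, roughly as typed, assuming the text variable from the previous snippet.

vocab = sorted(set(text))                         # the unique characters in the file
print('{} unique characters'.format(len(vocab)))

char2idx = {u: i for i, u in enumerate(vocab)}    # character -> integer
idx2char = np.array(vocab)                        # integer -> character
text_as_int = np.array([char2idx[c] for c in text])

for char, _ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d}'.format(repr(char), char2idx[char]))
print('{} ---- characters mapped to int ----> {}'.format(
    repr(text[:13]), text_as_int[:13]))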
Okay, so you can see we have 65 unique characters, so we have a dictionary of 65 characters: newline maps to 0, space maps to 1, and basically the sort has placed all of the non-alphanumeric characters at the beginning. We even have some numbers in there; curiously, the number 3 maps to 9, but whatever. Then come the capital letters, and the lowercase letters follow later. Our first 13 characters, "First Citizen", map to the vector shown below them, so we have gone from this string to this integer-vector representation. That is all well and good, but it's just the first step in the process.

The next step is handling what we call the prediction problem. The real goal here is to feed the model some string of text and have it output the most likely characters it thinks will follow, based on what it reads in the Shakespearean work. So we want to chunk up our data into sequences of length 100, use that to create a data set, and from there create batches of data, in other words chunks of character sequences. Let's go back to our Vim editor and start there. The first thing is to comment all of this out, because we don't want to print everything every single time, and then handle the sequence length. We'll say seq_length equals 100 characters, something manageable: you want something not too small and not too large. The number of examples per epoch is the length of the text divided by the sequence length plus 1. Where does the plus 1 come from? It comes from the fact that we're going to be feeding it a character and trying to predict the rest of the characters in the sequence, so you have the plus 1 there. Next we have a char_dataset: tf.data.Dataset (of course, it's TensorFlow, it has to deal with its own data sets, it doesn't handle text files too well) from_tensor_slices of text_as_int. Let's print out what we have here: for i in char_dataset.take(5), and since this is just a sequence of individual characters we should get the first five characters out, print idx2char[i.numpy()]. Write-quit and run it once more, and you see we get the word "First", as one would expect; if we scroll up, those are the first five characters, "First", and then "Citizen" follows. So that seems to work.

Now let's handle batching the data. Go back to the Vim editor, get rid of that print statement by commenting it out, and deal with a batch: sequences equals char_dataset.batch of seq_length plus one with drop_remainder=True, so we just get rid of the leftover characters at the end. Then, for item in sequences.take(5), the first five sequences of 101 characters (can I scroll down at all? No, it does not let me, one of the downsides of Vim as an editor), print the repr of the join of idx2char[item.numpy()], with a whole bunch of parentheses. Let's go back to the terminal and see how this runs. You see (I really should have put a newline at the beginning) "First Citizen: Before we proceed any further, hear me speak", and so on, so we get a bunch of character sequences, including the newline characters. That is pretty helpful, and one thing to note is that these newlines are what give the deep neural network a sense of where line breaks occur.
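In code form, roughly, assuming text, text_as_int, and idx2char from the previous snippets:

seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
for i in char_dataset.take(5):
    print(idx2char[i.numpy()])            # F, i, r, s, t

# Chunk the character stream into sequences of 101 characters
# (100 inputs plus the character to predict).
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
for item in sequences.take(5):
    print(repr(''.join(idx2char[item.numpy()])))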
It knows that after some sequence of characters it should expect a line break, because that formulates the kind of metered speech you find in Shakespeare. So that's well and good; let's handle the next problem of splitting our data into chunks of input and target text. Remember, we have to start with one character and predict the next set of characters. Of course, to begin, we comment the previous print out, and in fact, do we need this? Let's leave it in, it's not going to hurt anything. We define a function called split_input_target that takes a chunk of data as input, and it says input_text equals the chunk, everything up to minus one; target_text equals the chunk from one onward; return input_text and target_text. So we get an input sequence as well as a target. We want to check this by saying dataset equals sequences.map, mapping this function onto our sequences, split_input_target; add a newline for clarity. We'll print the first examples of the input and target values: for input_example, target_example in dataset.take(1), print the input data, the repr of the join of idx2char[input_example.numpy()], and the target data, the repr of the join of idx2char[target_example.numpy()]. Let's head to the terminal and try this.

Okay, you see our input data is "First Citizen: Before we proceed any further..." and it ends with "you", and then the target data is "irst Citizen...". So given this input, what is the target? We have basically shifted the data one character to the right for our target with respect to our input, and that's the task: given one character, predict the next likely sequence of characters. To make that more clear, let's step through it one character at a time. Come down here, and of course the first thing we want to do is get rid of those print statements, and then say: for i, (input_idx, target_idx) in enumerate of zip of input_example[:5] and target_example[:5] (I forgot the zip statement at first, the enumerate gets a colon, and I had an extra parenthesis). Then the print statements: print "Step {:4d}" formatted with i, print the input, "{} ({:s})" formatted with input_idx and the repr of idx2char[input_idx], and print the expected output, "{} ({:s})" formatted with target_idx and the repr of idx2char[target_idx]. Now let's head to the terminal and run this, and we should get something that makes perfect sense. "input_example is not defined"; oh, of course, I got rid of that. All right, here you can see the output: at step 0 the input is the integer 18, which maps to the character "F", and the expected output is "i", so it knows it should expect the next character in the sequence. Keep in mind this isn't trained with an RNN yet; this is just stepping through the data to show you, given one character, what it should expect next. That's all well and good. The next thing we have to handle is creating training batches and then building and training the model, so let's head back to the text editor and handle that.
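That step, roughly as written, following the official tutorial:

def split_input_target(chunk):
    input_text = chunk[:-1]       # everything up to the last character
    target_text = chunk[1:]       # the same sequence shifted one character right
    return input_text, target_text

dataset = sequences.map(split_input_target)

for input_example, target_example in dataset.take(1):
    print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
    for i, (input_idx, target_idx) in enumerate(
            zip(input_example[:5], target_example[:5])):
        print('Step {:4d}'.format(i))
        print('  input: {} ({:s})'.format(input_idx, repr(idx2char[input_idx])))
        print('  expected output: {} ({:s})'.format(
            target_idx, repr(idx2char[target_idx])))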
So that's all well and good. The next thing to handle is creating training batches and then building and training the model, so let's head back to the text editor. We'll comment all of this out and handle the notion of a batch: batch size equals 64, and the buffer size, which is just how many elements it shuffles over, equals 10,000. Then dataset equals dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True). Then we want to say vocab size equals the length of the vocab, because we're going to start building our model next: embedding dimension 256, RNN units 1024. We'll use a function to build our model: def build_model(vocab_size, embedding_dim, rnn_units, batch_size), and the model is a tf.keras.Sequential. The first layer is a tf.keras.layers.Embedding layer, because, if you recall from the first video, we have to go from the integer representation to a reduced-dimensional representation, an embedding, that allows the model to find relationships between characters. In the integer basis all of these vectors are orthogonal to one another, there's no overlap between characters; in the embedding space you can have some overlap, some relationship between characters, so those vectors are not orthogonal, they are to some extent collinear. Just a bit of math-speak for you, but that is what's going on there. So the Embedding layer gets vocab_size, embedding_dim, and batch_input_shape equals [batch_size, None], so it can take input of arbitrary length. The next layer is tf.keras.layers.GRU, a gated recurrent unit, a type of RNN, with rnn_units, return_sequences=True, stateful=True, and recurrent_initializer='glorot_uniform' — I think I spelled that right; yep, okay. Then we have one more layer, tf.keras.layers.Dense, with a width of vocab_size, and we return our model. Now that we have a model, the next thing to do is build it, so we'll say model equals build_model(vocab_size=len(vocab), embedding_dim=embedding_dim, rnn_units=rnn_units, batch_size=BATCH_SIZE) — one thing I don't like about the tutorial is the naming overlap there, but whatever. That will make our model, so let's see what kind of predictions it outputs without training. We'll say for input_example_batch, target_example_batch in dataset.take(1) — and keep in mind this is going to be quite rough, because there's no training yet, so it's going to be garbage, but let's just see what we get — example_batch_predictions equals model(input_example_batch), and print the shape, which should be batch size by sequence length by vocab size. And while we're at it, let's print a model summary so you can see what's going on. Let's head to the terminal and see how many typos I made: it complains about batch_inputs_shape on line 77, which should be batch_input_shape; fixed, try again. Okay, you can see it has an output of batch size by 100 characters by vocab size, which makes sense, and here is the model: four million or so parameters, all trainable, and you can see that the majority of those are in the gated recurrent unit. So let's go back to the text editor and start thinking about training the model. We come here, get rid of that print statement, we don't need it, we can get rid of the model summary as well, and think about training our model.
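Putting the batching and the model definition together, here is a minimal sketch of build_model as described above, following the TensorFlow tutorial; it assumes dataset and vocab from the earlier steps.

BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

vocab_size = len(vocab)   # 65 unique characters in this corpus
embedding_dim = 256
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    # Embedding -> GRU -> Dense(vocab_size): the Dense layer emits one
    # logit per character in the vocabulary.
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size),
    ])

model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)
model.summary()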
The first thing we need to train the model is a loss function, so we'll pass in labels and logits and return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True), and since we are good Python programmers we'll format that a little more nicely. Then we can start training our model: model.compile(optimizer='adam', loss=loss). We'll say checkpoint_dir equals './training_checkpoints' and checkpoint_prefix equals os.path.join(checkpoint_dir, 'ckpt_{epoch}'), where epoch is a variable that gets filled in by TensorFlow, or Keras in this case, so whatever epoch it's on, it will save a checkpoint with that name. Then checkpoint_callback — you have to define callbacks — equals tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True). For reference, I trained for a hundred epochs to generate the text you saw at the beginning of the tutorial, but it doesn't really matter all that much, so we'll say 25 epochs here, since it's not the most sophisticated model in the world. Then history equals model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback]). Let's head to the terminal and run this. It says "expected str, bytes, not a tuple" — so in os.path.join I've probably made some kind of silly mistake; checkpoint_dir is a string, 'ckpt_{epoch}' is fine, that's interesting, and the error is on line 91... oh, I understand, I have a comma at the end, so it's an implied tuple. Okay, let's try again — I was scratching my head over that one. All right, now it is training, so I'm going to let this run and I'll be back when it's finished. Okay, it has finished training, and you can see that the loss went down by a factor of about three or so, from 2.7 all the way down to 0.77, so it did pretty well in terms of training. That was 25 epochs, and we don't have to rerun the training because we did the model checkpointing. The next and final order of business is to write the function that generates the predicted text, the output of the model, so we can get some idea of what sort of Shakespearean prose this artificial intelligence can generate. Let's head to our file. The first thing to think about is how we're going to load our model, and that requires that we not call build_model up here, so we can get rid of that, and we certainly don't want to compile or train the model again; we want to load it from a checkpoint. So we'll say model equals build_model(vocab_size, embedding_dim, rnn_units, batch_size=1) — a batch size of one, because when we pass in a bit of input text we don't want a huge batch of output text, we just want a single sequence of output. Then model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)), which will scan the directory and load the latest checkpoint, and then we build the model with model.build(tf.TensorShape([1, None])), so a batch size of one and an arbitrary number of characters. Then model.summary(), and we can scroll down a little for readability, so that will print out the new model to the terminal.
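For reference, a sketch of the compile/train/reload sequence just walked through, assuming model, dataset, and build_model from above; the checkpoint path matches the one used in the video.

import os

def loss(labels, logits):
    # Labels are integer character indices; the model outputs raw logits.
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix, save_weights_only=True)

EPOCHS = 25
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

# For generation, rebuild with a batch size of 1 and restore the latest weights.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()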
The next thing we have to handle is the prediction problem of generating text. So let's say def generate_text(model, start_string); we need to pass in the model we want to use to generate the text as well as a starting string, a prompt for the AI if you will. num_generate equals 1000, the number of characters we want to generate. input_eval equals [char2idx[s] for s in start_string]; we have to go to the integer representation of our characters, and then we expand that along the batch dimension. We need an empty list to keep track of our generated text, and a temperature. The temperature handles the surprise factor of the text: it scales the predictions by some number, where a temperature of one means just whatever the model outputs, a smaller number means more reasonable, more predictable text, and a larger number gives you crazier, wackier stuff. Then we reset the states on our model and say, if I scroll down, for i in range(num_generate): predictions equals model(input_eval); predictions equals tf.squeeze(predictions, 0), squeezing along the batch dimension; predictions equals predictions divided by temperature; and predicted_id, which is the predicted ID of the character returned by the model, equals tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy(). Then input_eval equals tf.expand_dims([predicted_id], 0), and text_generated.append(idx2char[predicted_id]). If you're not familiar with it, the categorical distribution is a probability distribution over a set of discrete categories, and it will sample predictions according to the distribution defined by this predictions variable — I forgot a 1 in there at first, which would have broken it. That may be familiar to you if you've watched some of my other reinforcement learning tutorials; the actor-critic methods in particular use the categorical distribution. Then we return start_string plus ''.join(text_generated). Finally we say print(generate_text(model, start_string='ROMEO: ')), giving it a space as well.
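Here is a compact sketch of that generate_text routine, assuming char2idx and idx2char from earlier; the temperature is fixed at 1.0 as in the video.

def generate_text(model, start_string):
    num_generate = 1000   # number of characters to generate
    temperature = 1.0     # lower = more predictable, higher = more surprising

    # Encode the prompt as integers and add a batch dimension of 1.
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    text_generated = []
    model.reset_states()

    for _ in range(num_generate):
        predictions = model(input_eval)           # (1, seq_len, vocab_size)
        predictions = tf.squeeze(predictions, 0)  # drop the batch dimension
        predictions = predictions / temperature

        # Sample the next character index from the categorical distribution
        # defined by the (temperature-scaled) logits.
        predicted_id = tf.random.categorical(
            predictions, num_samples=1)[-1, 0].numpy()

        # Feed the sampled character back in as the next input.
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

print(generate_text(model, start_string='ROMEO: '))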
All right, moment of truth: let's see how well our model does. Write that, go to the terminal, and run it again. You see it loads the model just fine, and we get our text quite quickly. King Richard the Third says "I will practice on his son", there's "you are beheads for me, you Henry", Brutus replies "and welcome, general, and music the while", and a Tyrell shows up too. You know, I'm wondering if this isn't actually the collected works of Shakespeare, now that I'm reading it and looking at all the names — Brutus and King Richard sound like they're from a couple of different plays, Caesar and whichever one King Richard appears in. I don't know, I'm an uncultured swine, you let me know. But what's really fascinating here is that this model started out with no information about the English language whatsoever. It knew nothing at all about English: we didn't tell it that there are words, we didn't tell it there are sentences, we didn't tell it that it should add line breaks or periods or any other kind of punctuation. It knows nothing at all, and within, I don't know, a couple of minutes of training, it gives us a model that can string together characters and words in a way that almost kind of makes sense. Barnardine says "I am a Roman and by tenet and me", which is mostly gibberish, but "I am a Roman" certainly makes sense; "But, Warwick, I poison that you have heard" is kind of something too; and there are sillier lines, like "pointing that my soul, I love him well". So it strings together words in a way that almost makes sense. Now, returning to the question of which is better, a billion monkey-hours of typing or this AI, my money is solidly on the AI: these characters aren't put together randomly, they're put together probabilistically, and they kind of sort of make sense. You can also see how more sophisticated models, like the OpenAI text generator, can do better by using transformer networks and can create text that makes even more sense, although what's interesting is that it's not, you know, a quote-unquote quantum leap (I hate that phrase) over what we've done here in just a few minutes on our own GPUs in our own rooms. That is quite cool, and it's something that never ceases to amaze me. So I hope you found this tutorial enjoyable; if you did, make sure to hit the subscribe button and the bell icon, because I know only 14% of you get my notifications. We look forward to seeing you all in the next video.
Info
Channel: freeCodeCamp.org
Views: 66,280
Id: B2q5cRJvqI8
Length: 95min 43sec (5743 seconds)
Published: Wed Jan 29 2020