PyTorch RNN Tutorial - Name Classification Using A Recurrent Neural Net

Captions
Hey guys, welcome to a new PyTorch tutorial. Today we'll be talking about recurrent neural nets, or RNNs for short. I will briefly explain the theory behind RNNs and the different kinds of applications, and then we will implement an RNN from scratch in PyTorch to do name classification. This should give you a good understanding of how RNNs work internally, so let's start.

RNNs are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. Here's an image that shows the architecture of an RNN in the simplest way: we have an input, internally we do some operations and get a hidden state, and then we feed that hidden state back into the next step, so we can use the previous knowledge to update our new state. At the end we also get an output.

We can also unfold this graph to get a better understanding. Basically we are working with a sequence here. For example, if we have a whole sentence, we might use every single word as one input: we take the first input and some initial hidden state, do our operations, and get an output and a new hidden state. Then we take the next input together with the previous hidden state, do our operations again, and get a new output and an updated hidden state, and so on for the rest of the sequence. This is the basic architecture of an RNN.

Now, why are RNNs so important? There is a nice article by Andrej Karpathy called "The Unreasonable Effectiveness of Recurrent Neural Networks", which I highly recommend reading; I will put the link in the description. The core takeaway is that RNNs are exciting because they allow us to operate over sequences of vectors. With traditional neural nets we have just a one-to-one relationship: in image classification, for example, our input image has a fixed size and so does our output. With RNNs we can have a sequence in the input, in the output, or in both, and there are several types. We can have a one-to-many relationship with only one input, which is used in image captioning: we have one image and want multiple outputs describing what we see. We can have a many-to-one relationship, which is the case in sentiment classification and in what we are doing later with our name classification: we have a sequence as input, apply our RNN, and use the last output to do the classification. We can have a many-to-many relationship, used for example in machine translation, where a whole sentence in English goes in and a whole sentence in French comes out. And we can have a synced many-to-many relationship, for example in video classification, where we want to classify every single frame. These are the possible applications of RNNs; they are mostly used in the fields of natural language processing and speech recognition, but they could also be used for image classification, for example. That's what makes RNNs so powerful.

Now let's have a brief look at some pros and cons. The advantages are that we can process inputs of any length, the model size does not increase with the size of the input, the computation takes historical information (the previous data) into account, and the weights are shared across time.
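To make this recurrence concrete, here is a minimal sketch of the unrolled computation. The single weight matrix and the tanh nonlinearity are illustrative assumptions for this sketch, not the exact cell we build later in the video:

```python
import torch

# Illustrative sizes (assumptions for this sketch).
input_size, hidden_size = 57, 128

# One shared weight matrix applied at every time step (weights shared across time).
W = torch.randn(input_size + hidden_size, hidden_size) * 0.01

def step(x_t, h_prev):
    # Combine the current input with the previous hidden state.
    combined = torch.cat((x_t, h_prev), dim=1)   # shape (1, input_size + hidden_size)
    return torch.tanh(combined @ W)              # new hidden state, shape (1, hidden_size)

h = torch.zeros(1, hidden_size)                  # initial hidden state
sequence = [torch.randn(1, input_size) for _ in range(5)]
for x_t in sequence:                             # unfold over the sequence
    h = step(x_t, h)                             # previous hidden state feeds the next step
```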
Some drawbacks are that the computation can be slower than with normal neural nets, it can be difficult to access information from a long time ago, and we cannot consider any future input for the current state. That's basically the theory behind RNNs, and now we can jump directly to the code.

In our example we want to do name classification. I downloaded the data, and I will also put the link in the description. We have different files with names, which are all last names from different countries, for example Arabic, Chinese, Czech, Dutch, English, and so on; I think there are 18 different countries. What we want to do is classify a name and detect from which country it is. We take the whole name as a sequence and feed each single letter into our RNN as one input.

For this we need some helper functions, which I already implemented, so I will only go briefly through the code. First we have a helper function to convert our data to ASCII. For example, if we have a name with some special characters and run the file, we see the special characters are removed and only ASCII characters are left. We also print all the possible letters: a to z, the capital letters, and a few punctuation signs that we allow. Then we have a helper function to load the data: it loads all those files, reads all the names, and gets the country from the file name. Finally, we have some functions to turn our data into tensors: letter_to_index, letter_to_tensor, and line_to_tensor.

Here we are using a technique called one-hot encoding, because we need a way to represent our data that can be used for training. If you've watched my tutorial about the chatbot in PyTorch, you might already know this. A one-hot vector is filled with zeros except for a one at the index of the current letter. For example, if we had only five possible characters a, b, c, d, and e, then 'a' would be a vector of length five with a one at the first position, 'b' would be a vector of length five with a one at the second position and zeros everywhere else, and so on.

Going back to our file, I can show you what the load_data function returns: a dictionary with the country as key and the corresponding names as values. For example, if we run it, look at the key 'Italian', and take only the first five entries, we see five different Italian names. For the one-hot encoding we can use the letter_to_tensor function. If we save, run, and print the tensor for 'J', we see it has shape 1 by 57, because we have 57 possible characters (these are all the letters printed above), with a one at the position of the capital J. This is how our input will look later. Of course we do this not only for a single character but for the whole name, and for that we have the line_to_tensor function.
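Here is a sketch of those encoding helpers, assuming the 57-character alphabet described above (the ASCII letters plus a few punctuation signs); the function names match the ones used in the video, but the bodies are my reconstruction:

```python
import string
import torch

# a-z, A-Z plus a few allowed signs -> 57 characters in total.
ALL_LETTERS = string.ascii_letters + " .,;'"
N_LETTERS = len(ALL_LETTERS)  # 57

def letter_to_index(letter):
    return ALL_LETTERS.find(letter)

def letter_to_tensor(letter):
    # One-hot row vector: zeros except for a 1 at the letter's index.
    tensor = torch.zeros(1, N_LETTERS)
    tensor[0][letter_to_index(letter)] = 1
    return tensor

def line_to_tensor(line):
    # Whole name as a sequence of one-hot vectors: shape (len(line), 1, 57).
    tensor = torch.zeros(len(line), 1, N_LETTERS)
    for i, letter in enumerate(line):
        tensor[i][0][letter_to_index(letter)] = 1
    return tensor

print(letter_to_tensor('J').size())    # torch.Size([1, 57])
print(line_to_tensor('Jones').size())  # torch.Size([5, 1, 57])
```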
If we print the size of the tensor for a five-letter name, it is 5 by 1 by 57: the 5 is the number of characters, the 1 is there because our model expects the input in this shape, and the 57 is the number of all different characters. These are all the helper functions we need. Of course I will put the code on GitHub, and I also provide the link to the data so you can download these files.

Now we can start writing our RNN. First we import the things we need: we import torch, we import torch.nn as nn, and I also want to import matplotlib.pyplot as plt because I want to show you a plot later. Then we import our utility functions: from utils we import ALL_LETTERS and N_LETTERS (the number of different letters, which is 57), and also the helper functions load_data, letter_to_tensor, line_to_tensor, and random_training_example. The last one is a function that makes a random choice from the names and returns the corresponding country.
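As a sketch, the top of the script would then look like this (the helper file name utils.py is an assumption):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Helper functions discussed above (assumed to live in utils.py).
from utils import ALL_LETTERS, N_LETTERS
from utils import load_data, letter_to_tensor, line_to_tensor, random_training_example
```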
Now that we have that, we can start implementing our RNN. We need a class, which we call RNN, and it should inherit from nn.Module like all of our PyTorch models. By the way, there is already an RNN module available in PyTorch that you can use directly, but that is what we are doing in the next tutorial; for now we implement it from scratch to get a better understanding.

Let's have a look at our model architecture again. This is what our RNN for name classification will look like: we have an input and a hidden state, and internally we combine them and process the combined tensor with two different hidden layers, an input-to-hidden layer and an input-to-output layer, which are just two normal linear layers. From these we get a new hidden state, which we use for the next input, and an output. Since we are doing a multi-class classification task, we apply a softmax layer and then get the final output.

First we define our __init__ function. It gets self, the input size, the hidden size (this is going to be a hyperparameter that we can specify), and the output size. In __init__ we first call super(RNN, self).__init__(), then we store the hidden size as self.hidden_size. Then we define our two linear layers: input-to-hidden, self.i2h = nn.Linear(input_size + hidden_size, hidden_size), where the input dimension is input_size plus hidden_size because we combine them and the output dimension is still the hidden size; and input-to-output, self.i2o = nn.Linear(input_size + hidden_size, output_size), with the same input dimension but output_size as the output. We also need a softmax layer: self.softmax = nn.LogSoftmax(dim=1); we take dimension 1 because our input has a shape like 1 by 57, so we need the second dimension.

Then we define the forward pass. The forward function gets self, an input tensor, and, as you should know by now, the hidden tensor, which we use during the forward pass. Looking at the architecture again: we combine our input and hidden tensors, apply the linear layers and the softmax, and return two tensors, the output tensor and the new hidden tensor. So we create combined = torch.cat((input_tensor, hidden_tensor), dim=1), again along dimension 1. Then we apply the linear layers: hidden = self.i2h(combined) and output = self.i2o(combined), and for the output we also apply the softmax: output = self.softmax(output). At the end we return the output first and then the new hidden state. This is basically all we need for our RNN implementation. I also add a little helper function called init_hidden, because we need an initial hidden state in the beginning; it simply returns a zero tensor of shape 1 by self.hidden_size, using torch.zeros.

Now we can start applying this. We load the data: category_lines, all_categories = load_data(). The first is a dictionary with the country as key and the names as values, and the second is a list of all the different countries. The number of categories is the length of all_categories; if we print it and run the file with python rnn.py, everything is working and we see we have 18 different categories, because we have 18 different files. Next we set up our RNN: rnn = RNN(...) with the input size, which is the number of possible letters, the hidden size n_hidden, and the output size, which is the number of categories. The hidden size is a hyperparameter that we can choose; here let's try 128.

As an example, let's do one single step. We create input_tensor = letter_to_tensor('A'), then we need a hidden tensor, hidden_tensor = rnn.init_hidden(), and we process them: output, next_hidden = rnn(input_tensor, hidden_tensor). If we print the shapes and run this, we see that our RNN applied the forward pass and we get a new output with its shape and a new hidden state that still has the size of the defined hidden size. This is how it works for a single character. Basically, though, we want to treat the whole name as one sequence where each single character is one input, so we repeatedly apply the RNN for all the characters in the name, and at the very end we take the last output, apply the softmax, and take the class with the highest probability. For the whole sequence we create input_tensor = line_to_tensor('Albert'); the hidden tensor is the same, and the call is the same except that we use slicing to take only the very first letter for this simple example. If we run it, this is working too.
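Assembled, here is a sketch of the model and the single-step test as described, relying on the imports and helpers above:

```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # Both layers operate on the concatenated [input, hidden] vector.
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        # LogSoftmax over dim 1 because inputs have shape (1, n_letters).
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input_tensor, hidden_tensor):
        combined = torch.cat((input_tensor, hidden_tensor), dim=1)
        hidden = self.i2h(combined)               # new hidden state
        output = self.softmax(self.i2o(combined))
        return output, hidden

    def init_hidden(self):
        # Zero hidden state for the start of a sequence.
        return torch.zeros(1, self.hidden_size)

category_lines, all_categories = load_data()
n_categories = len(all_categories)   # 18 country files
n_hidden = 128                       # hyperparameter

rnn = RNN(N_LETTERS, n_hidden, n_categories)

# One single step on one character:
input_tensor = letter_to_tensor('A')
hidden_tensor = rnn.init_hidden()
output, next_hidden = rnn(input_tensor, hidden_tensor)
print(output.size())       # torch.Size([1, 18])
print(next_hidden.size())  # torch.Size([1, 128])

# Whole name as a sequence; for now we feed only the very first letter:
input_tensor = line_to_tensor('Albert')
output, next_hidden = rnn(input_tensor[0], rnn.init_hidden())
```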
Now we have to apply this repeatedly for all the characters, so let's write some helper functions first, and let me comment the print statements out again. We define a function called category_from_output, which gets the output. As I said, we apply the softmax at the very end, so the output is basically a likelihood of each category, and we want to return the category with the greatest value. We get the index with category_idx = torch.argmax(output).item(); we can call .item() because this is only one value. Then we return all_categories[category_idx]. For example, if we print category_from_output with the output from the name above and run it, we get 'Irish'; of course the network is not trained yet, and this doesn't look like an Irish name to me.

Now we want to train our RNN. As always, we set up a criterion and an optimizer. The criterion is nn.NLLLoss(), the negative log likelihood loss. Then we need to specify a learning rate, and here we have to be careful: in this case I try 0.005, and the learning rate is very important here, so you might want to play around with it a little bit. The optimizer is torch.optim.SGD (stochastic gradient descent), optimizing rnn.parameters() with the defined learning rate.

With our loss and criterion in place, we define another helper function for one training step and call it train. It gets a line tensor (the whole name as a tensor) and a category tensor (the index of the actual class label). First we get an initial hidden state: hidden = rnn.init_hidden(). Then, as I said, we apply the RNN repeatedly, so we need a for loop: for i in range(line_tensor.size()[0]), which is basically the length of the name. Inside we call output, hidden = rnn(line_tensor[i], hidden), passing the current character and the previous hidden state. Note that we put the hidden state in and assign the result back to the same variable, so the new hidden state is the output from the RNN. We do this for the whole name, and for the very last character we get the final output, which we use to calculate the loss: loss = criterion(output, category_tensor). Then, as always, we do our optimizer step: first optimizer.zero_grad(), then loss.backward(), then optimizer.step(). At the end of each training step we return the output and loss.item(), so the loss not as a tensor but as a float value.
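A sketch of this helper, the loss setup, and the training step as described:

```python
def category_from_output(output):
    # output contains log-likelihoods; take the index of the largest one.
    category_idx = torch.argmax(output).item()
    return all_categories[category_idx]

criterion = nn.NLLLoss()   # negative log likelihood, pairs with LogSoftmax
learning_rate = 0.005      # quite sensitive here; worth experimenting with
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)

def train(line_tensor, category_tensor):
    hidden = rnn.init_hidden()

    # Feed the name one character at a time, carrying the hidden state along.
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    # Only the final output enters the classification loss.
    loss = criterion(output, category_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return output, loss.item()
```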
Now we have this helper function for the training step, and we can write our typical training loop. Let's track some things first: current_loss = 0 in the beginning, and all_losses as an empty list, where we collect the losses so we can plot them later. Then we set plot_steps and print_steps to, let's say, 1,000 and 5,000, and the number of iterations to, let's say, 100,000. In the loop, for i in range(n_iters), we first get a random training sample with the helper function random_training_example, which needs category_lines and all_categories as input and returns the category, the actual line (the name), the category as a tensor, and the line as a tensor. Then we call the training function: output, loss = train(line_tensor, category_tensor), and add the loss to our current loss: current_loss += loss.

Then we want to print some information. If (i + 1) % plot_steps == 0, so every thousandth step, we calculate the current running loss and append it to all_losses: we append current_loss divided by plot_steps and then set current_loss back to zero. Because we add the loss up on every iteration and only append it every thousandth step, we have to divide by the number of steps to get the average. We do the same with the print steps: if (i + 1) % print_steps == 0, we print some information. First we get the guess with category_from_output(output), then we check whether the guess equals the actual category from the random training example; if it is not correct, we print WRONG together with the actual category as an f-string. Then, again as an f-string, we print the current iteration step, i divided by the number of iterations times 100 (the percentage done), the current loss with only four decimal places, the current line (so basically the name), the guess, and whether it is correct or not.

When the training is done, we plot our losses: we create a figure with matplotlib, plt.figure(), then plt.plot(all_losses) and plt.show(). Now we could already start our training. We could also save our model here and use it later for whatever we want, but in this case I simply want to try it myself. So I write while True, get a sentence as input, break if the sentence equals "quit", and otherwise predict the sentence. For this we create another little helper function called predict, which gets the input line as raw text. First we print a new line and then the input line itself, using an f-string, and since this is a prediction we should use torch.no_grad() so the gradients are turned off.
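A sketch of the training loop, the loss plot, and the predict helper with the interactive loop; random_training_example is assumed to return the category, the name, and their tensor versions, as described above:

```python
current_loss = 0
all_losses = []
plot_steps, print_steps = 1000, 5000
n_iters = 100_000

for i in range(n_iters):
    category, line, category_tensor, line_tensor = \
        random_training_example(category_lines, all_categories)

    output, loss = train(line_tensor, category_tensor)
    current_loss += loss

    if (i + 1) % plot_steps == 0:
        # Average the running loss over the last 1000 steps, then reset it.
        all_losses.append(current_loss / plot_steps)
        current_loss = 0

    if (i + 1) % print_steps == 0:
        guess = category_from_output(output)
        correct = "CORRECT" if guess == category else f"WRONG ({category})"
        print(f"{i+1} {(i+1)/n_iters*100:.0f}% {loss:.4f} {line} / {guess} {correct}")

plt.figure()
plt.plot(all_losses)
plt.show()

def predict(input_line):
    print(f"\n> {input_line}")
    with torch.no_grad():  # inference only, gradients turned off
        line_tensor = line_to_tensor(input_line)

        hidden = rnn.init_hidden()
        for i in range(line_tensor.size()[0]):
            output, hidden = rnn(line_tensor[i], hidden)

        guess = category_from_output(output)
        print(guess)

while True:
    sentence = input("Input: ")
    if sentence == "quit":
        break
    predict(sentence)
```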
Inside the no_grad block we convert the raw input: line_tensor = line_to_tensor(input_line). Then we do the same as in our training step: we have the initial hidden state and repeatedly apply our RNN, so let me copy that part over. We get the initial hidden state, then for i in range(line_tensor.size()[0]) we get the new output and the new hidden state by applying the RNN. At the very end we get the guess with category_from_output, using the output from the last step, and we simply print the guess. In this example I don't calculate the accuracy or anything; I just print the guess and see if it is correct or not.

So let's save this, run it, and hope that everything is working. This might take a few seconds or minutes. The first printed line is from the first example I showed you; then we see step 5000, meaning five percent of the training is done, the loss, and the name, and we see that the guess is wrong because this name is actually Polish. When the training is done, it plots the losses, and you can see the loss decreasing very quickly at first, then jumping around a little bit while still decreasing; I think this is a pretty good result. Looking at some random guesses during the training, we see that in the beginning almost every guess is wrong, and then the network starts to learn something and makes correct predictions, though it is still not perfect and there are still wrong predictions. At the very end we have a fairly low loss, which is pretty good.

Now we can try it ourselves, for example with some names from those files. Let's start with some German names: we try 'Acker' and it says German, and 'Adler' is also correct. Let's try some Italian names, which I think are pretty clear: 'Abbadelli' gives Italian, so this is correct. Let's try some Russian, for example 'Abaimov', and it says Russian, great. Let's try something more difficult, for example Chinese: 'Bao' is correctly classified as Chinese, and 'Bai' is still Chinese. So it looks like it's working pretty nicely in this example; of course it's still not perfect, but for now all the guesses are correct.

So now you've seen how we can train an RNN to do name classification. I hope you enjoyed this tutorial and now know how RNNs can be implemented in PyTorch. If you liked this tutorial, please consider subscribing to the channel and leaving me a like, and see you next time. Bye!
Info
Channel: Patrick Loeber
Views: 82,644
Keywords: Python, Machine Learning, ML, PyTorch, Deep Learning, DL, Python DL Tutorial, PyTorch Tutorial, RNN, Recurrent Neural Net, NLP
Id: WEV61GmmPrk
Length: 38min 56sec (2336 seconds)
Published: Mon Aug 31 2020