Python TensorFlow for Machine Learning – Neural Network Text Classification Tutorial

Captions
this is my machine and for some reason it's learning what's up code squad if you're new here welcome to the squad and if not welcome back my name is kylie ying and today i'm going to be talking about machine learning more specifically we'll dive into supervised learning and then we'll learn how to use tensorflow to create a neural net and then use a neural net for some text classification sound exciting let's get started all right jumping straight into things the first thing that we're going to do is go to colab.research.google.com and it'll pull up a page like this you're going to click new notebook this is going to create a new notebook and let's retitle this to free code camp tutorial okay it doesn't really matter you know what you actually rename it to as long as you have a notebook open and just in case you know you have some future notebooks that's why you want to rename all of them so the typical libraries that i'll import when i usually start a computer data science project are numpy so import numpy as np pandas import pandas as pd and then import matplotlib okay so because i'm going to be using tensorflow in this video i'm also going to import tensorflow as tf and then import tensorflow hub as hub okay now if you click shift enter it'll run this cell another thing that you can do is click this little play button over here that will also run the cell so cool yeah if we click that that runs the cell as well all right so the first thing that we're going to need to do before we can actually do any data analysis is upload a data file so this little folder over here this is where we manage our data or sorry this is where we manage our files and i'm just going to drag and drop the csv file the link is in the description below into here so click ok and you'll see that this is currently being uploaded sweet might take a while because this is a pretty big data set we can wait until this yellow thing goes all the way around all right so pause all right sweet now wine reviews.csv has uploaded to this folder which means that we have access to it in this notebook now so now what we can actually do is import that csv file as a data frame using pandas which is this library here that we imported so what i'm going to do is say df equals pd.readcsv so this is just a command that lets us read a csv file and i'm going to type in winereviews.csv and then also i'm going to use certain columns so if we actually took a look at this uh data frame here we can call df.head we see that we have this like unnamed column over here that contains a bunch of like indices and we don't really want that so what i'm going to do is say use columns give a list of the columns that i want so i want maybe the country the description [Music] the points the price and you know i don't really care about some of these things like the twitter handle i think the variety might be cool to check out maybe the winery all right so let's run that and then now take another look at the data set so now in our data set we have the country the description the points the price the variety and the winery something that i think would be really cool is to see if we can try to guess or ballpark you know whether something falls on the lower end of the point spectrum or the higher end of the point spectrum given the description that we have here so the first thing that we do see also is that we have a few none type values in our data frame that's what this here stands for it means there's no value recorded there so let's just focus on these two 
columns actually the description and the points because i think that's what we'll try to like align we're gonna use the description to predict the points something that we can do is a command called dropna don't know if that's you know the right way to say it but in my head that's what i say and we can say subset which means that you know in a subset of these columns that's where we're going to try to drop the rows with NaN values so here i'm gonna say description and then points run that okay so don't even know if this has changed okay i mean we still see this NaN because we didn't drop anything in that column because it doesn't really matter to us let's just quickly see let's plot like the points column to see the distribution of the points so we can use matplotlib for that so let's do plt.hist so this is going to be a histogram which shows the distribution of values um because it's only a one-dimensional variable so let's do df dot points which calls like this points column and let's just say bins equals 20. now if we do plt.show this should display our plot all right i didn't include the title and the axes because this is kind of just for us to quickly look at it if you wanted to actually plot this for other people to view you might want to say plt.title you know points histogram and then plt.ylabel so the label for the y axis would be you know n the number of values that lie in each bin and then the x label i would say uh it would be the points so if we see something like this tada there is our plot this is our distribution of points so we see that it seems like it's on a range from 80 to 100 which means that let's try to classify these reviews as below 90 and then above 90.
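For reference, here is a rough sketch of this setup in code, assuming the uploaded file is named wine-reviews.csv and the column names match what is described above (your download may differ slightly, and the exact cutoff for the low/high label is an assumption):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub

# read only the columns we care about
df = pd.read_csv("wine-reviews.csv",
                 usecols=["country", "description", "points", "price", "variety", "winery"])

# drop rows that are missing a description or a points score,
# since those are the two columns we plan to model
df = df.dropna(subset=["description", "points"])

# quick look at the points distribution
plt.hist(df.points, bins=20)
plt.title("Points histogram")
plt.ylabel("N")
plt.xlabel("Points")
plt.show()

# one possible encoding of the low/high split described above
# (cutoff of 90 is an assumption; adjust to taste)
df["label"] = (df.points >= 90).astype(int)
```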
so we're splitting this into two different categories low which is over here and high which is over here now before we proceed with the rest of this tutorial we're going to learn a little bit about machine learning because you can't really just dive into the code without understanding what's going on or at least having you know a vague sense of what's going on which is what i'm going to try to teach in this video so let's hop over to some more theoretical aspects of machine learning so first let's talk about what is machine learning well machine learning is a subdomain of computer science that focuses on algorithms which help a computer learn from data without explicit programming for example let's say i had a bunch of sports articles and a bunch of recipes explicit programming would be if i told the computer hey look for these specific words such as goal or player or ball in this text and if it has any of those words then it's a sports article on the other hand if it has flour sugar oil eggs then it's a recipe that would be explicit programming but in machine learning what the goal is i instead provide the computer with some sort of algorithm for the computer to be able to decide for itself hey these are words associated with the sports article and these are words associated with a recipe sound cool it is so stay tuned now these days we've heard a lot of words kind of you know being thrown out there such as artificial intelligence machine learning data science cloud blockchain crypto et cetera et cetera et cetera now we won't talk about the cloud or crypto or blockchain but let's kind of talk about ai versus ml versus data science and what the difference between all of these is so artificial intelligence is an area of computer science where the goal is to actually enable computers and machines to perform human-like tasks and to simulate human behavior now machine learning is a subset of ai that tries to solve a specific problem and make predictions using data now data science is a field that actually attempts to find patterns and draw insights from the data and you know data scientists might actually use some sort of machine learning techniques while they're doing this and the kind of common theme is that all of these overlap a little bit and all of them might use machine learning so we'll be focusing on machine learning there are a few different types of machine learning so the first one is supervised learning which uses labeled inputs meaning that the input has a corresponding output label to train models and to learn outputs so for example let's say i have these pictures of some animals so we have a cat a dog and a lizard well in supervised learning we would also have access to these labels so we would know that this picture is associated with a cat this picture is associated with a dog and this picture is associated with a lizard and now because we have all of these input output pairings we can stick this data into a model and hope that the model is able to then generalize to other future pictures of cats or dogs or lizards and correctly classify them now there's also such thing as unsupervised learning and in this case it uses unlabeled data in order to learn certain patterns that might be hiding inside the data so let's go back to our pictures of our animals and now we might have multiple pictures of cats multiple pictures of dogs multiple pictures of lizards and also just a quick note that we would also have these in supervised learning but all of these would have the cat label the 
dog label and the lizard label associated with them but okay now going back to unsupervised learning we have all these pictures and what our algorithm is going to want to do it wants to learn hey these are all something you know of group a because they're all similar in some way these are all group b and these are all group c and it basically tries to learn this inherent structure or pattern within the things that we're feeding it finally there's reinforcement learning so in reinforcement learning there's an agent that's learning in an interactive environment and it's learning based off of rewards and penalties so let's think about a pet for example and every single time our pet does something that we want it to so for example some sort of trick we give it a treat such as in this picture now if our pet does something that we don't want it to for example pee on our flowers then we might scold the pet and the pet would then start learning okay you know it's good when i do this trick and it's bad when i pee on the flowers this is kind of what reinforcement learning is except instead of your pet it's a computer or i guess an agent that's being simulated by your computer now in this specific video we're just going to be focusing on supervised learning so that's using these labeled input and output pairings in order to make future predictions okay so let's talk about supervised learning so this is kind of what our machine learning model is we have a series of inputs that we're feeding into some model and then this model is generating some sort of output or prediction and the coolest part is that we as programmers are not really telling this model any specifics we're not explicitly programming anything rather this model our computer is trying to learn patterns amongst this input data in order to come up with this prediction so a list of inputs such as the ones here this is what we call a feature vector we'll talk about that in some more detail later so let's quickly talk about the different types of features or inputs that we might be able to feed our model so the first type is qualitative data and this means it's categorical which means that there is a finite number of categories or groups so one common example is actually gender and i know that this might seem a little bit outdated but please bear with me because i just want to get the point of a qualitative feature across so here in this picture we see that there is a girl and a boy so let's take these two different groups first you might notice that there's not exactly a number associated with being a girl or being a boy so that's the nature of qualitative data if it doesn't have some sort of number associated with it it's probably qualitative now let's take a look over here there's different types of flags like maybe these represent you know your nationality might be a qualitative feature qualitative features don't have to necessarily be exclusive but they just don't have a number associated with them and they belong in groups so you might have us you might have canada you might have mexico et cetera et cetera these two specific qualitative features are known as nominal data in other words they don't have any inherent ordering to them now our computers don't really understand like labels or english too well right our computers are really really good at understanding numbers so how in the world do we convey this in numbers well we use something called one hot
encoding so suppose we have you know a vector that represents these four different nationalities usa india canada and france what we're going to do is we're going to market with a 1 if that category applies to you and 0 if it doesn't so for somebody who has us nationality your vector might look like 1 0 0 0. for india it might be 0 one zero zero canada zero zero one zero and france zero zero zero one so that's one hot encoding it turns these different groups into a vector and i guess switches on that category with a one if that category applies and zero if it doesn't now there are also other types of qualitative features so something like age even though a number might be associated with it if we take different groupings of age so for example baby kid gen z young adult boomer etc etc if we take these different categories then this actually becomes a qualitative data set because you can assign one of these categories to somebody and it doesn't necessarily map to a specific number another example of categorical data might be a rating system of bad to good and this is what we call ordinal data so even though it's qualitative it has some sort of inherent ordering to it so hence the name ordinal data now in order to encode this into numbers we might just use a system like one two three four five quick note the reason why we use one hot encoding if for our nationalities but we can use one two three four five for ordinal data is because let's think about things this way our computer knows that two is closer to one than five right and in a case like this it makes sense because two is slightly less worse than one whereas five is actually really good so of course two should be closer to one than five but if we go back to our nationality example it doesn't really make sense to say you know to rate usa one india to canada three and france four because we could also switch around these labels and they would still be distinct groups and they're just different it's not like one of them is closer to the other than something else they're just different i mean i guess it depends on the context but if we're talking nationality they're just different right so you can't necessarily say two which i think was i assigned to india is closer to one usa than france which is four like a computer would think that but just thinking about it logically that wouldn't really make sense so that's the real difference between these two types of qualitative data sets is how you want to encode them for your computer when we're talking about features we also have quantitative features and quantitative features are numerical valued inputs and so this could be discrete and it could also be continuous so for example if i wanted to measure the size of my desk that would be a quantitative variable if i wanted to tell how hot you know what the temperature of this fire was that's also another quantitative variable another type of quantity and these two are both continuous variables now let's say that i'm hunting for easter eggs and this is how many easter eggs i collect in my basket it probably doesn't make too much sense to say you know i have 7.5 but rather i have seven that that would make sense or eight you know somebody else might have two which means that i won but you know aside from that this is something that would be a discrete quantitative variable because we do have it's it's not continuous right there are discrete values integers positive integers that would be able to describe this data set over here this is continuous 
and over here this is discrete those features those are the different types of inputs that we might be able to feed into our model what about the different types of predictions that we can actually make with the model so there are a few different tasks that you know we have in supervised learning so there's classification which means that we're predicting discrete classes for example let's say we have a bunch of pictures of food so you know here we have a hot dog we have a pizza and we have an ice cream cone an example of classification might be okay well this gets mapped to a hot dog label and this gets mapped to pizza and this gets mapped to ice cream and if we have any additional photos of one of these we want to map them to one of these three classes this is known as multi-class classification because we have a bunch of different classes that we're trying to map it to so hence the name multi-class now what if instead of hot dog pizza and ice cream we had another model that just told us whether or not something was a hot dog so this over here is a hot dog and these over here are simply not hot dogs well this is known as binary classification because there's only two hence binary it's ready please god what would you say if i told you there is an app on the mind we're past that part just demo it okay let's start with a hot dog oh my beautiful little adriatic friend i'm going to buy you the palapa of your life we will have 12 posts braided palm leaves you'll never feel exposed again i'm gonna be rich you guilfoyle do pizza let's do pizza yeah hey zach not hot dog wait what the huh that's that's it it only does hot dogs no and a nah hot dog now let's talk about some other examples of classification to kind of really drill this down for you other types of binary classification might be positive or negative so if we have restaurant reviews positive or negative two different categories something else might be pictures of cats versus pictures of dogs cool cats and dogs and then maybe we have a bunch of emails and we're trying to create a spam filter one another example of binary classification might be spam or not spam now what about multi-class classification so going back to our first example of the cats and dogs well we also had a lizard you know you might also have a dolphin so different types of animals that might be something that falls under multi-class classification another example might be different types of fruits so orange apple and pear another example might be different types of plant species but here basically you have different types of classes and you have multiple of them more than two all right there's also something known as regression and in regression what we're trying to do is predict continuous values so one example of regression might be you know this is the price of ethereum and we want to predict what the price will be at tomorrow well there are so many different values that we can predict for that and they don't necessarily fall under classes like classes just doesn't intuitively make sense instead it's just a number right it's a continuous number so that's an example of regression or it might be what is the temperature going to be tomorrow that's another example of regression might be what is the value of this house given you know how many stores it has how many garages it has what is its zip code et cetera et cetera okay so now that we've talked about our inputs now that we talked about that now that we've talked about our inputs and now that we've talked about our 
outputs that's pretty that's pretty much you know machine learning in a nutshell except for the model so let's talk about the model okay so before i dive into specifics about a model let's briefly discuss how do we actually make this model learn and how can we tell whether or not it's actually learning we'll actually use this data set in a real example but here let's just briefly talk about what this represents so this data set comes from a certain group of people and this outcome is whether or not they have diabetes and now all these other numbers over here these are metrics of you know how many pregnancies they've had what their glucose numbers are like what their blood pressure is like and so on so we can see that all of these are actually quantitative variables you know they might be discrete or they might be continuous but they are quantitative variables okay so each row in this data set represents a different sample in the data so in other words each row represents one person in our data set now each column is a different feature that we can feed into our data set and by feature i just i literally just mean like the different columns so this one here is blood pressure this one here is number of pregnancies this one here is insulin numbers and so on except for this one over here this one is actually the output label that we want our model to be able to recognize now these values that we actually plug into the model again this is what we would call our feature vector and this this is the target for that feature vector so this is the output that we were trying to predict this over here this is known as the features matrix which we call big x and all these outcomes together we call this the labels or the targets vector y let's kind of abstract this to a piece of chocolate or a chocolate bar like this and you know we have all the numbers that represent our x matrix over here and the values our outputs y our target sorry over here now each of these features so this this is our feature vector we're plugging this into the model the model is making some sort of prediction and then we're actually comparing this to our target in our data set and then whatever difference here we use that for training because hey if we're really far off we can tell our model and be like hey can you make this closer and if we're really close then we tell our model hey keep doing that that's really good okay so this is our entire bar of chocolate so let's say this and this bar here represents all the data that we have access to do we want to feed this entire thing into our model and use that to train our model i mean you might think okay the more data the better right like if i'm on if i'm trying to look for a restaurant i'd rather have a thousand reviews than 10. 
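As a quick aside before the discussion of splitting the data, here is a minimal sketch of how the features matrix X and the target vector y just described could be pulled out of a dataframe like the diabetes table (the file name and the position of the outcome column are taken from the walkthrough later in the video):

```python
import pandas as pd

df = pd.read_csv("diabetes.csv")

# X: every column except the last one, i.e. the features matrix
# (one row per patient, one column per measurement)
X = df[df.columns[:-1]].values

# y: the last column, the outcome label we want to predict
y = df[df.columns[-1]].values

print(X.shape, y.shape)  # roughly (768, 8) and (768,) for this dataset
```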
but when we're using but when we're building a model we don't want to use all of our data in fact we want to split up our data because we want some sort of metric to see how our model will generalize so we split up our chocolate bar into the training data set the validation data set and the testing data set our training data set is what we're going to feed into our model and you know this might give us an output we again we check it against the real output and we find something called the loss which we'll talk about in a second but you can think of the loss as a measure of how far off we are so how far off are we put that into a number value and then feed that back into the model and that's where we're making adjustments this process is called training now we also have this validation set so this validation set we also feed into the model and then we can actually assess the loss on this validation set because again we have the real answer and we have this prediction and we can see how far off we are but this validation set is actually used as kind of more of a reality check during or after the training to ensure that our model can handle unseen data because remember up until this point our model is only being trained with our training set data okay so for example if i have a bunch of different models and all of these are my validation data sets and these are the predictions well okay this loss over here is kind of high this one's a little bit closer but look this one is the lowest we want to actually decrease the difference between our prediction and our true target and so another use case for the validation data set is to actually say okay well model c seems to perform the best on unseen data so let's take model c now once we've selected model c then we can actually go back and use our test set which again is unseen data and we plug that into model c see how it performs and then we can use that metric compared to you know our our targets as a final reported performance this test set is used to kind of check how generalizable the final model is okay so i kind of touched on you know something called a loss function but what exactly does that mean and what exactly how do we how do we quantify how different things are well this would probably give us a higher loss than this right like we would we would want that too because it's a little bit further off and something like that should be a lot further off which means that our loss function the value output from the loss function should be a lot higher than the previous two that we just saw okay so there are a few different types of loss functions so let's put our mindset in like in terms of regression for a second so we're trying to predict a value that's on a continuous scale so this might be uh the temperature tomorrow right now if we have a bunch of different cities that we're trying to predict then we have a bunch of different points right so this here y real this is the actual value that we found in our data set and why predicted this is a value that our model has output so what we're trying to do is we're trying to find the difference between these two values and then use the absolute value of that and then add all of these up you know for every single point in our data set so all the different cities in order to calculate the loss so in other words what we're doing is we're literally just comparing hey for every single city how different is our predicted value and the real value and then sum up all of those values so as you can see you 
know this is basically just an absolute value function so that's what l1 loss is now if we're really close if our predictions are really good then you can see how this loss becomes really small and if our values are really far off well that becomes pretty large right there's also another type of loss called l2 loss which is the same idea but instead of using the absolute value function we square everything so this is also known as mean squared error which you might have heard of basically here instead of summing up all the differences we actually square all the differences and then we sum those up so again this is what a quadratic formula looks like so this is what the squares would look like and again as you can see if we're only off by a tiny bit our loss is really small which means that it's good and if we're off by a lot then our loss gets really really big really fast okay now let's think about the classification mindset when we're trying to predict let's say just two different classes so binary classification well your output is actually a probability value which is associated with how likely it is to be of class a so if it's closer to one then class a seems to be more likely and if it's closer to zero then it's probably class b so in binary cross entropy loss what happens is you're taking the real value times the log of the predicted value and then adding that with 1 minus the real value times the log of 1 minus the predicted value summing that up and using a normalization factor you don't really have to know this too well this is a little bit you know more involved mathematically but you just need to know that loss decreases as the performance gets better so one of the metrics of performance that we can talk about in classification specifically is accuracy so let's say that we have a bunch of pictures here and we want to predict their labels so here i also have the actual values so of course this is an apple this is orange apple apple etc and use your imagination think that you know these two are slightly different from the original well let's say that our model is predicting apple orange orange apple so you know we got this right we got this right we got this wrong and we got this right so the accuracy of this model is 3 out of 4 or 75 percent if you just think about it in english that makes sense right like how accurate our model is is how many predictions it correctly classifies up until now we've talked about what goes into our model the features what comes out of our model you know what type of prediction it is whether we're doing classification or regression but we haven't really talked about the model itself so let's start talking about the model and that brings me to neural nets okay so the reason why i'm going to cover neural nets is because they're very popular and they can also be used for classification and regression now something that i do have to mention though is that neural nets have become sometimes a little bit too popular and they are being sometimes maybe overused there are a lot of cases where you don't need to use a neural net and if you do use a neural net it's kind of like using a sledgehammer to crack an egg it's a little bit you know too much there are plenty of other models that can also do classification and regression and sometimes the simpler the model the better and the reason for that is because you don't want something that's so good at predicting your training data set that you don't that you know it's it's not good at generalizing and often the 
thing with neural nets is that they are very much a black box the people who create the neural nets don't really know what's going on inside the network itself when you look at some of these other models when you look at other types of models in machine learning oftentimes those might be a little bit more transparent than a neural net with a neural net you just have this network with a ton of parameters and sometimes you can't really explain why certain parameters are higher than others and so you just the whole question behind why is a little bit lacking sometimes but with that being said let that be your warning we're going to talk about neural nets anyways because they are a great tool for classification and for regression all right let's get started so as i mentioned you know there are a ton of different machine learning models this one here is called the random forest this one here could just be classic linear regression this one is called a support vector machine and these different types of models they have their pros and cons but we're going to be talking about neural networks and this is kind of what a neural net looks like actually this is exactly what a neural net looks like you have your inputs they go towards some layer of neurons and then you have some sort of output but let's take a closer look at one of these neurons and see exactly what's going on okay so as i just mentioned you have all of your inputs these are our features remember how we talked about feature vectors so this would be a feature vector with n different features now each of these values remember because we our computer really likes values each of these values is multiplied by a weight so that's the first thing that happens you multiply your input by some weight and then all of these weights go into a neuron and this neuron basically just sums up all these weights times the input values and then you add a little bias to it so this is just some number that you add in addition to the sum product of all of these and then the output of the sum of all of these plus the bias goes into an activation function and an activation function we'll dive into that a little bit later but you can think of it as just some function that will take this output and alter it a little bit before we pass it on and it could be the output but this is the output of a single neuron over here okay and then once you have a bunch of these neurons all together they form a neural network which kind of looks something like this this is just a cool picture that i found on wikipedia all right so let's take a step backwards and talk about this activation function because i just kind of glossed over it and i didn't really tell you exactly what it is so this is what another this is another example of a neural net this is what a neural net would look like you have your inputs okay they go into these layers and then you have another layer and then you have your output layer so the reason why we have a non-linear activation function is because if the output of all of these are linear then the input you know into this this would also just be a sum of some weights plus a bias what we could do is essentially propagate these weights into here and this entire network would just be a linear regression i'm not going to do the math here because you know it involves a little bit of some algebra but if that's something that you're interested in proving it would be a really good exercise to prove that if we don't have a non-linear activation function then a 
neural network just becomes a linear function which is bad that's what we're trying to avoid with the neural net otherwise we would literally just use a linear function without activation functions this becomes a linear model okay so these are the kinds of activation functions that i'm talking about there are more than these but these are three very common ones so here this is a sigmoid activation function so it goes from zero to one tanh which goes from negative one to one and then relu which is probably one of the most popular ones um but basically what happens here is if a value is greater than zero then it's just that value and if it's less than zero then it becomes zero so basically what happens in your neural net is after each neuron calculates a value it gets altered by one of these functions so it basically gets squashed into the range zero to one or negative one to one or in the relu case zeroed out if it's negative and then it goes on to the next neuron and on and on and on until you finally reach the output so that's the point of an activation function when you put it at the very end of a neuron it makes the output of the neuron non-linear and this actually allows the training to happen we'll talk about that also in a second.
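As an illustrative sketch (plain numpy rather than the tensorflow code used later), this is roughly what those three activation functions and a single neuron's weighted sum plus bias look like; the numbers are made up:

```python
import numpy as np

# three common activation functions
def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # squashes any input into (0, 1)

def tanh(z):
    return np.tanh(z)             # squashes any input into (-1, 1)

def relu(z):
    return np.maximum(0, z)       # zero for negative inputs, identity otherwise

# one neuron: multiply each input by a weight, sum, add a bias,
# then pass the result through an activation function
x = np.array([0.5, -1.2, 3.0])    # a feature vector
w = np.array([0.1, 0.4, -0.2])    # the neuron's weights
b = 0.05                          # the bias

print(sigmoid(np.dot(w, x) + b))
```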
we've seen this picture before how we have the training set that goes into our model and then we calculate some loss and then we make an adjustment which is called training so let's talk about this training process now okay so this is what our l2 loss function looks like if you can recall from a few minutes ago basically it's a quadratic function and when your real and predicted values are further apart then the square of the difference becomes very large right and when they're close together then you minimize your loss and all is good in the world okay up here the error is really large and we want to decrease the loss right the smaller the loss the better our model is performing in some ways loss is just a metric to assess how well our model is performing so our goal is to get somewhere down here and now up here this is the part that might involve a little bit of calculus understanding but because not everybody out there knows calculus i'm going to skip the numbers and just use diagrams so if we're up here this is the opposite of the slope right like the slope here i mean it's increasing it would be positive but if we want to get down here we want to go in the opposite direction as that the higher up we go the steeper we want to step right because the further away we are from our goal whereas maybe down here we want to take a baby step over here because we don't want to overshoot we don't want to you know pass this and never be able to find it so we use something called gradient descent in this case and gradient descent is basically measuring to some extent the slope at a given point and taking a step in the direction that will help us this is where back propagation comes in and back propagation is the reason why neural nets work so if we take a look at this l2 loss function okay you might think yeah like this depends on what our y values are right like what is our predicted value what is our real value okay well our real value is staying the same our predicted value is a function of all the weights that we just talked about and all the inputs right but our inputs are also kind of staying the same so as we adjust the weight values then we are actually altering this loss function to some extent which means that we can calculate the gradient the slope of our loss function with respect to the weights okay got that so if we're looking at the various weights in our diagram we might calculate a slope that's you know with respect to that weight and each of them might be a slightly different value as we can see here so what we're going to do with gradient descent slash back propagation is we're actually going to set a new weight for that parameter and that value is going to be the old weight plus some alpha and we'll get back to that but just think of this as some variable multiplied by this value going down this way a quick side note if you're studying machine learning some more this might be a minus sign and this would be the gradient but for all purposes right now because we're using these arrows instead it's more intuitive to just add something in this direction if that confused you you can ignore it until you start getting into the math of back propagation but what i'm trying to say is that essentially calculating this gradient with respect to one of the weights in the neural network allows us to take a step in that direction in order to create a new weight and this value alpha here this is called our learning rate because we don't want to take this massive step every single time because then we're making a huge change to our neural network instead we want to take baby steps and if every single baby step is telling us to go in one direction then okay fine we're going that direction but taking small steps is better than you know overshooting and diverging off into the land of infinities yeah um you can think of this as for example if you were tailoring something it's better to remove bits and bits of the fabric rather than removing an entire chunk and realizing oh my gosh i just took off way too much so that's what the learning rate is for it's so that we don't you know take off a huge chunk of the fabric instead we go bit by bit by bit okay so then going back to all these other weights we can see how each weight in the neural net is getting updated with respect to what this gradient value is telling us so basically this is how training happens in a neural net we calculate out the loss okay we see there's a massive loss we can calculate the gradient of that loss function with respect to each of the weight parameters in the neural network and now this allows us to have some direction some measure of the direction that we want to travel in for that weight whether we want to increase the weight or decrease the weight we know based on this gradient that you know we're finding so that is how back propagation works and that's exactly what's happening in this step right here that is our crash course on neural networks it is not the most comprehensive crash course out there if you are interested in neural networks i do recommend diving in deeper into the mathematics of it which i'm not going to cover in this class because not everybody has had the mathematical prerequisites but again if that's something you're interested in definitely go and check it out let's move on to talk about how we would actually implement this neural net in code if we wanted to create a neural net so this is where machine learning libraries come in okay so in machine learning we often need to implement a model we probably always need to implement a model and if we want our model to be a neural net like this okay that's great we could go and we could code you know
each neuron or we could code a neuron class and we could stitch them all together but we don't really want to start from scratch that would be a lot of work when we could be using that time to fine-tune our network itself so instead we want to use libraries that have already been developed and optimized to help us train these models so if we use a library called tensorflow then our neural net could look something just like this and that's a lot easier than trying to go through and you know develop and optimize the network entirely from scratch ourselves so straight from the website tensorflow is an open source library that helps you develop and train ml models okay great that sounds like exactly what we want so tensorflow is a library that's comprised of many modules that you know you might be able to see here but for example we might have this data module and here you know we have a bunch of tools that help us import data and keep data consistent and usable with the models that we will create another great part of this api is keras so here we actually have a bunch of the different modules which will help us you know create models help us optimize them and these are different types of optimizers et cetera so basically the whole point of this is that you know we want to use these packages we want to use these libraries because they help us maximize our efficiency where we can focus on training our models and they're also just you know really good and like fine-tuned already so why would we want to waste our time to code stuff from scratch all right so now let's move over to our notebook and i'm going to show you guys how to actually implement a neural net using tensorflow just you know straightforward feed forward no pun intended neural network now that we've learned a little bit about neural nets about tensorflow let's try to actually implement a neural net using tensorflow so let me go back to this collab tab and let's actually create a new notebook okay and this notebook again i'm going to call this maybe just a feed forward nullnet example all right now i'm going to again use these same imports run this great it's running okay so now the second thing that i'm going to have to import in here is my data set and so here i have a data set called diabetes.csv which is also in the description below click ok there and that was a that was a substantially smaller data set than the one that we tried to import at the beginning which don't worry we will get back to at the very end of this video but that's why it took significantly shorter to upload all right so this data set that we're using here in diabetes.csv this is a data set that was originally from the national institute of diabetes and digestive and kidney diseases and in this data set all these patients are females at least 21 years old of pima indian heritage and this data set has a bunch of different features but the final feature is whether or not the patient has diabetes okay let's take a look at this data set so the first thing that we can do is once again create a data frame so we're going to use read csv again and here we can just say diabetes.csv let's see what this looks like all right so we can see that each of these is one patient and how many pregnancies they've had their glucose blood pressure skin thickness etc a bunch of other measurements okay so it's always a good idea to do some sort of data visualization on these values in order to see if any of these you know have some sort of correlation to the outcome and there are 
many different metrics that you can run you can try to literally find the correlation between pregnancies and the outcome but for this purpose i think it's a lot easier to visualize things in i guess a visual way how else would you visualize things so here what i'm going to do is i'm going to plot each of these as a histogram so each value in the feature as a histogram and then compare it to the outcome so let's try to do that using a for loop so for i in range and here i'm going to use the columns of the data frame um up until the very last one because that's the outcome that's the one that we're actually comparing against what i'm going to do is say the label is equal to dataframe.columns i and actually let me make this screen larger for you guys because i know that some people yeah okay hopefully you can see this a little bit better so anyways down here uh for i in this range okay how about this this is better okay so for each basically this for loop here is trying to run through all of these columns except for the last one and the label is basically just getting the data frame column at that index so to show you guys what that actually looks like let's run df.columns so again this is just it's similar to going through a list of these items okay so the label we're you know indexing into something in this list and what we're going to do is plot the histogram so we can index into a data frame by calling the data frame and then saying you know where the data frame outcome is equal to one which means that they do have diabetes and then so this basically creates another let me show you this basically is a data frame where all the outcomes are equal to one okay cool so this is our new data frame where all the outcomes are one which means that all these patients are diabetes positive and then we're just going to index into the label that we have right here so we're just indexing it to the column and what i'm actually going to do is the same thing now but instead of one make this zero so now this here let me show this to you guys again this here is a data frame where all the outcomes are zero so that means that everybody here is um so that means everybody here is diabetes negative okay perfect well we want to tell the difference between the two so i'm also going to assign these colors so here let's use blue and red and then of course a label so this is no diabetes and this up here diabetes all right so plt now let's give this a title let's just use you know the name of the label and then the y label is n and the x label is the label if we call plt.legend like this then this actually shows us the legend including these labels and at the very end we call plt.show in order to see the plot so let's run this all right basically for each of the values we are plotting the measurements here and it's kind of hard to see you know diabetes versus no diabetes so another trick that we can do here is we can say alpha equals 0.7 if we run these again you'll see that it makes them a little bit easier to see behind one another okay so something else that we might want if we take a look at the number of values so if we say the length of this versus the length of this so this is saying how many patients are diabetes positive and how many are diabetes negative we see that we actually have two different values one of them is only 268 positive patients and then 500 negative patients so what we actually want to do is normalize this which means that we want to compare these histogram distributions to how many actual 
values there are in that data set otherwise you know you can clearly see that there are more no diabetes patients here than diabetes patients and this isn't really a fair measurement so i'm going to say density here equals true and just for kicks i'm going to say the number of bins that we're going to use in our histogram is 15. then here we would say this is probability because we're normalizing now using the density which means basically that we're just taking each of these values and dividing it by how many total values there are in the data set so instead it's a ratio for each column rather than just a straightforward number okay click enter again and now here we have more of a visualization so it does seem like you know people who have diabetes might have more pregnancies or people you know higher glucose levels that makes sense right seems like maybe slightly higher blood pressure but that's pretty inconclusive maybe skin thickness a little bit but also you know insulin a little bit inconclusive it does seem like perhaps people who have diabetes have a slightly higher bmi but so on so you can see that like there's no these values aren't separable which means that we can't really tell based on one value whether or not somebody has diabetes or not and this is where the power of machine learning comes into play is that we can actually learn whether or not or predict whether or not somebody has diabetes or not based on all of these features all together okay so now that we have this what we're going to do is split this into our x and y values so recall that the x is a matrix right so here i'm going to say it's just going to be all the columns up until the last value got values so let's run that and let's see oops and what does this give us this gives us now an array this is now a numpy array okay and we're going to do the same thing with y except y is just a single column so we can do this and i missed the s again okay so if we see why it's a huge array but it's one dimensional as you can see and it's just all the different labels in the data set so now we have our x and our y values we can actually just go ahead and split this up into our test and our our training in our test data sets so something that i'm going to import is from sklearn so this package this function here allows us to split arrays or matrices into random train and test subsets which you know is very useful so all we have to do is plug in our arrays and it should help us you know we would dictate the test size and the train size and then it would help us split them up so what i'm going to do is say x train and x i'm going to call this temp y train and y temp equal oh we have to import this so actually i'm going to come back up here and say from sklearn.model selection import this okay so make sure you rerun that cell but now we have access to that function so here train test split and i'm going to pass in my x and my y arrays and then i'm going to say the test size is equal to let's use 60 of our data for training and 20 for validation and twenty percent for test so what i'm going to do first is just split this up into zero point four and i'm just going to pass in random state equals zero so this allows us to get the same split every single time now the reason why i want to do this again but use 0.5 is now with this temporary data set which is technically you know the test set we're going to split this now into the validation and the test so let's do x valid test and instead of x and y we're going to pass in the temp 
that we just created up here so this is essentially x and y but only 40 percent of that data set and now we're breaking this down further into 50 50 which is going to be our validation and test data sets so let's run this cool checks out and let's build our model so here i'm going to say model equals tf.keras so keras is the part of tensorflow that helps us write you know some neural nets and stuff like that so let's check that out okay tf.keras so basically it's this uh api that allows us to easily build some neural net models and here we're going to call the sequential so let's see basically it groups a linear stack of layers into a model which is exactly what we want because our neural net is exactly a stack of layers so let's pass in some layers here okay i think how i'm going to architect this is i'm just going to use a very simple model so keras dot layers dot dense 16 okay and let me add activation equals okay what this is setting up here is a dense layer of neurons what does dense mean it just means that it takes input from everything and it outputs a value densely connected is just you know okay it's a layer that's deeply connected with its preceding layer so it just means that every single neuron here is receiving input from every single neuron that it sees from the past all right going back to here um this just means that this is a layer of 16 neurons that are densely connected so if we go back to here it means that this layer here is comprised of 16 different neurons and then our activation function is relu which is one of the activation functions that we saw previously where if x is less than or equal to zero then this becomes zero and if x is greater than zero then this becomes just x okay and you know i'm just going to add another layer in there just for kicks and then finally we're going to conclude this with a layer of only one node and the activation here is going to be sigmoid and what that helps us do is this is where the binary classification comes in because sigmoid if you can remember maps everything to a value between zero and one so the point of this activation function is that it maps our input to a probability of whether or not you know something belongs to a single class so this is our neural net it was really that easy we can click enter or shift enter and then now let's compile this model so we can call model.compile and we're going to need a few different things here so the first one is going to be our optimizer so you can see here that there are a bunch of different optimizers that tensorflow already has for us now which one do you choose that's kind of a hard question to answer because it's a question that people still don't really know the answer to but one of the most commonly used optimizers is adam so that's where we're going to start so going back to our example here let's do optimizer and let's set this equal to tf.keras.optimizers.Adam and we're going to set our hyperparameter called the learning rate we're going to start off with what the default is so 0.001 now the second thing that we have to do is define our loss function so our loss is equal to tf.keras.losses and now because we're doing binary classification the one that we're going to use is called binary cross entropy so binary cross entropy like this and then the final thing that we're going to do is add a metric for accuracy and the reason why we're doing that is because we want to know you know how many do we actually get right.
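Putting those pieces together, the model definition and the compile call look roughly like this (the size of the second hidden layer is an assumption, since it is only described as another layer added for good measure):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),   # first hidden layer, 16 densely connected neurons
    tf.keras.layers.Dense(16, activation="relu"),   # second hidden layer, size assumed to match the first
    tf.keras.layers.Dense(1, activation="sigmoid"), # one output node: probability of the positive class
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam with the default learning rate
    loss=tf.keras.losses.BinaryCrossentropy(),                # loss for binary classification
    metrics=["accuracy"],                                     # also report accuracy
)
```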
so press enter and we now have a compiled model we have a neural net that we can actually feed data to and we can train but before we do that let's actually see how well this might perform on you know our training data and our validation data what we're going to call is model dot evaluate and here let's pass in x train and y train and see what happens okay so here we're getting a loss of 16 and an accuracy of 0.35 which is around like 35 percent so that's pretty bad what about instead of train let's do validation okay it's around the same a loss of 11 and accuracy of 35 percent all right this is because our model hasn't been trained yet and that's why the accuracy is so low we haven't really done any training so let's see if we can fix that what we're going to do is call model.fit and pass in the x train and the y train values pass in something called the batch size and then epochs which is how many iterations through the entire data set we're going to train this and pass in the validation data now this validation data is just so that after every single epoch we can measure what the validation loss and the validation accuracy is so x valid y valid now we can run that all right so you can see that our neural net is training now going back really quickly to this batch size batch size is just a term that refers to the number of samples or training examples in this case it's the number of women that we have samples from that is used in every single iteration so this is how many samples we see before we go back and we make a weights update so you can see that here look our loss on our training set is decreasing fairly successfully and our accuracy okay our accuracy seems to be increasing great now what about our validation loss okay our validation loss seems to be decreasing decreasing decreasing a little bit of an increase at the end but that's a good sign and our validation accuracy seems to be also increasing well let's see if we can do better.
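Before moving on, here is a compact sketch of the evaluate and fit calls just described; the variable names follow the train/validation split from earlier, and the batch size and number of epochs are placeholders rather than the exact values used in the video:

```python
# untrained baseline: loss and accuracy before any fitting
model.evaluate(X_train, y_train)
model.evaluate(X_valid, y_valid)

# train, measuring validation loss and accuracy after every epoch
model.fit(
    X_train, y_train,
    batch_size=16,                        # samples seen before each weights update (assumed value)
    epochs=20,                            # full passes through the training set (assumed value)
    validation_data=(X_valid, y_valid),
)
```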
The values now run from roughly negative one to one. Let's plot them again and see exactly what's going on. First let's put this back into a dataframe: I'll say transformed_df equals pd.DataFrame, which creates a dataframe, passing a data argument and the same columns as our current dataframe. For the data we're going to do something called a horizontal stack, meaning we take our new transformed x along with y, but we have to do a little reshaping first (sketched below). We call numpy's reshape because right now x is a two-dimensional matrix but y is only one-dimensional. Let me show you: compare x.shape to y.shape, and you'll see x is two-dimensional while y has no second dimension, which means it's not a 768-by-1 matrix, it's just a vector of length 768. If we call a horizontal stack on the two as they are, there's going to be an error. So instead we reshape y with (-1, 1): the -1 means numpy gets to figure out that dimension, and the 1 means the second dimension is 1, so this returns an object of shape (768, 1). Run this (after fixing the columns argument), and then make sure the plotting cell uses the transformed dataframe instead of the original; everything else can stay the same. Now we can see what StandardScaler is doing: it standardizes each column, subtracting its mean and dividing by its standard deviation, so the values tell you how far each sample sits from that column's average. The glucose levels, for example, now look fairly centered around zero, and all of the features are normalized. We can delete this cell now that we've visualized it; we don't need it anymore. This is now our new x, and we can pass it straight down into the split. But remember one other thing I mentioned: if we take the transformed dataframe and count the rows where the outcome equals one versus where it equals zero, those two counts are very different. The number of diabetes-negative patients is almost double the number of diabetes-positive patients, and that imbalance can also keep the neural net from training well. So we're going to try to get those two classes approximately equal, and we can do that with something called a random over sampler, which essentially draws extra random samples from the smaller class until the two class counts balance out. We do this by importing another package: from imblearn.over_sampling we import RandomOverSampler. Watch what happens... okay, that did not give me an error, but in the past it might give you one, and if it does, just come up to the top of the notebook and run !pip install -U imbalanced-learn.
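Going back to the reshaping step above, here is a minimal sketch, assuming X is the scaled feature matrix, y is the outcome vector, and df still holds the original column names in the same order.

```python
import numpy as np
import pandas as pd

# y is a flat vector of length 768; reshape it into a (768, 1) column so it
# can be stacked next to the 2-D feature matrix X.
data = np.hstack((X, np.reshape(y, (-1, 1))))
transformed_df = pd.DataFrame(data, columns=df.columns)
```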
If there's an error saying the library doesn't exist, that install line will solve it for you; you'd just have to restart the runtime afterward. Because the import worked here, I'm not going to do that. Now let's come back down, and right before we split into the test and train sets, let's use this RandomOverSampler to bring both classes up to the same count. So we call RandomOverSampler, then redefine x and y by calling fit_resample on x and y (see the sketch at the end of this passage), and run the cell. If we count the outcomes again, we now have 500 rows where the outcome is one and 500 where the outcome is zero. That's a good sign; our dataset is now balanced in terms of outcomes. Let's rerun all of these cells. The untrained accuracy is closer to 50% now, and let's think about why that makes sense. Say the untrained model is effectively guessing. Before, our dataset had roughly twice as many diabetes-negative samples as positive ones, so the accuracy of those guesses was skewed by that one-to-two ratio, simply because we had so few diabetes-positive samples. Now that the classes are balanced, that number sits much closer to 50%. Now let's train the model again. We see that the loss is decreasing, which is good, and the accuracy seems to be increasing, which is also good. And let's check the validation metrics, because remember, we need to see how this generalizes to unseen data: the validation loss seems to be decreasing, the validation accuracy seems to be increasing, and it lands somewhere around 77%, which is a significant improvement. Finally, at the very end, we evaluate the model on our test dataset: the loss is around 0.5, which by itself doesn't mean much, but the accuracy is around 77.5%, which is pretty good given the model has never seen this data before. So that's a quick tutorial on how we used TensorFlow to create a neural net and then used that neural net to predict whether samples of women of Pima Indian descent have diabetes, based on the data we were given. Very cool. Now, for the other tutorial we started off with, we'll get to that in one second, but first I want to talk about some more advanced neural network architectures. Let's talk about recurrent neural networks. We've already seen the feed-forward neural net, we've done an example problem with it, and we're very familiar with it: you have your inputs, they feed into the hidden layers, and you get your outputs. But what if those inputs were some sort of series or sequence? For example, they might be stock prices from the past 20 days, or values representing the different words in a sentence, where the sentence has some inherent order, or temperatures from the past 10 months; you get the idea. If the data were a sequence, a feed-forward neural net would not do a very good job of picking that up.
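Circling back to the oversampling step above, a short sketch of how that call looks; the install line is only needed if the import fails.

```python
# !pip install -U imbalanced-learn   # only if the import below fails in Colab
from imblearn.over_sampling import RandomOverSampler

ros = RandomOverSampler()            # randomly duplicates minority-class rows
X, y = ros.fit_resample(X, y)        # both outcome classes now have equal counts
```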
Why? Because throughout its layers, a feed-forward net evaluates each input value as if it were independent, so even if there is a sequence, it's much harder for the network to pick up on it. That's where recurrent neural networks come in. Here we have our data points: a data point taken at t0, at t1, at t2, so we're moving through time. We feed these into a layer of weights, and that layer may or may not produce an output at each step, but the key is that the calculation at each point takes into account the previous calculations. As we move through the network, we're essentially creating a kind of memory: by the time we feed in x at t2, the network still carries some information about x at t0 and x at t1. That's what makes a recurrent neural net so powerful; the recurrent part acts as a memory. And now, instead of straightforward backpropagation, we have to use something called backpropagation through time to adjust the weights. That diagram is the unfolded RNN; in the folded version, x at time t is fed into a neuron that produces an output at time t, and that value gets cycled back in for the next input, and so on. There are a few problems with this, though. First, if you imagine many, many time steps, this effectively becomes a very deep network. Second, during backpropagation we keep seeing the same weight terms over and over because of the recurrent structure. Why are those problems? The two compound on each other, and you can get something called exploding gradients, where the gradients used in backpropagation grow bigger and bigger toward infinity, the weight updates go all over the place, the model becomes unstable and incapable of learning, and generally bad things happen. On the other hand, there are also vanishing gradients, where the gradients get closer and closer to zero, the model stops updating, and it likewise becomes incapable of learning. Some very smart people have studied these problems and come up with ways to overcome them. There are a few things you can do with the activation function, but I'll let you look those up on your own time. Instead, I'm going to talk about the different kinds of cells, or neurons, that people have designed to combat this. There's the gated recurrent unit, or GRU: it still takes x at time t as an input, but it has a set of gates associated with it and some output, so there are more parameters inside the neuron itself rather than just a direct weighted sum. There's also the long short-term memory unit, the LSTM, which looks very similar to the GRU but has three gates instead of two. I'm not going to dive too deep into these because they're more advanced topics, but I wanted you to be aware that they exist, and if we do use one in an example, this is exactly what's going on: we just have a few more bells and whistles inside each neuron.
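For reference (this isn't part of the video's code at this point), both of these gated cells are available as ready-made Keras layers; the unit count below is arbitrary.

```python
import tensorflow as tf

# Gated recurrent layers designed to mitigate vanishing/exploding gradients.
gru_layer = tf.keras.layers.GRU(32)     # gated recurrent unit: 2 gates per cell
lstm_layer = tf.keras.layers.LSTM(32)   # long short-term memory: 3 gates per cell
```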
All right, now that we've touched on some of the more theoretical aspects of neural nets and machine learning, let's walk through a different TensorFlow example with text classification and see if we can classify some wine reviews. Let's go back to the Colab notebook where we were studying the wine reviews and continue from there. I have that notebook open here; this is the code we typed at the very beginning of this class. We need to rerun some cells and re-import our data, so let me do that. Okay, it's imported; let's rerun everything. These points are what we're trying to classify, so let's split them into a low tier and a high tier, as we were saying earlier. Here's how: let's create a label column, so I'll say this is the label and set it to df.points, the points column of the dataframe, greater than or equal to 90. This returns a boolean for every single row: False if the points are below 90 and True if they're 90 or above. At the very end I cast it with astype to an integer so it gets mapped to zero or one, because remember, our computer understands numbers really well (see the sketch below). Then, since I don't need all the columns and I know which ones I'm going to use, I'll reduce the dataframe to just the description and the label, and let me keep points in there too for now. If I look at the head of this, we have the points and the label, and if we look at the tail, we again see the points and the label. I only kept points there to show you that we mapped the right thing; the description and label are what we'll actually keep. Now that we have our dataframe, I want to split it into training, validation, and test dataframes. We did this in a slightly different way before, but I'm going to show you another way, because I want you to realize there isn't just one way to do it; whenever you're working with a dataset, you want to be flexible about how you split things up. I'll write train, val, test equals np.split, which is going to split our dataframe, and the thing we're splitting is the dataframe itself, except I'm going to mix things up a little by calling sample on the entire dataframe, which shuffles the rows by drawing random samples. Then I pass in the cut points where I actually want the dataframe to break. I was thinking 60% for training, 20% for validation, and 20% for test, but with a dataframe this size you could even go 80% for training, 10% for validation, and 10% for test, simply because there's so much data that even if your validation and test sets are only 10% each, that's still plenty of data to see how the model generalizes.
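Before moving on to the split, here is a sketch of the labeling step from the start of this passage, assuming df is the wine-reviews dataframe with its points and description columns.

```python
# 1 for a high-tier review (90+ points), 0 for a low-tier review.
df['label'] = (df.points >= 90).astype(int)

# Keep only the text and the target we just created.
df = df[['description', 'label']]
```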
So let's go with 80/10/10. I'll cast the cut points to integers: the first cut is at 0.8 times the length of the dataframe, meaning 80% goes to the training set, and the second cut is at the 0.9 mark, so 10% of the data goes to validation and the remaining 10% goes to test. Run that, and we can quickly print the lengths of the three splits: still plenty of samples for our validation and test datasets. This next function I'm going to copy from a TensorFlow tutorial. We'll use it with slight edits; what it does is convert each of the training, validation, and test dataframes into a tf.data.Dataset object and then shuffle and batch the data. Let's copy that code over. I'm going to make a couple of changes: because our dataset is so big, I'll use a larger batch size, and down where it prefetches, instead of the batch size I'll use tf.data.AUTOTUNE. In the original tutorial they use a column named target for the target variable, but we already have a column that defines ours and we've called it label, so change that to label. And instead of keeping the whole dataframe of features, we just set the dataframe equal to its description column, because that's the only part we actually care about, and remove the parts we no longer need (as in the sketch after this passage). Run that, and it should now be able to create our train, validation, and test data. So train_data equals df_to_dataset called on train; copy and paste that a couple more times, with the validation split (remembering that we named it val, not valid) and with test. Run this and, hopefully, great, there are no errors. Basically what this function does is shuffle our data, format it properly, batch it into the batch size we specified, and prefetch. You can think of prefetch as just trying to speed things up a little, optimizing the pipeline so there's less friction. So those are our datasets; let's take a look at what's actually in one. We can't just index train_data with [0], because it's now a TensorFlow dataset, so we need to convert it quickly, for example by wrapping it in a list, to see what's going on. You'll see that each element is a tuple: a tensor of all the review strings, plus the corresponding labels, the zeros and ones we came up with. Now let's talk about how our model is going to work. One thing we imported up at the top, which you might have noticed, was TensorFlow Hub. TensorFlow Hub is a repository of trained machine learning models.
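Putting the split and the adapted helper together as a sketch: the helper follows the pattern from the TensorFlow structured-data tutorial described above, and the batch size of 1024 is an assumption rather than the exact value used in the video.

```python
import numpy as np
import tensorflow as tf

# Shuffle the rows, then split 80/10/10 into train, validation, and test.
train, val, test = np.split(df.sample(frac=1),
                            [int(0.8 * len(df)), int(0.9 * len(df))])

def df_to_dataset(dataframe, shuffle=True, batch_size=1024):
    df = dataframe.copy()
    labels = df.pop('label')          # our target column is named 'label'
    df = df['description']            # keep only the review text
    ds = tf.data.Dataset.from_tensor_slices((df, labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)

train_data = df_to_dataset(train)
valid_data = df_to_dataset(val)
test_data = df_to_dataset(test)
```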
These are models that are essentially ready to use; they just need some fine-tuning, and we can use one of them to help with our text classification. Recall that computers don't really understand text that well; they understand numbers really well. So we need a way to transform sentences like these into numbers the computer can work with, and that's where an embedding comes into play. The embedding we'll use is nnlm-en-dim50: English, dimension 50. It's a token-based text embedding trained on the English Google News 7B corpus, a saved model they already provide for text embedding. So let's set embedding equal to that handle, and label the cell while we're at it. Then we create a variable called hub_layer and set it equal to hub.KerasLayer, passing in the embedding link, telling it that the data type we're using is strings, and finally setting trainable to true. We can call this hub_layer directly, using the same little trick as before to pull a batch out of train_data and passing in only the strings. What we've done is project every single sentence in our dataset into a vector of length 50 containing only numbers; that's what the embedding does, it transforms our text into a vector of numbers the model can understand. Now let's build the model, again calling keras Sequential. Previously we passed in a list of all the layers, but I'll show you another way to build a model: you can also call model.add. The first thing we add is the hub_layer we defined above, so the first transformation is this text-to-numbers transformation. Then I'll add a classic dense layer, again with 16 neurons and ReLU, add another one of those, and finally add the single-node sigmoid output, just like in the feed-forward neural net we created before. Now that we have the model, I'll compile it with the exact same compilation statement as the previous model, so let me paste that in: model.compile with Adam as the optimizer (and let's put the learning rate back to 0.001, since I had copied that value from another example), binary cross-entropy for the loss, and accuracy as our metric, because again we're doing binary classification. Now that the model is compiled, let's evaluate the untrained model on the train data, and do the same for the validation data. It looks like our accuracy is around 40%, not so great, and our loss is maybe around 0.7. So let's see what happens when we call model.fit: pass in the train data, let's do 10 epochs, and pass in the validation data. Now we're starting to train.
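A sketch of that whole passage in one place; the handle below is the standard one for the nnlm-en-dim50 module, assumed here since the video doesn't spell the URL out, and train_data/valid_data come from the earlier helper.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained token-based text embedding: maps each review string to a 50-dim vector.
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"   # assumed handle
hub_layer = hub.KerasLayer(embedding, dtype=tf.string, trainable=True)

model = tf.keras.Sequential()
model.add(hub_layer)                                       # text -> numbers
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])

history = model.fit(train_data, epochs=10, validation_data=valid_data)
```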
Our accuracy is increasing, it's over 50% now, which is a good sign, and our loss seems to be decreasing, so the model is training. It's still improving; we like to see it. I'm going to let the model train, so you can pause the video here, and I'll be back once it's done. Okay, the model's finished training, so let's look at the results. The training loss is steadily decreasing, which is a good sign, and the training accuracy climbs to almost 90%, also a really good sign. But if we look at the validation loss and accuracy, the validation loss starts fairly high, decreases, and then seems to go back up again, which means that as training goes on, validation performance is actually getting worse and worse. The validation accuracy starts off pretty decent at around 80%, gets better, then plateaus and dips a little. So what's going on here? This is a classic example of overfitting. Overfitting means the model learns to predict the training data it sees really well, but then generalizes poorly. What we can do is plot the model history. Looking at history.history, we plot the accuracy and the validation accuracy, add some labels and a title, and call plt.show() at the end (we do have to type out the full key name, accuracy). We can see that the training accuracy starts here and gets better and better, whereas the validation accuracy does well for a while and then tapers off. The loss tells a different story: change the plot to the loss keys and look again, and the training loss decreases very nicely while the validation loss decreases and then starts to climb back up. Basically, the model has become incapable of generalizing because we've trained it so hard on the data we're feeding it. How do we fix this? One way is to add something called dropout. Dropout just means that on every update we randomly switch off a few nodes, so if I add a few Dropout layers in here, that should actually improve things. The reason this works is that each time through, the model has to get past these obstacles, the dropout layers where some nodes aren't working, and figure out how to work around them, which helps it generalize; we're just adding a little randomness. So let's repeat the hub layer, compile the model, and evaluate. The untrained accuracy already looks really high, but that's fine; before any training, that number is basically luck of the draw. The other thing to do is stop earlier, so let's use just five epochs. I'll see you after training. All right, we're back; let's take a look at the results. The loss is decreasing and the accuracy is increasing, great. The loss seems to increase a tiny bit by the end and the accuracy dips a tiny bit by the end, but this is probably a much more generalizable model than the previous one we trained.
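A sketch of the history plot and the dropout variant described above; the dropout rate of 0.4 is an assumed value rather than one stated in the video, and hub_layer/history come from the earlier steps.

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Training vs. validation accuracy across epochs, from model.fit's History object.
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Model accuracy')
plt.legend()
plt.show()

# Same architecture with Dropout layers added to reduce overfitting.
model = tf.keras.Sequential([
    hub_layer,
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dropout(0.4),   # assumed rate
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dropout(0.4),   # assumed rate
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```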
Finally, we can evaluate this on our test data to get our final results. Sweet, look at that: the loss is around 0.38 and the accuracy is around 83%, which seems really good. So up to this point we've created a neural network that helps us with text classification, but we did lean on TensorFlow Hub, which is not a bad thing; still, if we wanted more control over the model, what would we have to do? I'm going to show you quickly how we'd recreate this model using an LSTM. I'll label this section LSTM. The first thing we need is some sort of encoder for our text, because once again our computer does not understand English. So I'll create an encoder and set it equal to tf.keras.layers.TextVectorization, and pass max_tokens, the maximum number of words it will remember, which I'm setting to 2000. Then I call encoder.adapt and pass in our train data, except the encoder only needs the sentences, and our train data is composed of the text and the label; we don't care about the label here, just the text, so we also pass a quick little lambda function to pull out only the text. Run that, and let's check out our vocabulary: we can call encoder.get_vocabulary() and look at, say, the first 20 items. These are words encoded in our vocabulary, and the [UNK] entry represents any unknown tokens. Then we can create our model. Again, it doesn't really matter which way you define it, but this is how I'll do it this time. The first thing we pass in is the encoder, because the encoder is what vectorizes our text. Then we need an embedding for that vectorized text, which is keras.layers.Embedding. For the embedding we need an input dimension, which I set equal to the length of the encoder's vocabulary, and an output dimension, which I'll set to 32, since that's also the size I'm going to give the LSTM. I'll also set mask_zero equal to true; the reason we use this masking is so we can handle variable sequence lengths. So that's our embedding layer: together with the encoder, it turns each sentence into a vector of numbers that our neural net can comprehend. Next I add an LSTM layer, and it's literally as easy as saying how many units it has; I used 32 for the output dimension, so that's the value I'll pass here. Then let's add a dense layer, then a dropout layer, because we saw previously that we might be prone to overfitting, and finally we finish with a dense layer of one node. I initially typed these wrong: it's Dropout and then Dense(1), and I also forgot the activations; the output layer wants sigmoid and the hidden dense layer above wants ReLU. And there was one more missing letter, but now this is our model. For the compilation, let's just borrow the same compile call from before.
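A sketch of that LSTM model under the assumptions above; the hidden dense layer width and the dropout rate are illustrative, since the video doesn't state them explicitly in this passage.

```python
import tensorflow as tf

# Learn a vocabulary of up to 2000 tokens from the training text only.
encoder = tf.keras.layers.TextVectorization(max_tokens=2000)
encoder.adapt(train_data.map(lambda text, label: text))

model = tf.keras.Sequential([
    encoder,                                      # text -> integer token ids
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=32,
        mask_zero=True),                          # handle variable-length reviews
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(32, activation='relu'), # assumed width
    tf.keras.layers.Dropout(0.4),                 # assumed rate
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])
```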
Again, we want to evaluate the untrained model on our train data as well as our validation data, just to see where we stand. Our accuracy seems to be around 53%, not great, and our loss is around 0.7. So now let's train the model; let's try five epochs and see what happens. I'll see you in a bit. All right, we're nearing the end of training, so let's take a quick look at the results. The loss on our training data is strictly decreasing and the accuracy seems to be strictly increasing. Now for the validation loss: it does seem like we cut training off early enough that it's still decreasing, wavering a little, and the overall trend in the validation accuracy seems to be improving; we get about 84% accuracy here, which is pretty good. And finally, of course, we can evaluate the model one more time: call model.evaluate with our test data and see what happens. Cool, we get an accuracy on our test data of about 84%, which is pretty sweet. That is the end of the notebook tutorial. Thank you for being here and following along. You've now learned not only how to implement a feed-forward neural network on numerical data, but also how to use TensorFlow for text classification, figuring out from a wine review's text whether the wine is lower tier or higher tier, which is awesome. I hope you enjoyed this tutorial; give the video a thumbs up if you liked it, leave a comment, and don't forget to subscribe to freeCodeCamp and Kylie. All right, see you next time.
Info
Channel: freeCodeCamp.org
Views: 244,555
Id: VtRLrQ3Ev-U
Length: 114min 10sec (6850 seconds)
Published: Wed Jun 15 2022