Stats 102B Lesson 4-3 Basic Keras and Tensorflow in R / RStudio example

Captions
So today's not a formal lecture; it's more of a little demonstration. We covered some of the very basics of neural networks, and I did it with just some basic matrix operations and the most rudimentary form of gradient descent. I'm going to post a homework assignment where you'll write similar code and do the matrix operations in that manual fashion, which hopefully will be of educational value for you. But in real life nobody would build neural networks that way; everybody uses formal packages, and I would say Python is probably the preferred language of choice as far as neural networks go. The most popular software for neural networks would be PyTorch, which was developed by Facebook, and TensorFlow, which was developed by Google. These are the titans of the industry, and they've published all of their neural network code as open-source packages that you can use and adapt.

We'll take a look at TensorFlow. As far as the back end goes, the language the calculations are written in, I'm not 100% sure; I want to say C, maybe. On top of that, interfaces have been built. There's a package called Keras, and it sits on top of TensorFlow and allows for the creation of neural networks in a somewhat easier fashion. I don't want to say easy, because just like any other package, just like learning dplyr, there's a whole set of vocabulary and functions that you've got to learn. Anyway, you can set up TensorFlow, you can set up Keras, and you can interact with them via R. It will require a Python or Anaconda installation on your machine, especially if you're on Windows, and the installation process is a bit of a challenge in and of itself, depending on your computer. I don't want to spend time going over the installation, just because your machine is probably different from mine and what you encounter will be a little bit different. But ideally you can just run install.packages("tensorflow") and it will do a bunch of stuff. If you've already installed Anaconda on your machine, you can skip the Anaconda part; it might say it needs to install Miniconda or something like that.

A student asks: will we need TensorFlow and Keras for R for the homework? No, no. This is more of a just-for-fun demo, just showing that this stuff exists, so don't stress about it.
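As a rough sketch, the setup described above looks like the following, based on the instructions at tensorflow.rstudio.com; the exact prompts vary by machine, and the installers may offer to set up Miniconda if no Python is found:

    install.packages("tensorflow")
    library(tensorflow)
    install_tensorflow()   # downloads and configures the TensorFlow back end

    install.packages("keras")
    library(keras)
    install_keras()        # sets up the Keras Python package for the R interface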
Oh, and it's Friday, and DataFest starts this weekend. Are you guys doing DataFest? Some of you, yes. I signed up to be an advisor or mentor; I don't know where they publish these things, but I think I'm listed, so I'll hold open office hours tonight at around 8 p.m. in case you have any questions or want some advice. My general advice for those of you participating in DataFest: start simple, and start just by exploring the data. Spend a lot of time on exploration; use dplyr or whatever to get summaries by groups. I don't know what the data is, but start slicing it a few different ways and see if there's anything interesting that you notice during your exploratory data analysis, and then move from there. While you're working in your group you might find several different avenues or ideas you want to pursue, and in the beginning you can pursue some of them, but towards the end you want to hone in on basically one idea and center your presentation around that central idea. That's my general advice, but I know nothing about the data or what it's going to look like tonight. Anyway, if you're participating, that's what's going on.

Okay, back to TensorFlow and Keras. There's a page, tensorflow.rstudio.com, and under Installation these packages have been designed to help you get TensorFlow set up on your machine. You will also need to install keras: it's the R interface to Keras, Keras sits on top of TensorFlow, and that's how we'll build our neural networks. I'm using a very basic intro demo, and I've annotated all of the lines of code, hopefully explaining exactly what is happening when we specify the models.

I'll start by loading tensorflow and keras. There are a few built-in data sets, and one of them is the MNIST handwritten digits data set. Let me show you the structure of mnist: it's a list of two, the training set and the test set, and the training set has 60,000 images. This is your first tensor. A tensor is basically an array, but it's allowed to be high-dimensional; a tensor is just an n-dimensional matrix, the equivalent of an array in R. The name TensorFlow comes from the idea that you have one matrix being worked upon, via matrix multiplication or whatever, resulting in another matrix, and then you apply activation functions and so on: the tensors flow through the operations. So our input tensor has 60,000 images, and each image is 28 by 28 pixels.

If I take mnist$train$x and ask for the first image (it's three-dimensional, so I have to put in two commas) I get a 28-by-28 grid of numbers. Let me shrink my text a tiny bit, and you might actually be able to see the number, because the zeros print differently from the large values where there's actually ink. You can almost see the digit that appears: image one looks like a five, image two looks like a zero, and that one looks like a four. It's kind of neat seeing it this way. So the images come in these 28-by-28 pixel grids. Let me reset this. What we're going to do is divide everything by 255 and get everything on a scale from zero to one, which is what we did in the demo code, I think.
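These loading and rescaling steps look roughly like this in R (following the quick-start tutorial the demo is based on):

    library(tensorflow)
    library(keras)

    mnist <- dataset_mnist()   # built-in data set; downloads on first use
    str(mnist)                 # a list of 2: $train and $test, each with $x and $y
    dim(mnist$train$x)         # 60000 x 28 x 28, a three-dimensional array (tensor)

    mnist$train$x[1, , ]       # the first image; large values are where the ink is
    mnist$train$y[1]           # its label, a 5

    # rescale pixel intensities from 0..255 to the 0-to-1 scale
    mnist$train$x <- mnist$train$x / 255
    mnist$test$x  <- mnist$test$x  / 255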
Then we're going to set up a sequential model. Keras supports different kinds of models, but a sequential model is your standard neural network model, which basically means you start with your input, you do something, then you do something else, and it forms a sequence of layers and operations. Some of the other models Keras supports let you take one input over here and a different input over there and combine them later; you can set up your model in some other form with multiple inputs, and you can make fairly complicated things that way. I've included links to some of the reference pages: there's the regular Keras reference for the sequential model, and there's also the Keras for RStudio reference. The Keras for RStudio reference is very bare-bones in some places, so you'd have to go to the primary Keras reference. The main Keras web page is written with Python code, but thankfully the calls to the functions are almost identical whether you're using Python or R; obviously a couple of things will be different, but the arguments that get passed into Keras via RStudio or R will be similar.

So here we initialize a model: we say, hey, we're going to set up a sequential model, and then we write out what kinds of layers we're going to have. The first thing we do (you don't have to do this, but it gets us into a form similar to the example where we were using matrices) is flatten the input values. Our inputs come in as a 28-by-28 matrix, and we're going to flatten each one into a vector of length 784. That's what layer_flatten does. There are all kinds of different layers you can use in Keras; in the primary reference you can click the Layers API, and it gives you a whole bunch of different types of layers that can be added, and you can read about all of them. But basically, the first layer we have here is layer_flatten, which takes that 28 by 28 and flattens it into a 784-by-1 vector.
Then layer_dense is what's known as a densely connected, or fully connected, neural network layer. The idea is that every node in the previous layer is connected to every node in the dense layer; densely connected and fully connected are synonyms. Here you specify some more or less arbitrary number of nodes. I'm just following the tutorial code, which uses 128 nodes; I think in my example in class I used 30 nodes in the hidden layer. Whatever you want to do. If you want a second hidden layer, you can just add a second layer_dense, just like that, but here I'm keeping it simple with one fully connected layer.

Within the dense layer you can specify the activation function; you can say, I want the rectified linear activation function. In my demo I used the sigmoid function, but you can specify any of these. You can also specify the activation as its own layer: it can be done inside the dense layer, or you can specify the activation as a separate layer, however you want to do it. All the activation does is transform the values on the way from the previous layer to the next. In our case we'll use the ReLU.
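For example, these two specifications should be equivalent; this is a sketch using the layer_activation() layer from the Layers API:

    # activation given inside the dense layer:
    m1 <- keras_model_sequential() %>%
      layer_flatten(input_shape = c(28, 28)) %>%
      layer_dense(units = 128, activation = "relu")

    # the same activation specified as its own layer:
    m2 <- keras_model_sequential() %>%
      layer_flatten(input_shape = c(28, 28)) %>%
      layer_dense(units = 128) %>%
      layer_activation("relu")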
Then we can specify a dropout layer. Let me illustrate what dropout is. Dropout artificially takes some of the nodes and sets them to zero. These are densely connected, fully connected layers, so every node here is connected to every node in the next layer. What dropout does is say: we're going to randomly select some of these nodes and just set them to zero. Effectively, when you set a node to zero, it kills all of the connections from the previous layer and all of the connections going to the next layer.

Now why would we want to do something like that? Dropout is a way to help avoid overfitting. If every single node is connected to the next layer, then every adjustment on, say, this node right here plays a small role in whatever prediction we make at the end; if I make an adjustment here, it affects the final output layer, and every single input image influences how this node gets adjusted. When you apply dropout, the dropout happens randomly, so it's like saying we'll use some of the images during the training of this node, but not every single one. That artificially uses only a subset of the training set rather than the whole thing, which in some cases can improve performance by reducing the chance of overfitting. So here, somewhat arbitrarily, we're using a dropout rate of 0.2, which means 20% of the nodes get set to zero. You can adjust this anywhere from zero on up; you don't want to do one hundred percent, and values under 0.5 are probably a little more typical.

A student asks whether dropout is like cross-validation. It's not quite like cross-validation; I would say it's a little more like randomized subsetting.

And then finally we have the output layer, which is another densely connected layer, and this is where we make our predictions. If you take a look at the structure of mnist, the output is a label, but we're going to set up 10 output nodes, and the softmax activation is what's often used for classification models. Let me just take you to the Wikipedia page on softmax. We'll have 10 nodes in the output layer, and you take each node's output and raise e to that number, whatever it is. With the rectified linear activation the outputs would be anywhere from zero to infinity, but in general they could take negative values as well. You raise e to each of these numbers (they can get huge) and you do this for every single category. So maybe class 5 has a raw value of 100, class 6 has a value of 20, class 7 has a value of 3, and class 8 has a value of 1. You raise e to every single one of these powers and then sum them up; the sum could be extraordinarily huge. Then you divide each term by that sum, so every output value ends up between 0 and 1, and the sum of all of the outputs adds up to one. That's what's called a softmax; it's a simple function, and it's often used for classification.
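As a quick sketch of that formula in R, using the made-up raw outputs from above:

    softmax <- function(z) exp(z) / sum(exp(z))

    round(softmax(c(100, 20, 3, 1)), 4)
    # 1 0 0 0   (the largest raw output dominates, and the values sum to 1)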
So that's the specification of the model. If we wanted to specify a different model, like the one from class, I could comment this out and instead use sigmoid with 30 nodes, and that would be a lot more similar to the model we created in Monday's lecture. Anyway, that's how you specify a model: you just list the layers off. If you're interested in using Keras or anything like that, I strongly encourage you to take a look at some of the examples you might see, where they just list the layers off like that: dense, activation, dense, activation, things of that nature.

All right, so I'll go ahead and set up my model here: I run this to specify the model, and then you can ask for the summary of the model, which gives you a little bit of the structure. Here it says the first layer's output is the 784 values coming out of the flatten layer. Then you have a dense layer, and because we have 128 nodes in the dense layer, its output has length 128. The summary also shows how many parameters need to be estimated. Because every node in the previous layer is connected to every node here, that's 784 times 128 weights, plus a bias term for every node in this layer, and that's how they get the number 100,480. If you want no biases, let me double-check the reference for the dense layer: there's a use_bias argument, so we could say use_bias = FALSE if we wanted to. If I do that and re-initialize the model, the number of parameters to estimate is just 784 times 128. But I do want the biases, so let me re-initialize my model with the biases and run the summary again.
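Putting the pieces together, the full specification is essentially the quick-start tutorial's model, and the arithmetic matches the parameter counts in the summary:

    model <- keras_model_sequential() %>%
      layer_flatten(input_shape = c(28, 28)) %>%         # 28 x 28 image -> length-784 vector
      layer_dense(units = 128, activation = "relu") %>%  # fully connected hidden layer
      layer_dropout(rate = 0.2) %>%                      # randomly zero out 20% of the nodes
      layer_dense(units = 10, activation = "softmax")    # one output node per digit class

    summary(model)
    784 * 128 + 128   # 100480 parameters into the hidden layer (weights plus biases)
    128 * 10 + 10     # 1290 parameters into the output layer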
Finally, the summary also shows the number of parameters to estimate between the hidden layer and the output layer: 128 times 10 is 1,280, plus the biases, for 1,290 parameters. It lists off these parameter counts just in case: in your design it's possible you end up with something like 10 billion parameters to estimate, and you realize this model might have a few too many parameters and is going to overfit. I have about 60,000 images, and if I have more parameters than input images, we run the risk of overfitting and things like that, so there's always some kind of risk when you've got so many parameters. But this is fine.

Now, to fit the model, we have to run a compile and then run a fit. These come directly from the Keras training API; if you follow that link, it talks about how to interact with the Keras model, and you have the compile method and the fit method. Compile specifies a few other things about how we are going to train our model. In the model specification we specified the layers; in the compiling part, you specify how you are measuring loss. In class, the way we measured loss was the mean squared error: take the difference between the prediction and the 0/1 target and measure the squared error. You have a few different choices for the loss function; I have that linked here. For regression-type problems, mean squared error is probably your most common choice, but you can also do a bunch of other things: mean absolute error, mean absolute percentage error, mean squared logarithmic error, and so on. For categorical variables there are a few different choices: the binary cross-entropy, the categorical cross-entropy (maybe you'll talk about these different ways to measure error in 101C), and a whole bunch of other stuff, like the Kullback-Leibler divergence. You can click any of these, and there's often a brief description of the scenario in which you would use that particular type of loss, and you'll see name = followed by the string you would pass in.

In our case we're going to use the sparse categorical cross-entropy, and the description says to use this cross-entropy loss function when there are two or more label classes, which is what we have here: we have 10 label classes. It also talks about different representations; if you have a one-hot representation, meaning you convert everything into zeros and ones, you would use the categorical cross-entropy loss instead. So we specify our loss function, mean squared error or some kind of categorical loss function, and we can also specify how to optimize it. Again, consult the reference. As far as the different optimizers go, you have gradient descent, which is SGD, and then everything else there is some version of gradient descent that has been tweaked for better performance.
You have Adagrad, which is the Adagrad algorithm, another kind of gradient descent. Adam is another one: "Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments." I have no idea what that sentence even means, but apparently it performs well. You can try some of these; a lot of times you can just stick with one and it will work, but depending on your model it might have trouble performing, and sometimes switching the optimizer does the trick, or it could just be that your model is poorly specified. There are all sorts of reasons something could break, so who knows. Finally, you can also specify how you want to measure your performance: you can specify metrics, and in our case we're going to use the accuracy. So we will go ahead and compile the model.

Then we fit the model. Fitting the model says: here's my training data, I want you to run this many iterations, and if you want to do some kind of validation, you can do that as well. What we've specified here is the training data x and the training data y, and then the epochs. An epoch is one cycle through all of your training cases: I have 60,000 images, so one epoch goes through all 60,000 images once. Here I'm saying go through all 60,000 images five times. You can specify more or fewer, and obviously more epochs take more time.

The validation split here is a little bit like cross-validation, but not formally so. At the very, very beginning, it does a single training/validation split. Here we say take 30% of the 60,000 images and put them into a validation set, so we split into 42,000 and 18,000. We set the 18,000 aside, use the 42,000 of training data to fit the model, and at the end of each epoch we run a validation: we make predictions on the 18,000 and get a measure of performance. In the plot you'll see the training error, how much error we have on the training data, and also how much error we get on those 18,000 validation cases.

Verbose gives you basically a progress bar as it's fitting. MNIST is so clean, and this processes fairly quickly, but in some cases training on some million-image data set can take hours on your computer, and the verbose part gives you some progress-bar information. So I'm going to go ahead and run this; it will take a couple of seconds on my computer, and you'll see it running.
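In code, the compile and fit steps look roughly like this (again following the quick-start tutorial; the argument names come from the R keras training API):

    model %>% compile(
      loss      = "sparse_categorical_crossentropy",  # integer labels 0-9, not one-hot
      optimizer = "adam",
      metrics   = "accuracy"
    )

    history <- model %>% fit(
      x = mnist$train$x,
      y = mnist$train$y,
      epochs = 5,              # five passes through the 60,000 training images
      validation_split = 0.3,  # hold 18,000 images out for validation
      verbose = 2              # print progress while fitting
    )
    plot(history)              # training vs. validation loss and accuracy by epoch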
Okay, so here is the fit. This shows a measure of the loss and the accuracy on the training data, and with each epoch it shows how the validation set does as well. As we continue training, the loss function (our sparse categorical cross-entropy) continues to decline on the training data, but the validation loss doesn't change too much between the fourth and fifth epochs. We see a similar thing with the accuracy, which is just another metric of how well we're performing: with our validation data we achieve around 97% accuracy, and it seems fairly stable at 96 to 97 percent after the third or fourth epoch, whereas with our training data it continues to improve as we run more and more epochs.

From there you can run your predictions: now that you have a model, you can use it to predict on your test cases. These are the predictions; let me round them to two decimal places, or we'll just round to one. For the first test image we're predicting, out of our 10 output nodes, basically a seven, then a two, and so on. And then you can run an overall evaluation on the model, and it says the accuracy for the test data is around 97%, using this fairly simple model that we've created with TensorFlow and Keras.
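The prediction and evaluation calls, sketched in the same style:

    predictions <- predict(model, mnist$test$x)
    round(predictions[1:2, ], 2)   # each row sums to 1; the largest entry is the predicted digit

    model %>% evaluate(mnist$test$x, mnist$test$y)   # test-set loss and accuracy (about 97%)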
There are a few more examples you can check out if this kind of stuff is interesting to you; there are a few more tutorials on the site. I basically followed the basic quick-start tutorial, and in my code I've added a few more comments to expand on each of these things, just because the tutorial itself is very bare-bones. For instance, there's an image classification data set, the fashion MNIST, with images of clothing, where you're supposed to say: that's a boot, that's a sweater, that's a coat, that's a t-shirt, that's a shoe. It's a similar setup, with 28-by-28-pixel images. Other examples include the Boston housing prices data set, text classification, things of that nature.

Oh, I totally forgot about the view quizzes today. All right, I'll just give you all three answers right now. What is today? Fourth-week Friday. The answers are B, C, and D: bear, cat, dog, for the three view quiz answers today. It says the quiz is not available? That's because I generally make it unavailable until 5 p.m.; let me fix that. Okay, it should be fixed now, and you should be able to take the quiz. Bear, cat, dog.

So this is just a simple demo of Keras and TensorFlow, in case it's something you want to play around with and get started on. It's not really a formal lecture, and I'm not going to put any of this stuff on exams or anything like that, but I imagine some of you are curious about doing a little bit more with neural networks, and hopefully this is just enough to get started, and at least points you to the reference resources that you'll definitely need to do anything. There's a lot to read, and there are a lot of tutorials out there; not every tutorial is great, so just be careful.

I would say start with clean data sets. If you want to play around with Keras, start with clean data sets like MNIST; the fashion MNIST data set is also very clean, and the other examples in these tutorials are clean. Once you feel comfortable, then you can go on to places like Kaggle and take data sets from there. But be careful, because on Kaggle there are a ton of botched, messed-up data sets where people did some web scraping, and when I go through the data I think they did a bad job with it: the resulting data is full of duplicates, or full of things in the wrong categories and wrong columns, or just poorly formatted, where some of the columns should really hold multiple values. I think I've seen a movies data set where a movie will often have multiple genre categories (it'll be an adventure movie slash romance movie slash action, or something like that), but for whatever reason the data set just picks the first of maybe three categories. Or "this movie stars so-and-so," and they pick only the first name, and I don't know how they picked it; it could be alphabetical or something, but those types of things are definitely problematic. Or, where was the movie produced or filmed? A lot of movies have multiple locations, so if there's only one value per column, that could definitely be problematic. I understand the desire to reduce everything to one value, but think of the Harry Potter movies: who does it star? I guess Daniel Radcliffe would be number one, but then you also have Hermione and Ron, and who knows.

So anyway, be careful, and always pick a data set where you're a little bit familiar with the subject matter, so that when you go through a few lines (and you should definitely go through some of the lines line by line) you can say: okay, this data seems accurate, or this data is definitely missing some information.

Okay, we'll end here. Have a good weekend, and I'll see you on Monday. If you're competing in DataFest, good luck and have fun out there, and feel free to stop by my open sessions; I'll see you then. All right, have a good night and have a good weekend.
Info
Channel: Miles Chen
Views: 283
Rating: 5 out of 5
Id: _m8QKcf54Ws
Length: 43min 2sec (2582 seconds)
Published: Fri Apr 23 2021