TensorFlow Tutorial 17 - Complete TensorBoard Guide

Captions
The goal of this video is for you to get a thorough understanding of how to use TensorBoard to more easily understand, debug, and modify our models. This is a long video, because there is so much to cover, so let me first give you an overview of what we're going to learn.

We'll start with what is perhaps the most basic but also the most useful thing: obtaining accuracy and loss plots. In this example you see the accuracy plot when training our model with varying learning rates. Then we'll move on to visualizing images. Here we have performed some minor data augmentation, turning a small percentage of the images to grayscale, and visualizing these changes is very useful for making sure that what we intend to happen is actually what is going on; for example, here we have a horse that has been converted to a grayscale image.

We're also going to see how to create and plot confusion matrices in TensorBoard, with the predicted label on the y-axis and the true label on the x-axis. This lets us see what and where the model is misclassifying: for example, it correctly predicts airplane 65% of the time, but 7% of the time it misclassifies a bird as an airplane; it predicts cat correctly 40% of the time, and its largest mistake is misclassifying a cat as a dog in about 21% of cases.

Another useful feature is visualizing the graph of the model. Here we have a very simple model that takes two inputs, x and y, performs a matrix multiply, and sends the result through a ReLU, which is the output. This can be useful if you're building more complex models like ResNets, where you want to make sure all of the skip connections are in the correct places, or if you just want a more visual understanding of what the model looks like. You can also visualize the distributions of the various parameters of your model, which can help when you get error messages and want to know exactly which layer the fault lies in.

There's a lot more to look at. We're going to do hyperparameter search using the HParams API of TensorBoard, which lets you choose parameters for your model by looking at the correlations between, in this case, the dropout, the learning rate, and the number of units, and how they correspond to the accuracy after a single epoch of training. To make this easier to read, we can restrict our attention to the runs with the best accuracies, and here we see that the model prefers quite low dropout, a very high learning rate, and a high number of units. In this case we're checking accuracy on the training set, which is probably why it wants very low levels of dropout.

We're also going to learn how to use the Projector tab of TensorBoard, which is visually pretty awesome: you can use algorithms like t-SNE and PCA to get an understanding of how your model learns to represent images, projected down from a very high-dimensional space to 3D where we can easily visualize them.
For example, running it on the MNIST dataset with the t-SNE algorithm, if we search for, say, zero, we can see a cluster of zeros right here; similarly, searching for two shows a cluster of twos over here. This is useful for understanding how the model works and how it builds its representations.

The TensorFlow Profiler is another very useful tool: it shows what is taking up the most time and which aspects of your training you should aim to improve. Here we're given a summary of the different parts of training and how much time each takes; in this graph we can see the input pipeline taking a very large share. The profiler also gives recommendations: it says the program is highly input-bound, because about 84% of the total step time is spent waiting for input, so in this particular program we should focus on improving the efficiency of the input pipeline. There are other parts of the profiler as well, but more on that later. Now that you have an idea of the areas of TensorBoard we're going to cover, let's get started.

To keep this video relatively well structured, I've divided it into different files, each focusing on one specific part of TensorBoard. Each has starter code based on code from many of the previous tutorials, so I'll go through it relatively quickly; check out the previous videos for more depth on each part. Everything builds on the starter code, and we'll just modify it or add parts that write to TensorBoard.

To start off, we have the basic imports we need, plus the two lines we've seen previously to avoid GPU errors. We load the CIFAR-10 dataset from TensorFlow Datasets, normalize the images by dividing by 255, and apply some data augmentation. Then we do the mapping and the caching, followed by batching and prefetching, on the training and test sets, although we don't shuffle the test set. Then we have the class names, which are used later in the Projector tab to show the correct label for each image.

Next is the model: a very simple one with two conv layers, a max pooling, a flatten, and two dense layers with a dropout in between. After that comes the setup for custom training loops: we get the model, the loss function, the optimizer (Adam in this case), and an accuracy metric to keep track of the current accuracy. Finally we have two summary writers, one for train and one for test, which is what we'll use to write to TensorBoard, and for each of them we keep track of a step: a train step and a test step.
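Since the starter file only flashes by on screen, here is a rough sketch of what it looks like based on that description; the exact augmentation probabilities, layer widths, and batch size are my assumptions, not taken from the video:

```python
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras

# Avoid GPU memory errors by enabling memory growth.
for device in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(device, True)

(ds_train, ds_test), ds_info = tfds.load(
    "cifar10", split=["train", "test"], as_supervised=True, with_info=True
)

def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

def augment(image, label):
    # Randomly convert a small fraction of images to (3-channel) grayscale.
    image = tf.cond(
        tf.random.uniform(()) < 0.1,
        lambda: tf.tile(tf.image.rgb_to_grayscale(image), [1, 1, 3]),
        lambda: image,
    )
    image = tf.image.random_flip_left_right(image)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
BATCH_SIZE = 32
ds_train = (
    ds_train.map(normalize_img, num_parallel_calls=AUTOTUNE)
    .cache()
    .map(augment)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)
ds_test = ds_test.map(normalize_img).batch(BATCH_SIZE).prefetch(AUTOTUNE)  # no shuffle

class_names = ["Airplane", "Automobile", "Bird", "Cat", "Deer",
               "Dog", "Frog", "Horse", "Ship", "Truck"]

def get_model():
    return keras.Sequential([
        keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.1),
        keras.layers.Dense(10),
    ])

model = get_model()
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = keras.optimizers.Adam()
acc_metric = keras.metrics.SparseCategoricalAccuracy()
num_epochs = 5
train_writer = tf.summary.create_file_writer("logs/train")
test_writer = tf.summary.create_file_writer("logs/test")
train_step = test_step = 0
```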
Then comes the custom training loop: for epoch in range(num_epochs), and for each batch in ds_train we do forward propagation, backpropagation, and an optimizer step. We then do the same thing over the test set, except without backpropagation; we just record the accuracy.

For this first file I'm going to do something that may look a little weird: we're going to remove a lot of that code. The starter code is what we'll base pretty much every other file on, but in this specific one we want to use callbacks, a simpler approach that doesn't need custom training loops, so I want to show that first. After we get the model, we call model.compile, specifying the optimizer (again Adam), the loss (sparse categorical cross-entropy with from_logits=True), and the metric, accuracy.

Then we create the TensorBoard callback. This is perhaps the simplest way to use TensorBoard, and it's very convenient: you just call keras.callbacks.TensorBoard, giving it a log directory where it keeps all the logs (let's call it tb_callback_dir) and a histogram frequency, which we set to 1; that controls the distribution plots of the model parameters that I showed you earlier. Lastly, we call model.fit: we pass in ds_train, the number of epochs (say five), the validation data, the callback, and verbose=2. For the validation data, let's just pretend for illustration purposes that the test set is the validation data; I know that's incorrect, and normally you would split a validation set off of the training data, but it doesn't matter for this purpose. A condensed sketch of the whole file follows.
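Assuming get_model and the datasets from the starter sketch above, the callback version of the file is roughly:

```python
model = get_model()
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# histogram_freq=1 writes the parameter-distribution plots every epoch.
tensorboard_callback = keras.callbacks.TensorBoard(
    log_dir="tb_callback_dir", histogram_freq=1
)

model.fit(
    ds_train,
    epochs=5,
    validation_data=ds_test,  # for illustration only; use a real validation split
    callbacks=[tensorboard_callback],
    verbose=2,
)
```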
Let's run it. First, how you actually launch TensorBoard: go to the folder containing the script, in the environment that has TensorFlow activated, and run tensorboard --logdir tb_callback_dir, pointing at the directory we just specified. That gives back a URL, localhost:6006 in this case, and if we open it we get the training and validation accuracy for each epoch, and the same for the loss. What is perhaps a little strange is that the validation accuracy is actually higher than the training accuracy, but keep in mind that we're using dropout, and we're also using some data augmentation, which can make it harder for the model to overfit the training data; for example, we're converting some images to grayscale, which is of course more challenging. That's probably why; normally you'd see the reverse, with training accuracy higher than validation.

As you can see in the tabs at the top, we have four views. Scalars has the loss and accuracy plots. Graphs shows what the model looks like: our input, two conv layers, max pool, flatten, dense, dropout, and another dense, which you can inspect. Distributions shows the distributions of all of the parameters, as I showed you previously.

So that's the most basic way to set up TensorBoard, using callbacks. Next I'll show how to do it from custom training loops, which is what we'll work with from now on. Most of what I'm going to show can also be done with callbacks and model.fit, but custom training loops give us more flexibility and actually make many things simpler.

Back in the starter code, we move down to the custom training loop and add the writes to TensorBoard. One option is to write every batch, which you could do right after the optimizer step, but you get a cleaner graph writing once per epoch. So after each training epoch we do `with train_writer.as_default():` and write the loss and the accuracy with tf.summary.scalar, using the epoch as the step. One note: written this way, the loss is just the last batch's loss of the epoch; you would probably want to keep a list of the batch losses and write their mean instead, but this works to show how to write to TensorBoard. For the accuracy we write acc_metric.result(), keeping in mind that the metric is reset between epochs and between the train and test passes. Then we do the same for the test set: copy the block, switch to the test writer, and keep the epoch as the step, making sure we've changed everything consistently. Hopefully there are no mistakes in this; a sketch of the loop with the logging (and the mean-loss fix) follows.
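Continuing with the model, writers, and metric from the starter sketch, the custom loop with per-epoch logging looks roughly like this; I've included the mean-loss fix mentioned above rather than logging only the last batch's loss:

```python
for epoch in range(num_epochs):
    batch_losses = []
    for x, y in ds_train:
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = loss_fn(y, y_pred)
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        acc_metric.update_state(y, y_pred)
        batch_losses.append(loss)

    with train_writer.as_default():
        tf.summary.scalar("loss", tf.reduce_mean(batch_losses), step=epoch)
        tf.summary.scalar("accuracy", acc_metric.result(), step=epoch)
    acc_metric.reset_states()

    test_losses = []
    for x, y in ds_test:  # no backprop; just measure
        y_pred = model(x, training=False)
        test_losses.append(loss_fn(y, y_pred))
        acc_metric.update_state(y, y_pred)

    with test_writer.as_default():
        tf.summary.scalar("loss", tf.reduce_mean(test_losses), step=epoch)
        tf.summary.scalar("accuracy", acc_metric.result(), step=epoch)
    acc_metric.reset_states()
```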
One thing to keep in mind is that we're now using a different log directory: logs, with two subfolders, train and test. So let's run this, open the Anaconda prompt again, and point --logdir at logs; TensorBoard automatically finds the train and test runs. Going to localhost:6006 in the browser (we may have to wait a moment while it writes), we now see the accuracy plot and the loss plot. They show very clear discrete steps because we're only recording once per epoch; as I said, you could record every batch, which gives a much smoother curve. What is again perhaps a little odd is that the test set has higher accuracy than the training set, and I'd attribute that to the augmentation I chose: converting some RGB training images to grayscale makes them a lot harder to train on, so that's probably the reason.

That's essentially it for basic logging, but you can also use this for more complex things like hyperparameter search. For example, you could loop for lr in learning_rates over a set of learning rates; here I'm doing a grid search over five values, training a model for a number of epochs with each respective learning rate. Inside the loop we reset train_step and test_step to zero each time, then recreate the writers: train_writer = tf.summary.create_file_writer("logs/train_" + str(lr)), appending the learning rate to the directory name so we can distinguish the runs, and likewise for the test writer. We also call get_model() again, because we want to reset the weights each time, and re-initialize the optimizer, keras.optimizers.Adam, with the specific learning rate. You could write per epoch as before, but let's write every batch here, since that makes for a smoother graph: we pass train_step as the step instead of the epoch and increment train_step by one after each write, so each write is one discrete step in the plot, and we do the same for the test writer with test_step, making sure everything is changed consistently. Anything else to change? We could decrease the number of epochs just to make it faster. The sweep looks roughly like the sketch below; let me run it and show you what it looks like in TensorBoard.
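A sketch of the sweep with per-batch logging; the video doesn't spell out the five learning rates, so the list here is a plausible guess:

```python
for lr in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    train_step = test_step = 0
    train_writer = tf.summary.create_file_writer("logs/train_" + str(lr))
    test_writer = tf.summary.create_file_writer("logs/test_" + str(lr))
    model = get_model()  # fresh weights for every run
    optimizer = keras.optimizers.Adam(learning_rate=lr)

    for epoch in range(num_epochs):
        for x, y in ds_train:
            with tf.GradientTape() as tape:
                y_pred = model(x, training=True)
                loss = loss_fn(y, y_pred)
            grads = tape.gradient(loss, model.trainable_weights)
            optimizer.apply_gradients(zip(grads, model.trainable_weights))
            acc_metric.update_state(y, y_pred)
            with train_writer.as_default():
                tf.summary.scalar("loss", loss, step=train_step)
                tf.summary.scalar("accuracy", acc_metric.result(), step=train_step)
            train_step += 1
        acc_metric.reset_states()

        for x, y in ds_test:
            y_pred = model(x, training=False)
            acc_metric.update_state(y, y_pred)
            with test_writer.as_default():
                tf.summary.scalar("accuracy", acc_metric.result(), step=test_step)
            test_step += 1
        acc_metric.reset_states()
```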
Looking at TensorBoard, I've filtered to the training runs by typing "train" in the search box, and each curve corresponds to a different learning rate. You can also apply smoothing: the raw loss varies a lot from batch to batch, so smoothing makes for better-looking plots. Comparing the runs, the initial learning rate should probably be between 0.001 and 0.0001, between this orange and this red curve. So that's it for loss and accuracy plots; play around with them, they're very, very useful.

Let's move on to images. We'll start by simply visualizing images, and the first thing to handle is that we'll plot with Matplotlib, and Matplotlib wants pixel values between 0 and 1, so in the augmentation we add image = tf.clip_by_value(image, clip_value_min=0, clip_value_max=1). If you're wondering why values would ever fall outside that range when we already divided by 255: the data augmentation can push some pixel values below zero or above one, so at the end we just make sure they're actually between zero and one. You'll see why that matters in a moment.

We can remove the training parts entirely and just iterate for step, (x, y) in enumerate(ds_train). First we create a figure with an image grid, passing x, y, and class_names to image_grid, a function we'll create in the utils file. Then, with writer.as_default(), we call tf.summary.image, naming it "Visualize Images", passing plot_to_image(figure), and setting step=step. In this case we only need a single writer, so instead of train_step we just call the counter step and increment it by one each iteration.

I want to say up front that there are easier ways to do this; what we're building here, image_grid and plot_to_image, is unnecessarily complicated, but it gives a clean result: a grid with the labels above each image. The simpler approaches don't look very nice, and of course we want it to look nice. The main loop, sketched below, is all this file really does; the helpers live in utils, so let's go there.
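A sketch of that main loop (the writer's directory name is my choice):

```python
from utils import image_grid, plot_to_image  # helpers defined next

writer = tf.summary.create_file_writer("logs/images")

# Augmentation already clips images back to [0, 1] so Matplotlib can plot them.
for step, (x, y) in enumerate(ds_train):
    figure = image_grid(x, y, class_names)
    with writer.as_default():
        tf.summary.image("Visualize Images", plot_to_image(figure), step=step)
```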
First I'll copy in the imports this utils file needs. We define plot_to_image(figure), which I'll fill in momentarily, and image_grid(data, labels, class_names). For plot_to_image I'm copying from the official TensorFlow tutorials (you can see the URL on screen): as the docstring says, it converts the Matplotlib plot specified by figure to a PNG image and returns it, and the supplied figure is closed and inaccessible after the call. It uses io to save the plot to a PNG in memory, closes the figure (which prevents it from being displayed directly inside a notebook), converts the PNG buffer to a TF image with tf.image.decode_png, and finally adds the batch dimension. I don't want to focus too much on this; it's boilerplate to get the plot into the format TensorBoard needs, a tensor.

Now for image_grid, which creates the Matplotlib figure that gets handed to plot_to_image. The data should be in the format (batch_size, height, width, channels), so we start by asserting that data has four dimensions. We create a figure with plt.figure(figsize=(10, 10)); that's just Matplotlib (import matplotlib.pyplot as plt), and figsize is in inches, I believe, which can be confusing; I've tried different values and this is what looks nice. I tuned it for batch size 32, but a larger batch size will still look relatively okay. The number of images is data.shape[0], which keeps it general rather than hard-coding 32. Then size = int(np.ceil(np.sqrt(num_images))): we want a nice square grid, so the side of the grid is the square root of the number of images, rounded up to the next integer. Then we iterate for i in range(data.shape[0]).
For each image in the batch we create a subplot, plt.subplot(size, size, i + 1): size by size is the whole grid, and i + 1 is the position of the current image. The title of each subplot is class_names[labels[i]]: the label is the class index for that example, and indexing class_names with it gives the name. Then we set plt.xticks([]) and plt.yticks([]) to empty lists and plt.grid(False). If data.shape[3] == 1, i.e. grayscale, we do plt.imshow(data[i], cmap=plt.cm.binary); otherwise, for RGB, just plt.imshow(data[i]); no cmap is needed there, since RGB is the default. At the end we return the figure.

That's a lot of Matplotlib, and I don't want to go into the specifics, but you get the general picture even if you're not too familiar with it: we create a figure, then a subplot for each training image, with the number of rows and columns depending on how many images we send in. Say we send in 16 images: the grid is 4x4, and each cell is an image from our training set.

Now we have image_grid and plot_to_image, so going back to the main file, this should work: we build the grid, then convert it with plot_to_image. Here we're writing every single batch, which isn't really necessary; you might do it once per epoch, but it's useful for checking all of the images. Let's run it... right, we need the import first: from utils import plot_to_image, image_grid. Rerunning it gives "writer as default is not defined"; it should be writer.as_default(). With that fixed, opening TensorBoard shows the images in a nice format with the labels printed above them. As you can see it's a square grid, 6x6 in this example, and the last few cells aren't covered by any image; that's because we took the ceiling and rounded upward. But it looks pretty good, and you can move the step slider through the batches.

One thing to note: if you scroll the slider, you'll see it only has six steps. When you launch TensorBoard there's another argument you can pass, --samples_per_plugin, which controls how many images are kept for this step slider. If you set it to, say, 94, one slider step would correspond to one batch, rather than jumping about 25 batches at a time as it does here. Just something to keep in mind if you're wondering about that. Putting the two helpers together gives the sketch below.
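A sketch of the two helpers in utils.py; plot_to_image is adapted from the official TensorFlow guide, as mentioned:

```python
import io
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

def plot_to_image(figure):
    """Converts a Matplotlib figure to a PNG tensor; the figure is closed."""
    buf = io.BytesIO()
    plt.savefig(buf, format="png")  # save the plot to a PNG in memory
    plt.close(figure)               # prevents direct display in a notebook
    buf.seek(0)
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    return tf.expand_dims(image, 0)  # add the batch dimension

def image_grid(data, labels, class_names):
    """Plots a batch of shape (N, H, W, C) as a labeled square grid."""
    assert len(data.shape) == 4
    figure = plt.figure(figsize=(10, 10))
    num_images = data.shape[0]
    size = int(np.ceil(np.sqrt(num_images)))  # side length of the square grid
    for i in range(num_images):
        plt.subplot(size, size, i + 1, title=class_names[int(labels[i])])
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        if data.shape[3] == 1:
            plt.imshow(data[i], cmap=plt.cm.binary)  # grayscale
        else:
            plt.imshow(data[i])  # RGB is the default
    return figure
```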
Anyway, that's how we can visualize images. Next we want to create the confusion matrix. This is also a little involved, but as you saw earlier it turns out pretty good, so let's go through it step by step, again starting from the usual starter code. Normally you would build the confusion matrix by sending in all of the model outputs and y labels at once, but here we want to update it batch by batch: with an enormous dataset you can't process everything at the same time. So we create an initially empty matrix, np.zeros((len(class_names), len(class_names))), i.e. the number of rows and columns both equal the number of classes, and we update it as we go through the batches: confusion += get_confusion_matrix(y, y_pred, class_names), a function we'll write shortly, which adds each batch's counts into the running total. We might accumulate some numerical round-off error, but the approximation is very good, and it's enough to get a solid picture of what the model is learning. Then, after the epoch, we do `with train_writer.as_default():` and tf.summary.image, naming it "Confusion Matrix" and passing plot_confusion_matrix (the second function we'll write) with confusion divided by the batch index. Each update contributes its own counts, and since averaging is a linear operation, dividing by the number of batches gives an average over the entire epoch. We use the epoch as the step, only plotting the confusion matrix once per epoch, since this can be quite expensive.
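A sketch of that accumulation inside the training loop; one small deviation from the narration is that I divide by batch_idx + 1 (the number of batches so far) so the average is well-defined from the first batch:

```python
import numpy as np
from utils import get_confusion_matrix, plot_confusion_matrix  # defined below

for epoch in range(num_epochs):
    confusion = np.zeros((len(class_names), len(class_names)))
    for batch_idx, (x, y) in enumerate(ds_train):
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = loss_fn(y, y_pred)
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        confusion += get_confusion_matrix(y, y_pred, class_names)  # running counts

    with train_writer.as_default():
        cm_image = plot_confusion_matrix(confusion / (batch_idx + 1), class_names)
        tf.summary.image("Confusion Matrix", cm_image, step=epoch)
```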
Could you overflow by accumulating like this? In principle, since you're essentially adding counts every batch, but you would need an enormously large dataset; even 100,000 samples is very far from overflowing, so this is okay; just keep it in mind. We don't have to do this for the test set, though of course you can; I'll remove that part, since it's not really needed and we save some compute.

Now to the utils file to create those two functions. First, get_confusion_matrix(y_labels, logits, class_names): y_labels are the correct labels, logits are the outputs from the model, and class_names are the class names. Second, plot_confusion_matrix, which takes a confusion matrix (I'll just write cm) and the class names. To build the confusion matrix, rather than doing it from scratch (that would be a separate video), we use scikit-learn. The predictions are np.argmax(logits, axis=1).
Going between TensorFlow and NumPy is pretty seamless, so we can call np.argmax directly on the logits to get the predictions. Then cm = sklearn.metrics.confusion_matrix(y_labels, predictions, labels=np.arange(len(class_names))). The labels argument is essential for getting a matrix that is len(class_names) along both the rows and the columns: with a batch size of 32 we might not see an example of every class in every batch, and that would break the shape of our confusion matrix, so we pass labels to force the size we want. Then we just return the confusion matrix.

For plot_confusion_matrix: we set size = len(class_names), create a figure with plt.figure(figsize=(size, size)), and call plt.imshow(cm, interpolation="nearest", cmap=plt.cm.Blues). You can read the documentation for what the interpolation algorithms do exactly; I'm not entirely sure of the differences, but "nearest" seems to be the usual default, and the cmap just makes the plot blue; you'll see how it looks shortly. We set plt.title("Confusion Matrix"), and the indices are np.arange(len(class_names)). Then plt.xticks(indices, class_names, rotation=45): the indices set how many tick positions there are, class_names gives their labels, and the 45-degree rotation keeps the text from overlapping; similarly plt.yticks(indices, class_names).

Then we normalize the confusion matrix: cm = np.around(cm.astype("float") / cm.sum(axis=1)[:, np.newaxis], decimals=2), also specifying the number of decimals. This makes each row sum to one: the predictions for a specific true class can only add up to one over all of the different labels. Next we want to draw the probability as text in each cell, and we do something pretty neat here. First, threshold = cm.max() / 2. Then we loop for i in range(size) and for j in range(size), every row and every column, and set color = "white" if cm[i, j] > threshold else "black": if the value is above the threshold, the background color of that cell will be relatively dark, so the text should be white; otherwise the cell is light colored, so the text should be black. Then plt.text(j, i, cm[i, j], horizontalalignment="center", color=color) places the actual value in that cell with centered text and the color we just chose. After the loops, plt.tight_layout() gives a nice layout, plt.xlabel("True label") and plt.ylabel("Predicted label") label the axes, and, as before, this is a Matplotlib figure that has to be converted to a tensor for TensorBoard, so we reuse plot_to_image: cm_image = plot_to_image(figure), and we return cm_image. Put together, the two helpers look like the sketch below.
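A sketch of the two helpers in utils.py, reusing plot_to_image from earlier:

```python
import numpy as np
import sklearn.metrics
import matplotlib.pyplot as plt

def get_confusion_matrix(y_labels, logits, class_names):
    preds = np.argmax(logits, axis=1)
    # labels=... forces a full num_classes x num_classes matrix, even when a
    # batch doesn't contain an example of every class.
    return sklearn.metrics.confusion_matrix(
        y_labels, preds, labels=np.arange(len(class_names))
    )

def plot_confusion_matrix(cm, class_names):
    size = len(class_names)
    figure = plt.figure(figsize=(size, size))
    plt.imshow(cm, interpolation="nearest", cmap=plt.cm.Blues)
    plt.title("Confusion Matrix")
    indices = np.arange(len(class_names))
    plt.xticks(indices, class_names, rotation=45)
    plt.yticks(indices, class_names)

    # Row-normalize so each true class's predictions sum to 1.
    cm = np.around(cm.astype("float") / cm.sum(axis=1)[:, np.newaxis], decimals=2)

    threshold = cm.max() / 2.0
    for i in range(size):
        for j in range(size):
            # White text on dark cells, black text on light cells.
            color = "white" if cm[i, j] > threshold else "black"
            plt.text(j, i, cm[i, j], horizontalalignment="center", color=color)

    plt.tight_layout()
    plt.xlabel("True label")
    plt.ylabel("Predicted label")
    return plot_to_image(figure)  # convert the figure to an image tensor
```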
Now we can go back, and this should work; except it won't run yet, because we need the imports: from utils import get_confusion_matrix, plot_confusion_matrix. With those added, it runs, and it seems to be working. Of course we've only trained a single epoch, so the model is quite bad, which shows in the values on the diagonal, i.e. the probability of correctly classifying each class: airplane about 50%, and so on. Let's increase the number of epochs to five and rerun, and you can watch it improve over the epochs. As you can see it has gotten better; the model is still pretty bad, since we're using a very small one, so it's not perfect, but the visualization works.

Let's now move on to the graph, which is a little simpler. I'm not going to need any of the dataset code here, so we remove all of it. Instead we create writer = tf.summary.create_file_writer("logs/graph_vis"), let's just call it that. Then we define a small function, my_func(x, y), that returns tf.nn.relu(tf.matmul(x, y)): it just does a matrix multiply and takes the ReLU of the result. We create x = tf.random.uniform((3, 3)) and y = tf.random.uniform((3, 3)) as inputs. Then we have to decorate the function with @tf.function: since TensorFlow 2.0, eager mode is the default, but these graphs can't be created in eager mode, so the decorator ensures the function is converted to and run as a static graph. With that done, we call tf.summary.trace_on(graph=True, profiler=True), then out = my_func(x, y) with the x and y we just created, and then, with writer.as_default(), tf.summary.trace_export with name="function_trace" and step=0 (we're only doing this once), and profiler_outdir pointing at the logs directory; there's a quirk in exactly how that path has to be written, just a weird thing we need to do.
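As a sketch, the whole graph-visualization file is just:

```python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/graph_vis")

@tf.function  # trace as a static graph; tracing isn't possible in eager mode
def my_func(x, y):
    return tf.nn.relu(tf.matmul(x, y))

x = tf.random.uniform((3, 3))
y = tf.random.uniform((3, 3))

tf.summary.trace_on(graph=True, profiler=True)
out = my_func(x, y)
with writer.as_default():
    tf.summary.trace_export(
        name="function_trace",
        step=0,
        profiler_outdir="logs/graph_vis",  # path formatting can be finicky here
    )
```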
If you want to visualize an actual model, this still works: create your model with keras.Sequential and so on, and just return model(x) from the traced function. What I want here is a minimal, simple example, but it works in general. If we run it and check for errors: we get some warnings, and I'm not really sure what to do about them; sometimes TensorFlow just gives you warnings, and if everything works as it should, I leave them alone. Looking at the Graphs tab, we have x and y going into the matrix multiply, then the ReLU, and that's the output. Sometimes these graphs look a little weird; I'm not sure why there are two Identity nodes here, but you can see the most relevant parts: the matmul and then the ReLU. Personally I haven't used this much, but I imagine it's useful for debugging a very complex model.

Back to the code: hyperparameters. This is something very useful if you want to do hyperparameter search, and it's visualized very nicely; it looks very good. Again we work from the starter code and modify it. First the import: from tensorboard.plugins.hparams import api as hp; that's the HParams API, imported as hp. Then we create a function, train_model_one_epoch, that takes something called hparams. We copy our model definition into it, and let's say the things we want to play around with are the number of units in the dense layer, the dropout rate, and the learning rate; you could imagine doing this for other values too, but I'll use these for this example. From hparams we get units = hparams[HP_NUM_UNITS] and drop_rate = hparams[HP_DROPOUT]; we haven't defined these yet, but we will later. We also need the learning rate, hparams[HP_LEARNING_RATE], for the optimizer, keras.optimizers.Adam.
So we re-initialize the optimizer with the learning rate from hparams, and we create the model using the units and the drop rate. Then the function trains for one epoch: we copy the inner training loop into it, and that's it. Next we want to write to TensorBoard, and we do that by building a run directory. Since we're doing many runs with different values, we start with "logs/train/" and append str(units) + "units_" + str(drop_rate) + "dropout_" + str(learning_rate), with underscores and suffixes just to keep it readable; I split it over a few lines so no line gets too long. With that run directory, unique per combination of units, drop rate, and learning rate, we do `with tf.summary.create_file_writer(run_dir).as_default():`. Note that we're not using a pre-made train writer here: we create the file writer as we write, since it's different for every run. Inside, hp.hparams(hparams) records the hyperparameter values used for this run, then we take accuracy = acc_metric.result() and write tf.summary.scalar("accuracy", accuracy, step=1), and afterwards we reset the state of the accuracy metric.

Now we can remove the global get_model call, since we create the model for every run; we don't need the number of epochs, since I'll only run a single epoch anyway; we keep the loss, the optimizer, and the accuracy metric; and we drop the old writers, since we now create a new writer per run. Then we define the hyperparameters: HP_NUM_UNITS = hp.HParam("num_units", hp.Discrete([32, 64, 128])), meaning the number of units in that particular dense layer will be 32, 64, or 128.
So we're essentially doing a grid search, and if you're familiar with hyperparameter search you probably know that random search is better; I won't explain why here. What you would perhaps do in practice is first sample values randomly and then feed those in as the discrete values; but for illustration I'm using it in a very simple way, in discrete steps, grid-search style. Then HP_DROPOUT = hp.HParam("dropout", hp.Discrete([0.1, 0.2, 0.3, 0.5])), so four values, and HP_LEARNING_RATE = hp.HParam("learning_rate", hp.Discrete([1e-3, 1e-4, 1e-5])). There's also something called hp.RealInterval that you can sample from, but I don't really see the point; I would sample the values myself and pass them in as discrete values. That should work well in general, though there may be use cases I'm not familiar with.

With those defined, we replace the normal training loop: for learning_rate in HP_LEARNING_RATE.domain.values, for units in HP_NUM_UNITS.domain.values, and for rate in HP_DROPOUT.domain.values, iterating over the discrete values. We'll run just a single epoch; otherwise you would add for epoch in range(num_epochs), though note that you couldn't simply wrap this code in an epoch loop as-is, because that would reset everything each iteration; you would instead add the epoch loop inside train_model_one_epoch. Then we build hparams as a dictionary, {HP_LEARNING_RATE: learning_rate, HP_NUM_UNITS: units, HP_DROPOUT: rate} (with colons, not equals signs, which I got wrong at first), and call train_model_one_epoch(hparams). Put together, it looks roughly like the sketch below.
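Putting the HParams pieces together, continuing with loss_fn, acc_metric, and ds_train from the starter sketch (the conv part of the model is abbreviated):

```python
from tensorboard.plugins.hparams import api as hp

HP_NUM_UNITS = hp.HParam("num_units", hp.Discrete([32, 64, 128]))
HP_DROPOUT = hp.HParam("dropout", hp.Discrete([0.1, 0.2, 0.3, 0.5]))
HP_LEARNING_RATE = hp.HParam("learning_rate", hp.Discrete([1e-3, 1e-4, 1e-5]))

def train_model_one_epoch(hparams):
    units = hparams[HP_NUM_UNITS]
    drop_rate = hparams[HP_DROPOUT]
    learning_rate = hparams[HP_LEARNING_RATE]

    model = keras.Sequential([
        keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(units, activation="relu"),
        keras.layers.Dropout(drop_rate),
        keras.layers.Dense(10),
    ])
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)

    for x, y in ds_train:  # one epoch
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = loss_fn(y, y_pred)
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        acc_metric.update_state(y, y_pred)

    run_dir = (
        "logs/train/"
        + str(units) + "units_"
        + str(drop_rate) + "dropout_"
        + str(learning_rate)
    )
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record this run's hyperparameter values
        tf.summary.scalar("accuracy", acc_metric.result(), step=1)
    acc_metric.reset_states()

for learning_rate in HP_LEARNING_RATE.domain.values:
    for units in HP_NUM_UNITS.domain.values:
        for rate in HP_DROPOUT.domain.values:
            hparams = {
                HP_LEARNING_RATE: learning_rate,
                HP_NUM_UNITS: units,
                HP_DROPOUT: rate,
            }
            train_model_one_epoch(hparams)
```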
This might take a while to run, so I'll just run it and hopefully we get no errors, as I say that... We get the HParams tab; I'll make it a little larger, and you can see how it updates as more runs finish. Now we have all of these hyperparameter combinations, which can look a little confusing, so one thing you can do is filter to just the top accuracies. In this case the top runs have a high number of units (a larger model is basically always better), the highest learning rate of the ones we chose, and preferably the lowest level of dropout. Again, that's because we're measuring accuracy on the training set; you would actually want to do this on a validation set, but here we just want to illustrate how it works in TensorBoard.

That's it for HParams; now we move on to the Projector, which is very cool and can be very useful. In this file we don't need most of the starter code, so I remove the model and pretty much everything else; all we really need is one batch: x_batch, y_batch = next(iter(ds_train)). We pass that to a function we're going to create, plot_to_projector, sending in x_batch, then x_batch again (I'll explain in a second), then y_batch, class_names, and a log directory, which we'll just call "proj".

Why x_batch twice? This function takes in data points (the images), labels, and a feature vector. Normally you would send your data through your model and take out a feature vector, which might live in some arbitrary number of dimensions; the Projector takes that feature vector and, using an algorithm like t-SNE or PCA, projects it down so you can see which inputs the model considers similar according to its features. In this case we're not sending anything through a model: we use the images themselves, reshaped into one long feature vector of pixel values per image. As with most things in this tutorial, that's to keep the illustration simple; normally this would be a model's feature vector. To be ultra clear: you would do feature_vector = model(x_batch) and call plot_to_projector with that feature vector instead.

So all we need to do is create plot_to_projector; and before I forget, the import: from utils import plot_to_projector. Over in utils we need two things. The first is a function called create_sprite. A sprite image packs all the images in the batch into one single image; let me bring up an example: this sprite image of Fashion-MNIST is a collection of many, many different images inside one image, and it's useful because we only have to send TensorBoard a single image rather than many.
I'm not going to write create_sprite myself; I'm copying it from Andrew B. Martin. You can read through it, but essentially it does some reshaping and adds padding to build the sprite image, and we'll just use it.

Then we create plot_to_projector(x, feature_vector, y, class_names, log_dir="default_log_dir", meta_file="metadata.tsv"); what each argument is for will make more sense as we go. First, we assert that x has four dimensions, since we assume the form (batch_size, height, width, channels). One thing that can be super frustrating is that this won't run if the log directory already exists with old projector files in it, so: if os.path.isdir(log_dir), we call shutil.rmtree(log_dir), and then os.makedirs(log_dir); essentially, we always start from a clean, fresh folder.

Next, the sprites file: SPRITES_FILE = os.path.join(log_dir, "sprites.png"), then sprite = create_sprite(x) on the input data, and cv2.imwrite(SPRITES_FILE, sprite). Then we generate the label names: labels = [class_names[int(y[i])] for i in range(int(y.shape[0]))], i.e. the class name for each training example's integer target. We write those to the metadata.tsv file, which the Projector uses to show the labels: open os.path.join(log_dir, meta_file) for writing as f, and for each label do f.write("{}\n".format(label)), one label then a new line; that's just the structure TensorBoard wants.

Then the feature vector: if feature_vector.ndim != 2, it's not of the form (batch, features); that's the case, for example, when we pass x_batch twice as we're doing. Then we print a note saying the feature vector is not of the form (batch, features) and that we'll reshape to try to get it there, and do feature_vector = tf.reshape(feature_vector, (feature_vector.shape[0], -1)).
Then we convert it to a variable, feature_vector = tf.Variable(feature_vector), create checkpoint = tf.train.Checkpoint(embedding=feature_vector), and call checkpoint.save(os.path.join(log_dir, "embeddings.ckpt")); we're saving the feature vector into the log directory under the name TensorBoard wants, embeddings.ckpt. Then we set up the config that writes to the Projector: config = projector.ProjectorConfig(), embedding = config.embeddings.add(), and embedding.tensor_name set to the embedding name plus the ".ATTRIBUTES/VARIABLE_VALUE" suffix, which we have to include; then embedding.metadata_path = meta_file. If this feels clumsy, I agree there's a lot of code; in PyTorch this is literally one line, but in TensorFlow it's more difficult. The embedding projector is usually used to project embeddings, which makes sense, so doing it for images requires this rather convoluted route; if you find a better way, do comment, but this is the best I've found so far. Then embedding.sprite.image_path = "sprites.png", specifying the image path, and embedding.sprite.single_image_dim.extend((x.shape[1], x.shape[2])), the height and the width of a single image. Lastly, projector.visualize_embeddings(log_dir, config), and that's it.

Hopefully this runs without errors... and it doesn't, right: the error is that when creating the sprite image we must not divide the images by 255 again; they're already normalized. Rerunning gives the correct pixel values, and we can see the images: there's a dog, that's a boat, or a ship, and that's a horse, I think. We have very few images here, so you'd probably change the batch size to something like 500, and rerunning is actually quite quick. One note: after rerunning, I think you need to start a fresh TensorBoard; sometimes reloading works, sometimes it doesn't, but now it works, and we get 500 images and can use t-SNE or PCA on them. I've also tried to make this general, in the sense that you can replace the dataset with MNIST and run it directly; of course we then have to remove the data augmentation, since we can't convert to grayscale something that's already grayscale, but other than that it should be general enough to project any dataset you like. A condensed sketch of the helper is below.
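A condensed sketch of plot_to_projector; create_sprite is omitted since, as noted, it's taken from Andrew B. Martin's code:

```python
import os
import shutil
import cv2
import tensorflow as tf
from tensorboard.plugins import projector

def plot_to_projector(x, feature_vector, y, class_names,
                      log_dir="default_log_dir", meta_file="metadata.tsv"):
    assert len(x.shape) == 4, "expects (batch, height, width, channels)"

    # Start from a clean folder; stale projector files break reruns.
    if os.path.isdir(log_dir):
        shutil.rmtree(log_dir)
    os.makedirs(log_dir)

    # One big sprite image holding the whole batch.
    SPRITES_FILE = os.path.join(log_dir, "sprites.png")
    sprite = create_sprite(x)  # Andrew B. Martin's helper (not shown)
    cv2.imwrite(SPRITES_FILE, sprite)

    # One label per line; the format TensorBoard expects.
    labels = [class_names[int(y[i])] for i in range(int(y.shape[0]))]
    with open(os.path.join(log_dir, meta_file), "w") as f:
        for label in labels:
            f.write("{}\n".format(label))

    if len(feature_vector.shape) != 2:
        print("NOTE: feature vector is not of form (batch, features); reshaping")
        feature_vector = tf.reshape(feature_vector, (feature_vector.shape[0], -1))

    feature_vector = tf.Variable(feature_vector)
    checkpoint = tf.train.Checkpoint(embedding=feature_vector)
    checkpoint.save(os.path.join(log_dir, "embeddings.ckpt"))

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    embedding.metadata_path = meta_file
    embedding.sprite.image_path = "sprites.png"
    embedding.sprite.single_image_dim.extend((x.shape[1], x.shape[2]))
    projector.visualize_embeddings(log_dir, config)
```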
All right, wait, there's actually one more thing; that was the last file for this part, but I also want to show you the TensorFlow Profiler. The profiler is available since TensorFlow 2.2, and if you've followed these tutorials you know that I'm using TensorFlow 2.1 on the GPU, so I can't use it locally; you also essentially need a GPU to run it. So what I did was run it in Google Colab, following one of the official tutorials very closely. I'm just going to step through it; perhaps you're watching this video in the future, when you can easily install TensorFlow 2.2 and above with Anaconda, in which case you could just create a new local file instead of using Colab, but this works well enough to illustrate.

First of all we have some imports, and then we have to do pip install tensorboard_plugin_profile. We import TensorFlow, and here we make sure we're running on the GPU; as I said, that's effectively a requirement -- I think the profiler works on the CPU, but for some functionality there's a requirement that you run on the GPU. Then we use tensorflow_datasets to import MNIST, very similar to what you've seen throughout this tutorial, normalize the images with a map, and then batch. We create a very simple model with model.compile, and then we create the logs. Here, following the official TensorFlow tutorial, the log directory name is built from datetime.now(), which is kind of clever: you get a new log directory every time you run, because the time has changed. That's a pretty good way to avoid the "directory already exists" errors; instead of the projector approach of removing the folder if it exists, doing something like this could be easier, I think.

Then we create the TensorBoard callback, and all they do differently is set profile_batch to 500,520. This is using the profiler with the TensorBoard callback, but there are multiple ways to do it -- you can use it with custom training loops as well; I'll reference the official tutorial in the description of this video if you want to see more of how you'd do that. Then we just do model.fit, again passing this TensorBoard callback, and if we then do %load_ext tensorboard and run %tensorboard --logdir logs, we open up TensorBoard and can get at all of these different tools. Maybe I need to rerun this first -- I think so -- so we'll do a factory reset of the runtime and then Run all.
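For reference, here is roughly what that Colab notebook boils down to, following the official profiler tutorial closely; the model, batch size, and epoch count are illustrative. It requires TensorFlow 2.2+, a GPU, and `pip install -U tensorboard_plugin_profile`:

```python
from datetime import datetime

import tensorflow as tf
import tensorflow_datasets as tfds

# The profiler effectively requires a GPU.
device_name = tf.test.gpu_device_name()
assert "GPU" in device_name, "No GPU found"

ds_train = tfds.load("mnist", split="train", as_supervised=True)

def normalize_img(image, label):
    # Scale pixel values to [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label

ds_train = ds_train.map(normalize_img).batch(128)

# A very simple model, just to have something to profile.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Time-stamped log dir: each run gets a fresh folder, so there is no need
# to delete old logs the way we did for the projector.
logs = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")

# Profile only batches 500 to 520; profiling too many batches risks OOM.
tboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logs, histogram_freq=1, profile_batch="500,520"
)

model.fit(ds_train, epochs=2, callbacks=[tboard_callback])

# In Colab/Jupyter, then run:
#   %load_ext tensorboard
#   %tensorboard --logdir logs
```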
Then we can open up the profiler, and we get several different tools here. There's an overview page, and to be honest I haven't looked into all of these in detail, but I'll show you roughly what they do. The overview page is perhaps the easiest to understand: it gives a summary of all the different parts, so you can see the timing for all the different components of training. You can also see the placement, i.e. the number of TF ops executed on the host and on the device. I'm not really sure exactly what the difference between host and device is here -- I'm no expert on the profiler, so maybe someone can comment -- but as far as I understand the host is the CPU and the device is the GPU. Another thing you can see is the device compute precision: if you're using 16-bit, it shows what percentage of the computation is done in 16-bit.

Then you can see some recommendations here, and this graph is quite useful as well. You can see, for example, that the program is highly input bound: 85 percent of the time is spent waiting for input. Now that we know that, what could we do? We could add .cache(), we could pass num_parallel_calls=AUTOTUNE to both map calls, and we could add .prefetch(AUTOTUNE) as well (a sketch of these tweaks follows at the end of this section). Then I'll just rerun the whole thing (Runtime, Run all) and see if there's any difference in the profiler output.

Oh, by the way, profile_batch specifies between which batch indices you want to profile; here we profile from batch index 500 to batch index 520, so just 20 batches. If you profile too many batches you can get out-of-memory errors, which is why they recommend doing just 10 or 20, and not to profile the initial batches but rather the middle of training, because it can take some time to initialize things.

All right, let's look at the profiler again and see if there's any difference. Now it says the program is moderately input bound, with 16 percent waiting for input; that's a major improvement over the 85 percent we just had. Let's see what more we have... actually, you know what, you can play around with this yourself, and I'm also going to reference the official TensorFlow tutorial that goes into more depth on the profiler. For example, here you can see that the matrix multiply is taking up 25 percent, the biggest chunk of the operation time. Then you have tools like the trace viewer, where you get a very, very low-level view of all the operations done on the CPU and GPU, and you can see exactly what is taking up time. So this is a very cool tool; this was more of an introduction, just to show you how to get it up and running.

Yeah, so that's it. This turned out to be a very long video, but hopefully it's at least useful for understanding TensorBoard. Thank you so much for watching, and I hope to see you in the next one.
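And, as promised above, here is a minimal sketch of those tf.data tweaks, assuming the same MNIST pipeline as before; in the run shown, these changes cut the input-bound fraction from about 85 percent to about 16 percent:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# AUTOTUNE lets tf.data pick parallelism and buffer sizes dynamically.
AUTOTUNE = tf.data.experimental.AUTOTUNE  # tf.data.AUTOTUNE in TF >= 2.4

def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

ds_train = tfds.load("mnist", split="train", as_supervised=True)
ds_train = (
    ds_train.map(normalize_img, num_parallel_calls=AUTOTUNE)  # parallel preprocessing
    .cache()             # keep normalized images in memory after the first epoch
    .batch(128)
    .prefetch(AUTOTUNE)  # overlap input preparation with training steps
)
```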
Info
Channel: Aladdin Persson
Views: 52,951
Keywords: TensorFlow Tensorboard Tutorial, Tensorboard loss plot, tensorboard scalars tab plot, tensorboard callback, tensorboard projector tab embedding, tensorboard profiler, tensorboard graph, tensorboard hparams
Id: k7KfYXXrOj0
Length: 82min 54sec (4974 seconds)
Published: Tue Sep 08 2020