Optimizing Neural Network Structures with Keras-Tuner

Captions
What is going on everybody, and welcome to a tutorial slash showcase of the Keras Tuner package. One of the most common questions I get on deep learning tutorials, and content in general, is people asking: how did you know to use N layers? Why that many neurons? Why dropout, why that rate, why batch norm, all these things, why did you do that? The answer has always been trial and error, and anybody who tries to tell you they knew which neural network was going to work is a dirty liar. It's trial and error.

Now of course there are some tasks, like MNIST for example, where a paper bag could get to ninety percent accuracy or more, so obviously some problems are so simple any network will do, and there are some network designs that will solve most problems, especially image problems and stuff like that. But for real-world problems that aren't solved yet, the solution is trial and error, and historically that has involved me writing for loops: in the loop I tweak things, run it overnight, save validation accuracy or loss or both, and in the morning I see, okay, these are the attributes, like three layers at 64 nodes per layer, that seems to be the thing. Then I test with batch norm, because every time you change one little thing, dropout or not, 50%, 20%, 10%, batch norm or not, you've got to keep testing. So it has historically just been ugly for loops, to be honest with you.

Recently, though, I came across the Keras Tuner package, which does everything I was doing but better, plus a few things I wasn't even doing, which is pretty cool, so I thought I would share it with you. The crux of it: you've got a model, you define little hyperparameter objects inside that model, you create a tuner object, and the tuner varies those hyperparameters for you. Not everything has to be a tunable parameter, but for the ones that are you can use an Int, a Float, a Choice, or a Boolean, and the ints and floats are a range with a step size.

To get it, you just pip install keras-tuner. I'm using Keras Tuner 1.0.0, and as a tutorial person that version number throws up tons of red flags, but I'm still going to do it anyway. 1.0.0 means one of two things: either it will never get updated again, which would be sad, or it's going to be updated a ton in the next year and this tutorial could be rendered out of date very quickly. If you're hitting errors, check the comment section, google the errors, or pip install the exact version that I'm using. Once you have that, you're ready to get started.
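To preview those tunable-parameter types, here is a minimal sketch, assuming the kerastuner 1.0 HyperParameters API; the parameter names are made up purely for illustration:

```python
from kerastuner.engine.hyperparameters import HyperParameters

hp = HyperParameters()

# An integer range with a step size: 32, 64, 96, ..., 256.
units = hp.Int("units", min_value=32, max_value=256, step=32)

# A float range, e.g. a dropout rate between 0.1 and 0.5.
rate = hp.Float("dropout", min_value=0.1, max_value=0.5, step=0.1)

# A discrete choice from a fixed list of options.
activation = hp.Choice("activation", values=["relu", "tanh"])

# A simple on/off flag, e.g. whether to add batch normalization.
use_batch_norm = hp.Boolean("batch_norm")
```

Normally you never construct a HyperParameters object yourself; the tuner passes one into your model-building function and records which values each trial sampled.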
First, though, a quick shout-out to the sponsor and supporter of this video: Kite, an ML- and AI-based autocomplete engine for Python. It works in all the major editors: Sublime Text, VS Code, PyCharm, Atom, Spyder, Vim. And it's actually a good autocomplete. Honestly, I hate autocomplete, so when they reached out about becoming a partner I thought, we'll see, but it's actually pretty good. It took me a little while to realize how good it was; I've used it for about a hundred hours now, and the real test was when I disabled it just to see what the difference was, and wow, there are so many big differences. The biggest thing is that the autocomplete isn't just variables: it's methods, even snippets of code, and the imports are so much nicer. I'll show some of it here, and there's a link in the description if you want to check it out. It also comes with Kite Copilot, a standalone app that is basically live-updating documentation for whatever you're working on, which again is super useful. I don't take many sponsorships, if you haven't noticed; I don't do VPNs or mobile games and all that, because it just doesn't make any sense, but this is actually a really cool product, so I'm excited for them to be supporting free education. I definitely suggest you check them out; the autocomplete is the best.

Okay, cool, let's get started. We need a dataset and a model; there are so many things we need just to get started with Keras Tuner, so I'm going to try to truncate this as much as possible. The dataset we're going to use is Fashion-MNIST. It's like MNIST, except, like I said, MNIST is too easy, so it's no good for showing this. So: from tensorflow.keras.datasets import fashion_mnist, and then fashion_mnist.load_data() loads the data. And just to show off Kite Copilot completely: it shows right there that load_data returns the train and test tuples, so I'm just going to copy and paste that; if you aren't running Kite Copilot, you can grab it from the text-based version of this tutorial.

So here's all your data, and I'll quickly display one sample: import matplotlib.pyplot as plt (lovely autocomplete, thank you very much), then plt.imshow(x_train[0]) and plt.show(). I'm also passing cmap='gray', only because it's going to be all colorful otherwise and people will be confused. So this is the data: like MNIST it's 28 by 28 and black and white, and it has ten classifications, but they're articles of clothing rather than handwritten digits. In this case it's a boot or a shoe of some kind; I don't really know what that is. Let's do a different one and hopefully get something a little more recognizable... okay, some sort of short sweater thing. So that's what we're dealing with: it's just a bit more challenging to get to something like 98% accuracy compared to MNIST, so it's a thing we can practice tuning on; MNIST just doesn't work for that.

Once we have this, we need a model. I don't really see any benefit in writing out the entire model for you here; it would just be a waste of time, so I'm going to copy and paste it.
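Here is a minimal sketch of that loading-and-display step, using the standard tf.keras Fashion-MNIST loader:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

# load_data() returns two (images, labels) tuples: train and test.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Show the first training image; cmap="gray" because the images are
# single-channel and would otherwise be rendered with a color map.
plt.imshow(x_train[0], cmap="gray")
plt.show()
```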
I'll either put it in the description or you can just go to the text-based version of the tutorial, but everything in it, except for this hp, should be totally understandable to you; there should be nothing in it that makes you go, what's that? The hp stands for hyperparameters, which is what we're going to use; that's a comment I added for the actual tutorial. For now the function just creates our model object and returns the model, pretty simple, so from here you could build the model and do a .fit, for example.

The other thing we need to do is imports. From tensorflow we import keras, and then we need all that layer information, so from tensorflow.keras.layers we import basically every single one: Dense (we're actually not using Dropout), Conv2D, Activation, Flatten, and some max pooling. Cool, done; I ran over it a little fast, but again there is the text-based version of this tutorial, and there really shouldn't be anything confusing to anybody here.

Once we have this, we can test it real simply; I just want to make sure it works. So model = build_model(), and then model.fit. Another cool thing from Kite: this entire fit snippet gets completed for you and you can tab through it. So, batch_size, let's go with 64; epochs; validation_data, which is a tuple, in this case (x_test, y_test) (it tabbed me over when I typed x_train by mistake). In this case I don't care about verbose; I only had it there because I built this tutorial in a notebook first. We also need our x data, x_train, and our y data, y_train, and we need to reshape the data before it complains at us: x_train = x_train.reshape(-1, 28, 28, 1). If that's confusing to you, I strongly encourage you to go back to one of the deep learning playlists.

So that should all work, and I want to test it real quick before we get into the actual tuning. Save, run... please work... oh, it failed because we still have that hp parameter; let me remove it for a moment and put it back later. Okay, so it's training; we can see it's already working, and we can just go with one epoch. After one epoch it's actually not that accurate, like 77%, but whatever you get, I can just tell you it's not above 90%.

So the question is: what do we do from here, and can we actually do better than this model? The Fashion-MNIST dataset is still easily learnable; what's not easy is being 99% accurate, as opposed to something like MNIST. What you would do from this point, if you're me and you're trying to compete in a Kaggle competition, is just start changing stuff. But with the Keras Tuner package we can automate that entire process.
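Since the pasted model itself never appears in the captions, here is a plausible reconstruction of the baseline, assuming a small two-conv-block network with a 10-way softmax head (the kernel sizes, optimizer, and loss are my assumptions), with the hp argument left out for this quick sanity test:

```python
from tensorflow import keras
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D

def build_model():
    model = keras.models.Sequential()

    # Input conv block; Fashion-MNIST images are 28x28, single channel.
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Second conv block.
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Classifier head: 10 clothing classes, integer labels.
    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Add a channel dimension so Conv2D accepts the data.
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

model = build_model()
model.fit(x_train, y_train, batch_size=64, epochs=1,
          validation_data=(x_test, y_test))
```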
So what I'd like to do now is move that over and import a couple of things; again, I'm going to keep this as short as possible and just paste them in. From kerastuner.tuners we import RandomSearch, and from kerastuner.engine.hyperparameters we import HyperParameters. These are what give us the Int and Float ranges, Choice, and Boolean (I want to say there's another one, but I forget it). The text-based version has links to all the docs; I'm just trying to show the general idea of how this works. Next I import time and specify a log directory; this is kind of a silly f-string to use, but for now LOG_DIR will just be a timestamp, and you could add other things to it if you wanted. Again, keeping it short and simple.

So build_model will now take the hyperparameters object as a parameter; so far nothing is actually unique here. Next I'm going to come down to the very bottom and get rid of those two test lines, because build_model no longer works the way we were calling it once we pass a hyperparameter argument.

First you specify the actual tuner you intend to use, which will be a RandomSearch, and we pass it a bunch of things. First, the function we intend to use, build_model, with no parentheses; the RandomSearch object will do the hyperparameter handling for us, so you just pass the function name for your model. Then objective, the thing we're interested in tracking, in our case val_accuracy. Then max_trials; I'm going to set this to one for now, since we have no dynamism yet so it doesn't matter, and I'll explain it in a moment. Then executions_per_trial=1, and finally the directory: directory=LOG_DIR. Cool.

So, max_trials and executions_per_trial. Options can explode very quickly once we allow a dynamic number of layers, a dynamic number of nodes or features per layer, and a bunch of boolean options like do we want dropout, and if so, what rate. This can reach thousands or even millions of combinations pretty quickly; with a two-layer convnet it's pretty hard to get to millions, but getting to a thousand different model combinations is not hard at all. max_trials is simply how many random pickings from that space we want. executions_per_trial is how many times each picked configuration gets trained: if you're just searching for a model that learns at all, keep it at one; if you're trying to eke out the best performance, set it to three or five or maybe even more. As long as you're shooting in the dark, just trying to find something that works or a general idea of what seems to work, keep it low. The point is that when the tuner randomly says, hey, let's do four layers at 64 nodes per layer, it will train that configuration this many times, and you might well want two or three runs rather than one.
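Putting that together, here is a sketch of the tuner setup as described, using the kerastuner 1.0 import paths and assuming build_model now accepts an hp argument:

```python
import time

from kerastuner.tuners import RandomSearch
from kerastuner.engine.hyperparameters import HyperParameters

# The log directory is just a timestamp for now.
LOG_DIR = f"{int(time.time())}"

tuner = RandomSearch(
    build_model,               # the model-building function itself, not called
    objective="val_accuracy",  # the metric the tuner ranks trials by
    max_trials=1,              # how many hyperparameter combinations to sample
    executions_per_trial=1,    # how many times each combination is trained
    directory=LOG_DIR,
)
```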
Here's why: if you're trying to eke out one percent of accuracy, two runs of the exact same model can easily differ by more than one percent. So if you're really chasing performance, you might be seeking a model with a smaller standard deviation of validation accuracy, and you also might want to run each configuration a few times to figure out the true average, because a single run might just have gotten lucky, or unlucky. Hopefully that makes sense; if it doesn't, feel free to ask in a comment below.

So that's our tuner object, and now we actually do a search: tuner.search, where we specify our x data, x_train, and our y data, y_train (I believe it's correct PEP 8 style to drop the x= and y= keywords there). Then epochs: how many epochs to train every single trial for. This really depends on your needs; I'm going to set it to one, because iterating over this stuff still takes a while, and we're probably not even going to run the full search here; locally I'd suggest maybe starting with one. batch_size will vary by what you're running this on; I'll say 64. And validation_data=(x_test, y_test) (I've been so spoiled, I've barely had to type anything). So this will now search given these parameters, and based on the settings above it will only run one trial. Again, I'm going to quickly run this as a bug check, and then we'll actually start making the model truly dynamic. Okay, cool, it runs.

Now we come to our actual model and start adding things that make it dynamic (the full dynamic model is sketched a bit further below). First, the input layer: say we take that 32, and sure, maybe we do want 32, but maybe we want anything from 32 to 256; we don't want to try 32, 33, 34, though, we want a step size. The way we do that is hp.Int: we give it a name, which I'll make "input_units" (units, not shape), then the starting value and max value, 32 to 256, and then a step size of 32. In fact, just for this one I'll spell out the keyword names, min_value, max_value, and step, so it's super clear on screen. So now input_units is a dynamic value that will be randomly chosen somewhere in that range. I'll save and run that really quickly to make sure there are no bugs... cool. In theory that ran with some different value; I don't actually know which one it picked in that case, although the tuner does save all that information; I'm just bug checking at this stage.

So we've got a dynamic number of units for the input to this convnet, and now we can come down to the subsequent layers. The input layer is a little unique because it has an input_shape we need to retain, but the layers after it are all basically the same, and I'm going to remove the max pooling from them, because in some cases we'd run out of things to pool.
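For reference, a sketch of that search call; it takes the same keyword arguments you would normally hand to model.fit:

```python
# Each trial calls build_model(hp) with freshly sampled hyperparameters,
# then trains the resulting model with these fit arguments.
tuner.search(x_train, y_train,
             epochs=1,
             batch_size=64,
             validation_data=(x_test, y_test))
```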
So I'm going to remove that, and now we write: for i in range(hp.Int("n_layers", 1, 4)); since we want the step size to be one, we don't need to pass anything more. Inside the loop we do model.add(Conv2D(...)), and what if we want that 32 to be dynamic too? I take the hp.Int from before, paste it over the 32, and rather than "input_units" I use an f-string name, f"conv_{i}_units", again 32 to 256. Looks good to me; save it, come down here, run it, make sure it runs. It does.

I could let that run, and since we said one epoch, at the end it gives us a brief summary. Okay, so in this case we had a network where input_units was 32, then two more conv layers; it's displayed a little out of order, but the input layer has 32 features, feeding into a 96-feature layer and then a 32-feature layer. Note it says n_layers is 2, but it's actually 3 layers: n_layers only counts the layers added by that for loop, so the real number of conv layers is n_layers plus one. I hope that makes sense. The summary also tells us the score, and the best score at an earlier step, which is useful, though we only did one epoch anyway.

So we get the information, but of course this run was relatively meaningless because we only tested one combination. The next step would be to make this a much, much bigger test, but each trial here takes about 19 seconds, and that will vary with how many layers it picks and so on. What I've done is save the resulting tuner object as a pickle: at the end of the search you have this tuner object, so all I did was import pickle and dump the tuner to a file. Actually, I thought I had saved it but I made a mistake, so I ended up retraining everything, and here I am an hour later with my pickle and my directory.

Let me quickly show you inside that directory. Inside an untitled_project folder we have all of the trials that ran, and for each trial there's a checkpoint, which is the model, and a trial.json file containing the trial ID and, more importantly, the hyperparameters and the score that model got. So even without the pickle you can go back in time and recover this stuff, but it would be challenging, tedious, and annoying, especially if you have many executions per trial; you'd have to pair runs up by exact hyperparameters, I guess. So it's nice to have the pickle saved to come back to later. I also want to say that the oracle, I think it's inside the oracle, has the best-trial information; it's definitely not tuner0. But you can also interact with the tuner object directly, which is why I saved it as a pickle.
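Here is a sketch of the fully dynamic build_model described above, plus the pickle step; the kernel size, head, and compile settings are carried over from the baseline reconstruction earlier, and the pickle filename is made up:

```python
import pickle
import time

from tensorflow import keras
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D

def build_model(hp):
    model = keras.models.Sequential()

    # Tunable input conv layer: 32 to 256 features, in steps of 32.
    model.add(Conv2D(hp.Int("input_units", min_value=32, max_value=256, step=32),
                     (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Tunable number of extra conv layers (1 to 4), each with its own
    # tunable feature count. No pooling here, so small inputs don't run
    # out of spatial dimensions to pool.
    for i in range(hp.Int("n_layers", 1, 4)):
        model.add(Conv2D(hp.Int(f"conv_{i}_units", min_value=32, max_value=256, step=32),
                         (3, 3)))
        model.add(Activation("relu"))

    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# After tuner.search(...) finishes, keep the tuner around for later analysis.
with open(f"tuner_{int(time.time())}.pkl", "wb") as f:
    pickle.dump(tuner, f)
```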
First of all, we can get the best hyperparameters right out of the gate, just by running tuner.get_best_hyperparameters(). Before I do that, let me comment out the model-building test and the search, since we don't want to search again. Okay, run that, and there are the best hyperparameters. I don't find that very easy to read, though, so the next thing, which is a little more useful to me, is the results summary, which should give us the top models, the top 10 by default I think. Here they are, and the best was about 87% validation accuracy, with everything from roughly 86 to 87, so pretty close together, and interestingly 87 was the very best. In the other run I trained and then lost, the best was 89.6%, so I do think we could find something above 90%, and if you run this yourself, it's totally possible you will.

The reason for the spread is that, even though it seems like we made only a few things dynamic, we actually created a lot of combinations. To run through them all: you could have up to four additional convolutional layers, each with its own unit count, times all the possible input unit values. I haven't done the exact math (go ahead and do it if you want), but it has to be hundreds or even thousands of combinations. And that just goes to show that with this exact code you can get to 89%; it's just a function of running through the combinations, which is why it's so nice to have something that does it automatically for you.

Finally, the last thing I'll show is that you can get the actual model from here, a real TensorFlow model: tuner.get_best_models()[0]. I'm doing a .summary() on it, but since it's an actual model you could also do a .predict(). I think the summary is the easiest way to read the result: the hyperparameters printout still leaves you wondering, okay, but how would I build this model again, whereas a .summary() just makes sense to me. As we can see, you get the input layer, activation, pooling, and so on, so you could rebuild the model exactly to these specs.

Okay, I think that's enough information. Hopefully you've enjoyed this; like I said, it's one of the most common questions I get, and the answer is probably not as intriguing as you might have hoped, but hopefully Keras Tuner can actually make your lives easier. If you have questions, comments, concerns, whatever, feel free to leave them below. And again, shout out to Kite: it's a totally free plugin and, like I said, it's pretty awesome; I'm really enjoying it, so hopefully you will too, and thanks to them for supporting the video. That's it; I will see you guys in another video.
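A sketch of that post-search analysis, assuming the tuner was reloaded from the pickle saved earlier (the filename is hypothetical):

```python
import pickle

# Reload the tuner that was pickled after the search.
with open("tuner_1576800000.pkl", "rb") as f:  # hypothetical filename
    tuner = pickle.load(f)

# Best hyperparameter values found, as a plain dict.
print(tuner.get_best_hyperparameters()[0].values)

# Human-readable summary of the top trials (top 10 by default).
tuner.results_summary()

# The best model is a regular tf.keras model: you can call .summary(),
# .predict(), evaluate it, or keep training it.
best_model = tuner.get_best_models(num_models=1)[0]
best_model.summary()
```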
Info
Channel: sentdex
Views: 69,689
Rating: 4.958549 out of 5
Keywords: python, keras, keras-tuner
Id: vvC15l4CY1Q
Length: 28min 26sec (1706 seconds)
Published: Sat Dec 21 2019