Zero to Hero with Keras

Reddit Comments

This is a talk I gave at Deep Learning NYC Meetup. I think it's a good introduction to ML and how to get started. Please comment :)

— u/yboris, 2 points, Jul 14 2018

Jumped around enough to get a sense of flow - looked really good - guess this is tonight's movie :-)

— u/DeepDreamNet, 2 points, Jul 14 2018
Captions
All right, I think we're basically ready. First of all, thank you all for coming today; I'm really thrilled to be here to share something I find genuinely exciting. Just to understand my audience, a quick show of hands: how many of you have never done machine learning? How many have already tried something with it? How many have worked with Keras specifically? And how many would consider yourselves intermediate or above? Okay. Full disclosure: I am in some ways a beginner, but I have a background in teaching, I've read quite a lot over the past months, and I think I have enough to share that will be of value to at least some of you.

My talk is titled "Zero to Hero in Machine Learning with Keras." I chose Keras because it was highly recommended to me, it looks like one of the most popular libraries out there, and there is a magnificent book about it that takes you from knowing nothing to actually understanding how to do this; that's the one I used. A brief outline: a little about machine learning, how to get started, the workflow, tools you can use, some examples, and time for Q&A.

A bit about me: I was a math teacher for a while, then jumped ship to become a web developer. I do front-end work at Forbes, mostly with Angular, but I have the good fortune of being able to learn as I go, and we have a data team at Forbes that I've been learning from as well. My manager knew I was interested in machine learning and highly recommended this book, which has been recommended in plenty of other places too. I think it's magnificent. I haven't read other books on the subject, so it's hard to compare, but the presentation is very clear, it covers pretty much all the basics, and it gets you to understand how machine learning works. Much of what I'll present today comes from that book, and the code I'll show is mostly examples from the book plus a little of my own.

So, what is machine learning? I'm sure many of you already know, but it's a really powerful tool that can do more and more as we discover new techniques. Some of you may have seen "deepfakes": you take a video and replace the face in it. If you remember the movie Face/Off, well, here is Nicolas Cage's face swapped in. Machine learning is really powerful, and that excites me because it feels like a big, fun playground, though of course there are financially lucrative applications as well.

Types of learning: this isn't an exhaustive list, just a general taxonomy. There's supervised learning, where you provide data to the machine along with the correct answer it should learn to predict from that data. There's unsupervised learning, where you give it a large amount of data and ask it to find patterns, perhaps to subdivide the data into meaningful groups so that, say, Netflix could recommend a category to a whole cluster of similar users.
There's also self-supervised learning, and there's reinforcement learning. Reinforcement learning is the kind of thing where you can teach a machine to play an Atari game and then beat it consistently with near-perfect scores; roughly speaking, you can think of it a bit like genetic algorithms, where you provide a goal and reinforce the behavior whenever the agent succeeds. Today, though, I'll be talking about supervised learning. There's a nice chart in the slides, which I'll share after the talk so you can study it in more depth; it isn't exhaustive, just a rough outline, but you can think of machine learning as a big category that breaks into pieces: discovery, prediction, and reinforcement learning. We'll focus on supervised learning, which is basically predicting something from data you've been given. The general idea: you need a lot of data, and you need labels for that data saying "when the variables look like this, the correct answer is this, and when they look like that, the correct answer is that." I'll give concrete examples shortly.

There's a really beautiful visualization that can help you get a sense of what's going on; if you haven't seen it before, I encourage you to play with it on your own. I've loaded it up here, and it's an approximation of how things work. You have a dataset, in this case points arranged in circles, and you know the correct labels, the orange points and the blue points. You don't want to write a program where you manually instruct the computer what to do; you want the computer to figure it out. So you provide some inputs, you provide these neurons, you say "train this thing," and after a while it figures it out: anything inside this region is most likely blue, and anything outside is most likely orange. It does that using layers of what we call neurons, little units that each do some kind of processing (I'll describe what that is). You can add more layers and more neurons, and depending on the data you're trying to classify you might need a different architecture, a different setup, a different number of units. Let's train it again, and after a while it figures this one out too. Of course, if your data is very complicated, the model might never find a good solution, and you'll have to play around with it: this is called the model, you're building a prediction machine, and you might change something called the activation, change the learning rate, add more neurons here or there. Over time, with some luck, you get an answer. That's a very high-level overview of a little piece of machine learning; now let's come back to the presentation.
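Here is a minimal sketch of that playground idea expressed as Keras code, rather than anything from the talk's repository; the synthetic circle data, layer sizes, and training settings are illustrative assumptions.

```python
# A small dense network learning a circular decision boundary, playground-style:
# points inside a circle are one class, points outside are the other.
import numpy as np
from keras import models, layers

# Synthetic data: 2 input features (x, y); label = 1 if the point falls inside the circle.
X = np.random.uniform(-1, 1, size=(1000, 2))
y = (np.sqrt((X ** 2).sum(axis=1)) < 0.5).astype("float32")

model = models.Sequential()
model.add(layers.Dense(8, activation="relu", input_shape=(2,)))  # a layer of hidden "neurons"
model.add(layers.Dense(8, activation="relu"))                    # add more layers to taste
model.add(layers.Dense(1, activation="sigmoid"))                 # probability of "blue"

model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=32, verbose=0)
```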
Machine learning comes with a lot of vocabulary, and it's good to be familiar with the terms that get thrown around, so I'll go through them. This isn't an exhaustive list, but as you become more familiar with the terminology you get a better sense of what machine learning is about. By the way, I keep saying "machine learning," but a more precise term for today is deep learning: machine learning is a huge set of approaches and techniques, and deep learning is just the subset I'll actually be talking about.

First, layers: exactly what you saw in the previous example; an architecture is a particular arrangement of layers. Then tensors: a tensor is a mathematical object, basically a box of numbers. If you remember matrices, vectors, and scalars from math, a box holding a single number is a scalar, a box holding a row of numbers is a vector, a two-dimensional box is a matrix, and you can have three-dimensional, four-dimensional, or higher-dimensional objects whose elements can themselves be tensors. And even though we talk about neurons and layers, don't take the parallel with the brain too far; what we really have is a set of mathematical operations, and by feeding a lot of data through them with certain techniques we make those little pieces do work that happens to predict outcomes.

Next, gradient descent. When you're trying to solve a problem, the correct prediction is some target and you're trying to get your model to produce it. When it predicts incorrectly, you have to tell it how wrong it was, and you want to minimize that error. That error rate is called the loss. You can picture it as the depth of a valley in complicated terrain: the lower you get in the valley, the lower the loss and the better your predictions, and of course the valley can be a lot more complicated than the picture. Gradient descent is basically the technique for walking down that valley to the lowest point.

Now we can talk about optimizers, a word you'll hear a lot: optimizers are the particular way you walk down the valley. These pictures may be a little tricky to read, but if you remember topographical maps, imagine the lighter color is lower ground and the darker blue is higher ground; you're going downhill, trying to end up in one of the two valleys. There's another beautiful visualization of this online: the four traces you see are different optimization algorithms for getting to the valley floor, and some of them reach it much quicker than others.
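As a minimal sketch of the descent idea itself, here is the loop written out in plain Python for a made-up one-parameter loss (not anything from the talk), where the "valley" is just the parabola (w − 3)².

```python
# Gradient descent on loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The loop walks downhill until w is close to the minimum at w = 3.
w = 0.0                # starting guess
learning_rate = 0.1

for step in range(50):
    gradient = 2 * (w - 3)          # slope of the loss at the current w
    w -= learning_rate * gradient   # take a small step downhill
    loss = (w - 3) ** 2

print(w, loss)  # w is now close to 3, loss close to 0
```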
When you build your model you have to make a lot of decisions about how it's organized and set up. Those decisions are called hyperparameters, because they're properties of the model that you choose yourself rather than values the model learns. One of them is which algorithm you use to walk down the valley: some will yield results faster but may go wrong more often, and if the valleys aren't as nice as these two you can get completely different results depending on what you choose. A lot of machine learning is trying different approaches and figuring out which work better, and you might well spend time just figuring out which optimizer to use.

Loss, which I mentioned already (I probably should have arranged these slides in a slightly different order), is the amount by which your model predicts incorrectly. In these two pictures: on the left, the model predicted a horizontal line, and the distance between that line and the actual points is large, so the loss is high; on the right, the model fits the data much better, the distances are smaller, and the loss is lower. In building models you're trying to minimize loss.

Another term: the activation function. Every neuron in these layers is activated by its input: you feed it some data and it either fires or doesn't, a bit like a neuron in the brain (again, don't take the analogy too far), and because this is math it can fire by varying amounts. On the left is the sigmoid activation function: when the input is 0 the output is 1/2, large positive inputs push the output toward 1, and large negative inputs push it toward 0, so you get a smooth gradient of how strongly the neuron fires based on the input it's getting. That's another hyperparameter you'll choose for your layers. It turns out that while the sigmoid may model the world quite accurately, it's computationally much heavier than the simpler ReLU, the rectified linear unit, which you can see is a very simple function: it doesn't fire at all below zero, and above zero it passes its input straight through. There are different activation functions to choose from, but in the beginning you don't have to worry about this too much: use RMSprop for the optimizer and ReLU for the activations, and try different things later.
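For reference, the two activation functions mentioned above look like this in NumPy; a minimal sketch, not code from the talk.

```python
import numpy as np

def sigmoid(x):
    # Smoothly squashes any input into (0, 1): 0 -> 0.5, very negative -> ~0, very positive -> ~1.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: outputs 0 below zero, then passes the input straight through.
    return np.maximum(0.0, x)

x = np.array([-1.0, 0.0, 1.0])
print(sigmoid(x))  # [~0.27, 0.5, ~0.73]
print(relu(x))     # [0.0, 0.0, 1.0]
```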
Models are meant to predict the world, but there is a constant tension around overfitting. This picture is hopefully quite demonstrative, but I'll explain what's happening: you have red and blue dots and you're trying to partition the plane into meaningful regions. If you draw a straight line you'll make quite a few errors (still better than random); you can do better by drawing something like a parabola; and if you want, you can get every single point right. But what's the problem with the overfit model on the right? When you give it new examples, it's more likely to classify them incorrectly than the model in the middle. This is typically what happens during training: if you don't stop early enough, your model can start overfitting badly. It's roughly true that the more layers you add, the more powerful the model becomes, but if you stack a lot of very large layers, all that capacity will simply memorize your entire dataset: it will know the correct answers and be a hundred percent accurate on your training data, and still make mistakes on new data.

To prevent overfitting there's a straightforward technique we use all the time: you take a small fraction of your training data and never show it to the model. That's your validation set. You train on, say, 90% of the data, and then you ask how the model does on the 10% it has never seen; you don't use that 10% for training at all, you only check how the model behaves on it, and in fact you can monitor it continuously while training. Here's an example of what that looks like: the dots are the training loss, and you can see it going down, which is excellent; the model is getting better and better at predicting the training data. But you'll also notice that the validation loss stagnates at some point, meaning that while the model keeps improving on the training data, it has stopped improving on new data, and on a completely fresh dataset the outcome might be even worse. That constant tension between underfitting and overfitting is part of the art of machine learning. My manager has been a great guide and passed down some sage advice that I hope is useful to you too: right now a lot of machine learning is more art than science. Much of it is experimenting, trying different architectures and different things, and over time you build up heuristics, which comes with practice and with implementing a lot of models.
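A minimal sketch of that hold-out idea, using dummy arrays as stand-ins for a real (already shuffled) dataset; Keras can also do the split for you with the validation_split argument.

```python
import numpy as np

# Dummy stand-ins for a real dataset (hypothetical shapes).
data = np.random.random((1000, 20))
labels = np.random.randint(0, 2, size=(1000,))

split = int(len(data) * 0.9)            # keep 90% for training
train_data, val_data = data[:split], data[split:]
train_labels, val_labels = labels[:split], labels[split:]

# Keras can also carve out the slice itself:
#   model.fit(train_data, train_labels, validation_split=0.1, ...)
# and it will report val_loss / val_acc after every epoch, which is how you spot
# overfitting: training loss keeps falling while validation loss stalls or rises.
```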
Let's take stock of the vocabulary: I've gone through most of it, and I'm happy to elaborate if anything was unclear or if you have questions about any other terms. Now let's look at some architectures. These aren't exhaustive either, but three really large categories are worth knowing about, and I'll show examples of each: dense neural networks, convolutional neural networks (CNNs, or convnets), and recurrent neural networks, or RNNs. Each is more typically used for a different kind of problem, so when you have a particular problem you also decide which approach to take, and in some cases the choice is quite easy.

A dense network looks something like this: you have an input layer with exposed neurons, the actual places where your input is fed in. If you're telling the model about the colors of pixels, each pixel is represented by a particular input neuron, and the pixel's value is converted to a number and fed into that neuron. It's called dense for a reason: every neuron in a subsequent layer is connected to every neuron in the layer before it. You stack enough of these together and eventually bring it down to a small number of output neurons; in this case there are four, and if you were classifying between two things you might have one or two.

Convnets I find really exciting. They use convolution filters: if you've ever played in Photoshop, a convolution filter is a little rectangle of numbers that says "take this 3-by-3 patch of pixels." You take a certain section of an image (convnets are usually used for images) and summarize it in some way. A convolution filter can detect edges, for example, by looking for contrast: if the pixels on the left and the pixels on the right have very different values, that's an edge. The good news is you don't have to program any of that in; convnets figure out which filters are relevant on their own, thanks to backpropagation. As you feed data through, it gets compressed into smaller and smaller representations, but you'll notice the blocks get deeper: later convolution filters are looking at a much smaller image, maybe a 10-by-10 grid rather than a thousand by a thousand, while extracting much more abstract features, and the depth is the tensor dimension where those extra features live. Lastly, you stick a dense neural network on the end that takes the convnet's output and classifies it as a car, a truck, a van, or whatever else.

I've only mentioned backpropagation briefly, but the basic idea is this: once the model is told "good job, you were right," that result is propagated backwards through the network, and whichever neurons fired in agreement with the correct answer get ramped up so they fire more strongly the next time they see something similar, roughly speaking.
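To make the filter idea concrete, here is a minimal NumPy sketch of a single hand-written 3-by-3 kernel sliding over a toy image; in a real convnet, backpropagation learns kernels like this instead of you writing them.

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over every position and take a weighted sum of the patch under it.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image: dark on the left half, bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

vertical_edge_kernel = np.array([[-1, 0, 1],
                                 [-1, 0, 1],
                                 [-1, 0, 1]])

print(convolve2d(image, vertical_edge_kernel))  # large values only where the edge is
```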
Convnets can be very long chains of layers. This example is GoogLeNet, an image classification network that was very powerful. Roughly speaking, in the very early stages of the chain it picks up simple things, maybe edges, maybe differences in color gradation; later in the chain it begins to pick up certain textures. Nobody had to train it to do that specifically; it emerged by itself from the feedback, from backpropagation. And in the last stages it's picking up really abstract ideas that may not correlate with anything we have in mind: you look at this picture and you don't think "poodle," but for the convnet that pattern might mean exactly "poodle." Convnets are really good for images because they're location-based: they know which section of the image a given filter activated on. Here's an excellent, simple demo of how that works. I'll draw a digit, and you can see the first layers detecting certain kinds of edges, then extracting something a little more abstract; at some point it becomes very hard to interpret, but the machine is clearly doing something; and lastly there are some dense layers, which have no positional element at all, just a row of units, and at the top it identifies my drawing as a five. Oh, it got it wrong, so yes, they do make mistakes.

Convnets can do really powerful things. There's something called style transfer, where you take the lower-level information from the convnet, the part that picks up textures, and say "keep the higher-level objects, like the horse, but apply this zebra-like texture," and you can convert between zebras and horses, or of course make a photo look like your favorite painter. The last architecture to overview briefly is the RNN, the recurrent neural network. These power things like your iPhone predicting the next word as you type: you feed the network's output back into itself, like a feedback loop, and there's more than one way to organize that: one-to-one, one-to-many, and so on.
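A minimal sketch of what such a recurrent model might look like in Keras, assuming the next-word framing above; the vocabulary size, sequence length, and layer sizes are made-up placeholders, not anything from the talk.

```python
# Sequences of word indices go in; the network outputs a probability over the
# vocabulary for the word that comes next.
from keras import models, layers

vocab_size = 5000      # hypothetical vocabulary
sequence_length = 10   # hypothetical context window

model = models.Sequential()
model.add(layers.Embedding(vocab_size, 64, input_length=sequence_length))
model.add(layers.LSTM(64))                                 # the recurrent "feedback loop"
model.add(layers.Dense(vocab_size, activation="softmax"))  # probability of each next word

model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```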
Okay, let's get started with Keras; sorry if that introduction ran a bit long, it's hard to know your audience ahead of time, and I hope it was useful to those just starting out. Getting started turns out to be pretty simple, and I won't go through the minute details of every step. Installing Python 3.6 or above is straightforward; look up the instructions for your OS. A Mac comes with version 2.something already installed, and you can install 3.6+ side by side without interfering with it. You'll also want pip, which is a package manager: if anybody comes from the Node.js world, pip is the npm of Python, it installs Python packages. And you'll want venv, the virtual environment tool: it lets you set up dependencies inside a particular folder so they don't interfere with anything else you're doing, and your regular Python installation stays safe while you work inside that local environment. I also personally recommend aliasing python so you don't have to type out python3 every time.

Once that's done, the commands are: go to your project folder and run python3 -m venv venv to create the virtual environment (you can name that folder whatever you want; I just call it venv). Then activate it, which tells your terminal that this is the environment to use: source venv/bin/activate. On Windows there's a similar command, venv\Scripts\activate (I made that slide darker for aesthetic reasons, so it's hard to see). Lastly, install your dependencies: pip install -r requirements.txt. That's basically all you need for a fully working machine learning environment. The requirements.txt file is something you write yourself, and it looks like this: you tell pip what to install, and on some lines you'll see == followed by a version number; that's optional, and without it pip just installs the latest version. You'll definitely want keras and tensorflow in there; I haven't talked about TensorFlow yet, but I will in a moment. If your desktop or laptop has an NVIDIA GPU, use tensorflow-gpu: machine learning can be ten or even twenty times faster on a GPU, and for convnets it makes a huge difference. It took this laptop something like an hour to generate five images using a particular approach; on my home machine with a GPU it took five minutes or less. There are a couple of other tools I'll mention later, and the slides will be available online, so you don't have to take screenshots.
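As a quick sanity check of the environment described above (a minimal sketch, not part of the talk), you can run a few imports inside the activated virtualenv; the exact version numbers printed will vary.

```python
import sys
import tensorflow as tf
import keras

print("Python:", sys.version.split()[0])   # should be 3.6 or above
print("TensorFlow:", tf.__version__)
print("Keras:", keras.__version__)
```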
Now, the workflow. As Kris said, a lot of working with machine learning is massaging your data, but before you get there you have to define your problem: figure out what it is you want to solve. Then you need a big dataset; the bigger it is, the better your model will be, because more data means better results. You have to choose a measure of success, that is, how you'll judge how accurate your predictions are compared to what you actually want. You decide your evaluation protocol, which is about how you'll evaluate the model, for example when you're classifying things into buckets (I'm being a little vague here, sorry; I can explain more later if someone is interested). And then you prepare your data, which sounds easy but is going to be hard work.

There's a site called Kaggle; how many of you have heard of it? It's an online community where people take part in little competitions, some with financial rewards, and it's a really good way to start working with machine learning: you're given a lot of data, there's a hidden validation set, you build a model, submit your predictions, and you're told, say, congratulations, you predicted 80 percent of the data correctly. I tried one of the entry-level competitions, and it's one I recommend to all of you: the Titanic dataset. You have data about the people who were on the Titanic and only two outcomes, either they survived or they didn't, and you try to use their age, the fare they paid for a ticket, and so on to predict that outcome. Before you even get to modeling, though, I think I spent ten hours figuring out how to represent the data in a way the model would understand, and that's just inexperience; my mentor could probably have done it in ten or twenty minutes. But converting your data is a big job. You also need to normalize it: if some variables only vary between 0 and 1 and another varies between 0 and 10,000, the big one will overpower your model, pushing all the buttons much harder than the others. So typically we rescale values to lie between 0 and 1; it's not strictly necessary, but it usually improves your outcomes. A lot of the work is data cleaning and preparation. The good news is that once my data was ready, it took less than a minute of training to get to around 70 percent accuracy. And then, finally, you iterate on models: my first model was crap, I iterated a bunch, and it got a little better; you'll do the same.
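A minimal sketch of that normalization step in pandas; the column names ("Age", "Fare") follow the Kaggle Titanic CSV and should be treated as assumptions here.

```python
import pandas as pd

df = pd.read_csv("train.csv")

# Rescale each numeric column into the 0..1 range so that a feature measured in
# hundreds (Fare) doesn't overpower one measured in small integers.
for column in ["Age", "Fare"]:
    col = df[column].fillna(df[column].median())          # fill missing values first
    df[column] = (col - col.min()) / (col.max() - col.min())

print(df[["Age", "Fare"]].describe())
```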
Okay, tools. The big one is TensorFlow, a big library that comes from Google; it's really powerful and does pretty much all the hard, cool machine learning work. You can write your code directly against TensorFlow, and I think that's what a lot of researchers do when they're implementing something really intricate or new. But then there's Keras. Its author, François Chollet, the same person who wrote the book I praised earlier, describes it as basically the Lego bricks of machine learning: very high-level APIs for interacting with TensorFlow or, I believe, two or three other backends. You can use TensorFlow directly, but it's much easier to work with Keras; a lot of Kaggle competitions are won with Keras, and a lot of researchers use it because it's so quick and easy to iterate on models: you can change a model with one line of code, or add twenty more layers by adding twenty lines. It's really powerful, and I'll show you a bit of it.

For data cleaning, I don't know all the tools out there for massaging your data, but one cool tool I heard of years ago is OpenRefine (it used to be Google Refine). This is a piecemeal screenshot I put together from several sources, but the idea is that real data often has inconsistent labeling: things are misspelled or miscategorized. OpenRefine (and I'm sure other tools like it) will notice that a few values in a column are misspelled and ask whether you'd like to change them, or notice that a column is stored as strings rather than numbers and normalize it once you accept. You'll have to do quite a bit of that, and this is one tool you might use.

Another very powerful tool in Python is Jupyter notebooks, and it is magnificent. It's a place where you can write your code alongside annotations, so you can share it with co-workers and explain what you're doing, but one of the coolest things is that you can run your code in pieces and in a different order. You might have a large notebook with many steps: extract the data, run a script to clean it, run a script to show what the data looks like so you understand it; if the data doesn't look right, you change the formatting step and rerun just that piece; once you're happy, you start building models, and if a model doesn't work well you change that little bit and rerun it, without rerunning the entire script. It's very easy to start: you type "jupyter notebook" on your command line and it comes up. Jupyter has also been slowly transitioning to a new thing currently in beta called JupyterLab; it's basically the same thing but, as far as I can tell, with a better UI and a few more capabilities, so don't quote me on the details.

Lastly among the tools I'll mention today is TensorBoard, which is really powerful for monitoring training. With a little data and a small model it might take seconds to train the whole thing, but you might be working with a few gigabytes of text, and training might take several hours; you don't want to wait that long only to discover something went wrong. TensorBoard comes from the TensorFlow folks, and Keras has a very simple way of interacting with it: Keras can write out its losses in a format TensorBoard understands, and TensorBoard shows you live what's going on. In this particular screenshot (one I found online, but it's indicative) you see the loss over time. And what is "time" in machine learning? There's a word, epoch (am I pronouncing that right?), which means how many times the model has seen the entirety of your dataset. In this example, to get its loss down, the model saw the dataset 240 times: after each full pass, backpropagation went through all the layers updating the weights that decide how the neurons fire next time, and then we ran through the dataset again and again, so over 240 epochs the loss keeps going down.
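A minimal sketch of wiring Keras up to TensorBoard via its TensorBoard callback; the tiny model and random data below exist only to make the example self-contained and are not from the talk.

```python
import numpy as np
from keras import models, layers
from keras.callbacks import TensorBoard

# Dummy data just to make the example runnable end to end.
train_data = np.random.random((1000, 20))
train_labels = np.random.randint(0, 2, size=(1000,))

model = models.Sequential()
model.add(layers.Dense(16, activation="relu", input_shape=(20,)))
model.add(layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

tensorboard = TensorBoard(log_dir="./logs")   # Keras writes loss/metric logs here

model.fit(train_data, train_labels,
          epochs=20, batch_size=128,
          validation_split=0.1,
          callbacks=[tensorboard])

# Then run `tensorboard --logdir ./logs` in a terminal and open the printed URL
# to watch the loss curves update live while training runs.
```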
And that's roughly it for the overview; now for the actual code, which is the more important part. Keras is wonderful for getting started because its API provides a lot of datasets you can use right out of the box; you don't have to massage them to make them ready for a model, they're already there. All the code I'll show is in a repository, and none of it is magic: it's pretty much the same as what you'd find in the book, and the book comes with a free GitHub repository, so you can glance at my code or at François Chollet's, and he also publishes Jupyter notebooks with annotations explaining what's going on.

So how does it work? I have a little hack at the top that just suppresses some log output I don't like; don't worry about that. Then you say: from Keras, import the datasets, and the dataset we'll use is Reuters (I'm never sure how to pronounce it), a collection of news articles. You pass some parameters when importing; here we keep only the 10,000 most common words. It's automatically loaded into training data, training labels, test data, and test labels; we print out how long each is and a little sample of what it looks like. Let me run the script... okay, cool, it's training now. What you see at the top is that there are about 8,000 training articles and about 2,000 test articles, and you never give the test set to the model to train on; you only use it to see how the model is doing.

While it trains: the data still needs massaging in some ways. When your data is just a lot of words, you have to represent it somehow, and models are really good at understanding numbers, so we do something called one-hot encoding. It goes like this: you have a huge vector, 10,000 items long, that is mostly zeros, and a value is 1 if the article contains that particular word (it might be a 2 if the word occurs twice; I'm not quite sure about the implementation here). So we convert everything to these one-hot vectors; it's a little bit of work, and I'm going through this quickly because there are a lot of examples, but I can show more detail at the end.

Now comes the fun part, because Keras is awesome. You say: hey Keras, give me models and layers; the model we're building is a Sequential one, which is everything I've shown you so far, layers strung one after another with no further complications. Keras can handle much more complicated arrangements too, you just spend a bit more time setting them up: you might have a model made of two separate models that take your data and do slightly different things with it, and their outputs feed into a third model that produces the final result. The API is super easy: you say model, add a layer; it's a dense layer, it has 46 neurons and a certain activation, and in the first layer you also have to specify the input shape, because it needs to know what to expect: here, a 10,000-long vector, I believe. The cool thing about Keras is that every subsequent layer knows what comes before it, so you don't have to connect the pieces yourself.
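A minimal sketch of those loading and one-hot encoding steps, following the Keras Reuters example from the book the talk is based on.

```python
import numpy as np
from keras.datasets import reuters

# Keep only the 10,000 most common words.
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
print(len(train_data), len(test_data))   # roughly 9,000 training and 2,000 test articles

def vectorize_sequences(sequences, dimension=10000):
    # Each article becomes a 10,000-long vector of zeros with a 1 at the index
    # of every word that appears in it (the "one-hot" encoding described above).
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
```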
Keras just does it for you: you say "give me a bunch of neurons with this activation, then another bunch with that activation." Now, one activation I haven't talked about yet is softmax. I might be misremembering, so somebody correct me, but softmax is the one that spreads the prediction across many categories, and if you want to decide between just two, you use a sigmoid output with binary cross-entropy as the loss. All of this is easy to look up: Keras has pretty good documentation and a very large community, so people will help you out. So softmax will output a set of probabilities across these 46 categories, saying, for example, there's a 20 percent chance it's this one.

Finally, you compile your model and tell it its optimizer. Remember, as you walk down the terrain trying to minimize the loss, you have to say which technique you'll use, and these optimizers have additional hyperparameters you can change. You can, for example, give the optimizer momentum: imagine a big, heavy ball rolling down the valley, so that if it encounters a little hill it just steamrolls over it and keeps going. Keras lets you set those hyperparameters very easily; I don't remember the exact syntax offhand, but it's just a little more that you'd type in. Then there's the loss, categorical cross-entropy, which is the one for putting things into categories; I'm splitting these news articles into 46 categories, so an article might be sports or something else. You set aside some validation data: we carve a thousand items off the training set for validation and use the rest as the partial training set. And the last step is model.fit, which actually performs the training. You give it the input and you give it the labels, the correct answers; without correct answers it can't learn anything. Epochs is 20 here, meaning the model will see the full dataset exactly 20 times, and you saw it run through those 20 passes. Batch size is how many items the model sees before it updates itself: I give it 512 articles, it does its computations, and then backpropagation updates the weights. Batch sizes are typically powers of two, I believe for better computation speed, though you can certainly use other values; correct me if I'm wrong. And then there's validation data: it's not required, but passing it is how I was able to plot what you're about to see.
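Continuing that sketch, here is roughly what building, compiling, and fitting the classifier looks like, using the 64-64-46 layer sizes of the book's Reuters example and a 1,000-item validation slice; it assumes x_train and train_labels from the previous snippet.

```python
from keras import models, layers
from keras.utils import to_categorical

one_hot_train_labels = to_categorical(train_labels)   # one-hot encode the 46 topic labels

model = models.Sequential()
model.add(layers.Dense(64, activation="relu", input_shape=(10000,)))
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(46, activation="softmax"))      # probability over the 46 topics

model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Carve 1,000 articles off the training data as a validation set.
x_val, partial_x_train = x_train[:1000], x_train[1000:]
y_val, partial_y_train = one_hot_train_labels[:1000], one_hot_train_labels[1000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))
```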
In the plot I saw the training metrics improving while the validation metrics stayed about the same; the next picture shows accuracy, which climbs and plateaus close to a hundred percent on the training data while the validation accuracy hovers around eighty. What that means is you're probably getting a bit of overfitting: you're not getting any better beyond the training set. So how would I change the model if I wanted more layers? Easy: today I added a bunch more layers. Keras is absolutely fantastic, you change a few lines and you have a completely new model. Another really nice thing is model.summary(), which shows you what the model looks like; I changed the model and ran it again (not using Jupyter or TensorBoard here, but that's fine), and in the output you can see the trainable parameters: this model is trying to tune roughly six or seven hundred thousand knobs at the same time to make better predictions. The model is heavier now, there's more going on, so it takes a little longer, and in the end we'll see what output we get; if I were running TensorBoard I'd see all of this live, and TensorBoard also lets you look inside the model a bit more. And again, the model improves on the training data but the validation data doesn't improve, so maybe I need a different approach.

There are a bunch of other examples I've replicated from the book; there's a convolutional network, for instance. Convnets are basically the best thing we have for vision. You can solve this digit problem without convolution filters, just with deep dense networks, but a dense network unwraps all the pixels into one long line when you present the data, so it has no positional information and it's much harder for it to learn; with convolution filters it's a lot easier. So: a convnet on MNIST, the dataset of handwritten digits. Let's run it, and what I find remarkable about machine learning is how fast it is: if you set up your model right, you get amazingly accurate results. This particular model goes through a couple of steps. First there's a convolution layer, which again passes over every section of the image trying to extract useful features; the input is 28 by 28, because that's how many pixels we're working with, and 32 is the depth, the amount of information the layer can extract, which you can think of as how many different kinds of features it can look for. Then you do max pooling, which compresses that information into a smaller grid, and the next layer is 11 by 11 by 64: the input is smaller now, it's no longer looking at the full image, and that's typical, because convolution layers progressively summarize the data and the pooling shrinks it further. If you have a huge image you can't keep training at full resolution, so you compress it more and more very quickly while giving it a bit more depth, with which it can pick out even more features. Once we're done with the convolutions we flatten everything, which gets it ready for the dense network, and our flattened layer ends up being 576 long.
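A minimal sketch of the MNIST convnet being described, again following the book's example; the final 3x3x64 convolution output is what flattens to 576.

```python
from keras import models, layers
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype("float32") / 255

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))   # output 11x11x64
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.Flatten())                                # 3 * 3 * 64 = 576 values
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))          # ten digit "buckets"

model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, to_categorical(train_labels), epochs=5, batch_size=64)
print(model.evaluate(test_images, to_categorical(test_labels)))
```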
The good news is I don't think I had to compute that myself: I just keep adding layers and Keras figures out what the sizes should be, so I didn't have to work out the input shape of the dense network. I just said: give me a dense layer of 64 with ReLU activation, and then a dense layer with softmax activation. Softmax, again, puts things into buckets; in this case there are 10 buckets, one per digit, and each gets a probability, just like you saw on that demo site. Let me draw something ambiguous, a shape that could be a 5 or a 6 or a 9, and you can see (it's very small, sorry) those 10 buckets and the algorithm trying to decide between them; of course that website isn't my model, it's just a demonstration of what's going on. And over here we're almost done; the accuracy by the end of the first epoch was already 94 percent. So this tiny bit of code is really, really powerful: in under five minutes, without even a GPU, we trained handwriting recognition that would have blown anybody's mind twenty years ago, or maybe even ten. And there's the last step, hooray. I didn't print anything in this example other than the final numbers: the training accuracy reached about 99 percent, and the test accuracy is right up there as well. So convnets are super powerful and, as you can see, very easy to build with Keras. You do need a dataset ready; here Keras gives you these toy datasets to work with, so you just load MNIST (the handwritten digits one) straight from keras.datasets.

Lastly, I'll show you the Titanic one, the one I struggled with for a while. The biggest problem was that my data came in two different forms: categorical data, like which cabin each person was assigned to (cabin A, cabin B, cabin C), and numeric data, like the price they paid for a ticket. I wanted to represent the categorical columns with one-hot encoding, so the data would say "this thing or that thing," but I just couldn't get the two kinds to mesh together; that's a failing on my part, and there's surely a way to do it. In the end I gave up and said: if you're in cabin A that's a 1, cabin B is a 2, and so on; now the model only sees numbers, but it still figures out that there are some relationships. Some of the preprocessing I actually did manually in the CSV file, just because I couldn't handle it otherwise. I also ended up using pandas, which is a library for working with a lot of data and a very natural way to present data to Keras; I don't believe it's the only way, but my mentor said "use this," and it's also exactly what François Chollet recommends in the book.
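A minimal sketch of that "just turn categories into numbers" shortcut in pandas; the column names are assumptions based on the Kaggle Titanic CSV, and get_dummies is the one-hot alternative mentioned above.

```python
import pandas as pd

df = pd.read_csv("train.csv")

# Map categorical columns to small integers so the model only ever sees numbers.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"S": 1, "C": 2, "Q": 3})

# The better approach (the one-hot encoding I couldn't get working at the time)
# is something pandas can also do directly:
# df = pd.get_dummies(df, columns=["Embarked", "Pclass"])
```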
So what I did was take my data, a CSV file, and read it into a DataFrame; I print a bit of it out, then I tried to do some categorical handling that didn't work, so I gave up on that. pandas has a nice API for interacting with the data: if I want a column, I can just ask for that column and save it (this is probably not efficient or good Python, but bear with me). Then I renormalize things like age: age ranges from roughly zero to a hundred, and I don't want numbers close to a hundred competing with cabin categories that run from one to four, so I rescale it to between zero and one. You could rescale to between zero and five, it doesn't much matter, but typically we compress things to between zero and one. Then I do a really dumb thing and just shove it all into an array (I probably shouldn't have printed this, it's terrible code), keeping my training set separate from my validation set.

And lastly, this is where Keras shines. We build a Sequential model with a dense layer whose input is eight, because there are eight features I'm feeding in: the age, the sex of the person, the cabin, and so on. Then a layer with 16 units, and then something called dropout, which is really cool and kind of counterintuitive. In the book François Chollet describes it; there was this finding, I believe by Geoffrey Hinton, that when you train neural networks certain patterns can emerge where groups of neurons get stuck reinforcing each other without actually predicting much (the word is co-adaptation). So, counterintuitively, you randomly drop a fraction of the neurons, in my case 10 percent, before each training update; you can do it at several points in the network and drop as much as 50 percent, and sometimes that's useful. Keras handles it for you: it randomly drops different units on each pass, which tends to break up those co-adapted groups. That's the theory, anyway; the good news is it just works, and a bit of dropout can help your model. Lastly I have a single-unit output layer, because I'm predicting survived or didn't survive, with binary cross-entropy as the loss, which is the one for a two-way answer: it measures how far your prediction is from the actual outcome, punishing you a lot when you're far off and only a little when you're close. We add accuracy as a metric so we can watch it, and then we run it. To graph all of this there's some boilerplate: I use matplotlib, simply because it's what the book introduces (there are other ways to visualize your data), and you basically just label your axes and call plot.
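A minimal sketch of a Titanic-style model with dropout, plus the history plot discussed next; the preprocessed features here are random stand-ins rather than the real CSV, and the layer sizes follow the description above.

```python
import numpy as np
import matplotlib.pyplot as plt
from keras import models, layers

# Random stand-ins for ~800 preprocessed passengers with 8 features each.
train_x = np.random.random((800, 8))
train_y = np.random.randint(0, 2, size=(800,))       # 1 = survived, 0 = did not

model = models.Sequential()
model.add(layers.Dense(16, activation="relu", input_shape=(8,)))
model.add(layers.Dropout(0.1))                        # randomly silence 10% of units each update
model.add(layers.Dense(16, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))      # probability of survival

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",             # the two-class loss
              metrics=["accuracy"])

# Keep the History object that fit() returns so we can plot it afterwards.
history = model.fit(train_x, train_y,
                    epochs=600, batch_size=64,
                    validation_split=0.1, verbose=0)

acc = history.history["acc"]                          # "accuracy" in newer Keras versions
val_acc = history.history["val_acc"]
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```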
The one key thing is that model.fit just does the training; what I want is to keep its history, so I write history = model.fit(...), and Keras gives you a history object you can then parse. All right, let's run the Titanic script and guess how quick it will be. It's running through six hundred epochs, and the dataset is in some ways puny, only about eight hundred rows. Here's what I got: the loss starts high and then comes down, which is good news, that's what you want, but of course the real measure is accuracy, how well it's actually performing, and it's not too bad. There's a weirdness here where the validation accuracy sits above the training accuracy; I think that's because my dataset isn't randomized. I didn't shuffle the CSV, I just took it blindly and ran with it, so it's possible the validation slice happens to be very similar to the rest of the data and is therefore predicted better. I don't know the correct explanation for sure, but typically your validation accuracy will be a bit lower, or around the same. One thing we do have is a trend of improvement, so maybe if I ran this for a thousand epochs I'd get better accuracy. I've played around with it and so far I'm getting to about 80 percent; my mentor, when I gave him this problem, got 86 percent with basically my setup, so now I just have to beat him.

Let's look at some of the outputs; I'll clear the screen when it's done. This is roughly what the data looks like: there's an ID, which is ignored (it's not data, just for me to identify rows), then passenger class, sex, and so on, all converted to numbers. A better model would convert those to categories, one-hot encoded vectors like 1 0 0 for first class or 0 0 1 for third class, so the model would know there's no meaningful ordering between them; in my case I wasn't quite sure how to get that done, so I just threw in numbers, and hooray, I got an accuracy of 80.

One last bit before Q&A: when you're making predictions you want a baseline, a grasp of what you're trying to beat. In this particular example it's roughly fifty-fifty odds, so random guessing would already give about 50 percent accuracy, and you want a model that beats that. Depending on the problem you'll have a different baseline, so even 80 percent accuracy might be fairly crummy compared to what's possible. Anyway, I'd love to open this up to Q&A; I may have lost track of time and talked for a long while, but thank you so much for bearing with me, and I hope you found it valuable. I'll be happy to answer questions, I have more code to show, and later I'll post a link to the slides, to my repository, and to the book and its repository. Feel free to reach out to me afterwards over messages and so on; I'll be happy to help if I can. [Applause]
Info
Channel: Kris Skrinak
Views: 36,947
Keywords: Boris Yakubchik, keras, deep learning, NYC, New York, AWS, Soho, AWS Loft
Id: 5qCDzaOUCWA
Length: 63min 43sec (3823 seconds)
Published: Sat Jun 23 2018