A hands-on intro to TensorFlow 2.0

Captions
So, hey, thanks for coming. I'm Josh. Before I jump in I want to see what we promised in the event invite, and then I'll give you the agenda. In the event invite we promised that no machine learning experience is assumed at all. I know we have a mixed group, so for the first hour I will assume you know very, very little, and then the second half will go a lot faster and I'll show you some more advanced examples, so hopefully you all get something out of it. We have a small group, so if you have any questions, just interrupt me.

Here are the slides from today. They're actually from a conference in Montreal about two weeks ago; they're like 80 percent the same, but they have all the important links, so you don't need to take notes. You can just go to that link, there's a PDF, and you'll get everything you need. If you have a laptop today, awesome, you can try TensorFlow 2; if you don't, you can make a friend next to you and you'll still get something out of it, and you can try all the code at home. It's bitly ICS c3.

Basically I only have one or two boring slides that actually introduce TensorFlow, but here's the gold we're going to get through as much of as we can. The goal is to install TensorFlow 2.0, then we'll talk about machine learning basics, and we'll do it in a very, very low-level way: I'll show you linear regression from scratch, and the reason we're doing linear regression from scratch is that the ingredients of linear regression turn out to be, surprisingly, basically the same as the ingredients for a deep neural network. After that we'll do deep neural networks, in two different styles. One you may have seen before, the beginner style, if you've used a library called Keras; the other, the expert style or the research style, you may have seen if you've used a library called Chainer or PyTorch. We have both of these in TensorFlow 2, which is great. And then the cool advanced stuff we'll fly through at the end at warp speed.

Here's my only boring slide: what is TensorFlow? TensorFlow is an open source library, coming up on four years old; it was released by Google in 2015. By far the most important thing about it is the user community, and we're going on about 2,000 open source code contributors, which is nuts. That would be like 30 times the number of people we have in the room right here, from all around the world, which is incredible. Anyway, 2.0 is currently in alpha. 2.0 is a rework of TensorFlow to make it much easier to use. TensorFlow 1 was exactly what you would have wanted in 2015 if you were an engineer at Google and your problem was training neural networks at scale. It wasn't what you necessarily would have wanted, although it was still way better than a lot of things available at the time, if you were a PhD student who wanted to dive into deep learning. TensorFlow 2 is the best of both of those worlds: it's just as fast as 1, it's just easier to use. The beta version of TensorFlow 2 is releasing either today or tomorrow morning, and I hope the website is still working; if not, we'll have to scramble a little bit. The final should be out in a few months. I'll come back to this stuff later.

For people that are entirely new to machine learning, I have some bad news. What is deep learning? Deep learning is a subset of machine learning, a very specific technical subset that really means representation learning.
But deep learning is the wrong place to start your machine learning career. It's a niche technical area. You need to start your machine learning journey by taking a basic machine learning class: learning about k-nearest neighbors, decision trees, training data, testing data, evaluating models. Deep learning means doing machine learning with neural networks, and because of that it's a little bit difficult to introduce quickly, but I'll do my best. You won't understand all of it, but if you get something out of it, cool, and you can dive deeper later on your own.

This is a cool website you can play with right now, and you can follow along or try it at home later. It's called TensorFlow Playground; you can search Google for "tensorflow playground", or the URL is playground.tensorflow.org. This is a neural network running in the browser, totally in JavaScript, and I'll show you a much cooler thing done totally in JavaScript in a second. By the way, with TensorFlow 2 you typically write your code in Python, and that's what we'll do tonight; TensorFlow 2 has a C++ back end that makes your Python code much faster, and we also support JavaScript and Swift.

Anyway: a neural network running in the browser. This is going to be a classification problem, and the goal is to find a line that separates the orange dots from the blue dots. What we have here isn't quite a neural network yet; this is a mock-up of a single neuron. The single neuron has two inputs, written as x1 and x2, but what they really mean are the x and y coordinates on the screen. Not shown is a grid: you can imagine that this image on the right is divided into maybe a hundred-by-a-hundred grid, and what we're really trying to do is classify every little square on that grid and say, is it orange or blue? The nice gradient there is based on the confidence score of the classifications; it's not actually smooth, even though it looks smooth. So: it's a grid, we have the x and y coordinate of each cell, and the goal is to find a line that separates the blue dots from the orange dots. Using a single neuron, which is a linear model taking a linear combination of the weights and inputs, we can find a line. You can think of a single neuron as something that's able to draw a line, and we're in 2D space with two inputs, so we draw a line like this.

The problem is if you have a nonlinear data set. If you have a nonlinear data set like this one, there's no way to draw a line to separate the data, so if we try to train our little neuron here, it can't do it: obviously you can't draw a line that separates the blue dots from the orange dots. By the way, all the TensorFlow examples we're going to look at in a second are way easier than this; deep learning has a lot of heavy concepts, but the code is really, really simple. Anyway, I just want to show you a very little bit of the concept side. The problem is our little neuron can't separate this data. But if you've taken a machine learning class (and there's no way to know this if you haven't, unless you're really, really bright), you might know that there's a way to separate this data with a linear model, and it's a math trick: you can do something called feature engineering. All you have are the x and y coordinates on the screen, but you can add a new feature z, and you can do this manually. The trick you might come up with, if you look at the data, is to realize that the red dots are always farther from the origin than the yellow dots.
That means if you create a new feature z, and the value of z is x squared plus y squared, then because we're squaring the data, z is always positive, and the magnitude of z is always larger for red than for yellow. That transforms the space to look like this, and now, while we still can't draw a line, in three-dimensional space we can draw a plane to separate the data with a linear model. We can do that in TensorFlow Playground by adding the new features x² and y², which is close enough, and now if we train our linear model, it can separate the data: it's finding that plane, and that's really cool.

The challenge, the problem, is that to do this we had to think, we had to do math, and we had to do feature engineering. Feature engineering, coming up with smart features to separate your data, works great when you have a toy data set like this, but it doesn't work well when you have real data like images, video, text, or sound. What we need is a way for feature engineering to happen automatically, and that's what a neural network does. So we'll keep our complicated data set and our simple input features, but now we'll add a single hidden layer, and we have a neural network. Each of these is a neuron you can think of as a little logistic regression unit; here's another one, another one, another one. The idea is that our first neurons are looking at the x and y coordinates, but our second layer of neurons is looking at features computed by the first layer: they're looking at combinations of inputs times learned values. What happens is that when we train this model, even though we just have the x and y coordinates as input, it automatically found, from those simple features, a new feature that lets us draw a plane to separate the data. Neural networks can automatically find these features for you, and this is extremely valuable, because when we're working with real data sets like images, the only features we have are the pixels, and we need something that can learn a hierarchy of features.

So that's deep learning the hard way. You can actually see this, by the way, if you take a deep learning class: here's a visualization I made earlier, on a data set of moons, and we're watching how the neural network twists and bends the space as it comes up with features to pull the moons apart, so it can draw a line to separate them. It's really, really cool.
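Here's that feature-engineering trick as a minimal, self-contained sketch in NumPy (the radii and the threshold are made up for illustration; the playground demo itself is JavaScript):

    import numpy as np

    # Two rings of points: one class near the origin, one farther out
    rng = np.random.default_rng(0)
    angles = rng.uniform(0, 2 * np.pi, 200)
    radii = np.concatenate([rng.uniform(0, 1, 100),    # inner ring, label 0
                            rng.uniform(2, 3, 100)])   # outer ring, label 1
    x = radii * np.cos(angles)
    y = radii * np.sin(angles)
    labels = np.concatenate([np.zeros(100), np.ones(100)])

    # No line in (x, y) separates the rings, but the engineered feature
    # z = x^2 + y^2 is always larger for the outer ring, so a simple
    # threshold on z (a linear model in the new space) separates them.
    z = x**2 + y**2
    print(np.all((z > 2.0) == (labels == 1)))  # True

A hidden layer is what lets the network discover a feature like z on its own, instead of you hand-coding it.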
Anyway, the programming side of these neural networks is way simpler, so we'll do one hard exercise, which is linear regression from the ground up, and then we'll start actually writing neural networks. The linear regression is going to take like 20 lines of code; the neural networks will take five. But let's do it the slow way first, so you can see every piece. Here's the link you can go to; I'm going to walk through this and you can follow along if you want: it's bitly TF - workshop 1 TF 1. Alright, let me show you what's going on here. Nice, that was easy; this is like the easiest university Wi-Fi I've ever signed into, I really appreciate it. Just so I know, has anyone used Colaboratory before? OK, so about 30 percent. For people that haven't, this is a huge takeaway and you're going to love it, totally independent of TensorFlow: Colaboratory is like the best thing since the microwave. Has anyone used Jupyter notebooks before? OK, some people haven't, so let me introduce what a Jupyter notebook is.

A Jupyter notebook is a Python environment; it's open source and it's free. There are several ways to run Python: you can run it from the command line, or you can use your favorite IDE, or whatever. One such sort-of-IDE is called Jupyter, and Jupyter runs in a web browser and connects to a Python back end running on your machine. The reason we like Jupyter notebooks is that there are two types of cells. The first is called a markdown cell, or a text cell; all cells can be edited and run, and the way you edit a markdown cell is you just write some markdown, and to run it you hit Shift+Enter or press play and it renders. The reason we like this is that it means you can put graphs and text inline with your code, so it's awesome: it's like self-documenting code. The other type of cell, as you might guess, is a code cell, where you just write regular Python, and Shift+Enter runs it. So Jupyter, which is what you're looking at, is a front end connected to a Python back end.

What Colab is, is a free Jupyter notebook that runs in the cloud. There's no way to pay for it even if you wanted to; it's from Google Research, and it's good for education and demos. What's nice about Colab is that it has a free GPU, which is why we want it for deep learning. And when I say it's free, there's no catch, no cell saying "please don't mine Bitcoin on it"; the GPU is just limited to twelve-ish hours a day. So it's good for training medium-sized models; if you're a grad student you can get a ton of use out of this, and the GPUs, NVIDIA ones, are very, very fast. To use Colab: the link I gave you is actually a notebook on GitHub. If you take the short link, delete the colab.research.google.com part, and put github.com there instead, you'll see this is just a Jupyter notebook on GitHub; the URL I gave you just opens it directly in Colab for convenience. And you can always get out of Colab if you don't like it or you're tired of it: just do File > Download as > Jupyter notebook, and you're back to totally open source.

In Colab, the first thing you do is hit Connect, and what you're doing is connecting to a Python back end, a virtual machine running on Google Cloud. It's transient and it might get deleted, so don't put important data in here, but it's your own VM, so you have root access to it. In all of these notebooks, the only boring thing we have to do is install TensorFlow 2, which is in alpha; TensorFlow 1 is installed here by default, along with other deep learning frameworks, so you're going to have to run this a lot. It takes like two minutes, but I want to teach you the new way of doing things rather than waste your time learning the old way. So we'll run that. By the way, any command that starts with a bang, an exclamation mark, is a shell command, so you can do !ls, or !rm -rf * if you want. Other really amazing things about Colab, while that installs: strangely hidden inside the View > Table of contents menu there's something really useful. The table of contents itself no one cares about, but if you go to Code snippets, everyone cares about that. Let's say you wanted to download something from Colab to your local machine: you type "download", and it gives you code that you can copy and paste into your notebook and run, and it does the thing you want, which is awesome.
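For reference, those cells look roughly like this (a sketch; 'example.txt' is a stand-in filename):

    # Lines starting with '!' run in the shell, not Python
    !pip install tensorflow==2.0.0-alpha0   # the TF 2 alpha from the talk
    !ls

    # From Colab's code-snippets menu: download a file to your machine
    from google.colab import files
    files.download('example.txt')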
You can use those snippets if you want to connect Colab to Google Drive, if you want to install your other favorite deep learning library, or if you want to learn how to put graphics inside your notebooks; it's all there, so this is super, super helpful. Colab also recently added a file browser: if you want to upload and download files easily, you can do it through that. And you totally don't need Colab for TensorFlow; anything you do in Colab will run locally too. The way you enable a GPU is, strangely, Edit > Notebook settings, and then you can select a hardware accelerator. Don't use TPUs right now; they require code changes, though they won't in the near future. GPUs will accelerate most TensorFlow operations out of the box.

Alright, we've installed TensorFlow 2. I'll go super slow for now and then we'll go faster. The first thing we're doing is importing TensorFlow; you can think of it as a Python library, and it plays nicely with NumPy. The first thing to do in linear regression: we're going to do y = mx + b, create some random data, and then try to find the best-fit line. One thing first: this notebook already has output that I don't want, some graphics from the last time I ran it that I uploaded to GitHub, so I'm doing Edit > Clear all output. One more thing with Colab: if you get it into a weird state and need to reset it, say you installed the wrong library, you can do Runtime > Reset, and that will delete your VM and give you a fresh one.

Anyway: we're importing TensorFlow and making a data set, and I'll explain what this is. Here's our data set; I've created some random points that are roughly linear. This example isn't really good for teaching machine learning, but I'll explain what's missing. Another word for machine learning is prediction: given some data, how can we use it to make predictions about future data? You'll basically always need two things: a training set and a testing set. The training set is the data you're given, and the testing set is the future data that you never get to see, the data you need to make solid predictions on. You can imagine it like you're a doctor: the training data is what you see in medical school, and the testing data is the first patient you have to diagnose when you get out of school, so it's very important that you get it right. Most people think machine learning is questions like "how many layers should I use in my neural network," but it's really about thinking through these scenarios properly: like 90 percent of machine learning is about building a clean data set and designing a really thoughtful experiment, so that you know your results are useful. What you see when you google deep learning as you learn about it is, "hey man, here's a five-minute YouTube video, I'm going to put in like 19 convolutional layers and it's going to be awesome, we're going to train it, it's 99 percent accurate." That's a joke. How many layers you should use in your model is rarely the hard problem you actually have; we'll get to that. Here we only have training data, because this is just a silly example. So: we're plotting our training data, and this is all matplotlib; no TensorFlow so far.
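A sketch of the kind of roughly-linear data being plotted here (the slope, intercept, and noise level are made up for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    true_m, true_b = 3.0, 2.0                             # values we'll try to recover
    xs = np.random.uniform(0, 1, 100).astype(np.float32)
    noise = np.random.normal(0, 0.1, 100).astype(np.float32)
    ys = true_m * xs + true_b + noise                     # roughly linear points

    plt.scatter(xs, ys)
    plt.show()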
Now we get to TensorFlow, and the first thing you need in machine learning is a model. That can be a neural network, or it can be y = mx + b. Here we're doing it from scratch, so we're making two variables, for our slope m and our intercept b. A variable, just like a weight in a neural network, is something that you can wiggle around. The idea is that we start with a random guess and wiggle them until we find a good solution. When you learn about gradient descent and backpropagation it can sound complicated, but the thing to remember is that these are just numbers: you can literally make them a little bit bigger or a little bit smaller, and you just have to figure out which way to wiggle them. That's all these fancy algorithms ultimately do. So we have two TensorFlow variables, and you can think of TensorFlow a little bit like NumPy; TensorFlow 2 works the same way, so where you'd write np.matmul you can write tf.matmul, and the syntax is similar.

The first thing we need is a way to make a prediction: given an x, how do we predict the y for that x? This is our model: our prediction is just computing y = mx + b, and in TensorFlow 2 this works the way you expect; even though m and b are variables, it just works in regular Python, which is great. This operation will work with a single number or with a list of numbers: all these operations are vectorized, which means they work on lists of numbers. The reason we have vectorized operations is that most of the math involved in neural networks is multiplying matrices. We like to do that on GPUs because they do it fast, and the other thing GPUs can do is multiply many matrices in parallel; they have lots of matrix multipliers. So most deep learning libraries are written to work on batches, or lists, of data by default, which can be a little weird the first time you see it. One trick you'll see: if you have just one example, you often have to stick it inside a list, so you have a list with just one thing inside it. That's why that happens, if you see it.

The next thing we need, after we make predictions, is a loss function. Loss is a fancy word for error: how bad were our predictions? We need a single number that quantifies how bad a job we did. In linear regression we're going to use the squared error: we look at the y's we predicted and the y's we wanted, measure the distance between them, square it, and average it up. That's our error, and lower numbers are better. If we had perfect predictions we'd have an error of 0, which we won't, because our data set isn't perfectly linear. Here's how we do that in TensorFlow: we're squaring the difference between what we predicted and what we should have predicted, and taking the average. That's our loss function, and this notebook is just showing you the initial loss when we start, which is some number that we want to reduce.
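Those pieces, as a minimal sketch (names are illustrative; xs and ys are the data from the sketch above):

    import tensorflow as tf

    # Two variables we can wiggle: random guesses for slope and intercept
    m = tf.Variable(tf.random.normal([]))
    b = tf.Variable(tf.random.normal([]))

    def predict(x):
        # The whole "model": y = mx + b; vectorized, so x can be one
        # number or a whole list of numbers
        return m * x + b

    def squared_error(y_pred, y_true):
        # Loss, a fancy word for error: lower is better, 0 is perfect
        return tf.reduce_mean(tf.square(y_pred - y_true))

    print(squared_error(predict(xs), ys))  # the initial loss, before training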
So far we haven't done any training yet, but we'll do that now. The last ingredient we need is a way to reduce that loss, and we're going to do it via gradient descent. This is the reason I'm showing you all of this: it's exactly the same code we'll use to train a neural network, copy and pasted. Let me show you where we're going. The notebook contains code to produce this diagram, which shows the loss as a function of m and b. Here you can see our initial guess for m and b, which was a random guess from when we ran that code, and you can see that if we can find a way to take steps down to the bottom of the bowl, we'll have minimized our loss, which means we'll have found the best-fit line.

Things to notice: first of all, this bowl has a minimum. Neural networks will not. One of the reasons deep learning, or neural networks in general, were thought not to work well is that their loss surfaces don't have a single global minimum like this, so we can't find a perfect solution when we're optimizing a neural network. It turns out that doesn't matter: we don't need a perfect solution, just a good one, but that was only obvious in retrospect. Here, though, we can find a perfect solution.

So the question is, how do we take a step down the bowl? If you've taken a calculus class, you know you can use the gradient: the gradient is a vector that points in the direction of maximum ascent, the negative gradient points in the direction of maximum descent, and you can compute it analytically. If you don't know calculus, which I'll assume you don't, here's a simple procedure that will let us optimize our linear regression model. We take b, increase it by a tiny amount, and compute the loss; we then decrease b by a tiny amount and compute the loss; we figure out which way reduced the loss, and we wiggle b a little bit in that direction. We do the same thing for m, and we do that again and again and again. This will be very slow, but it will work; the result will be the same, and we'll slowly take steps down the hill. One thing you might start realizing, if you wanted to do research here: I know my initial guess for m and b is probably going to be really bad, because it's a random guess, so maybe I want to take a big step at first; then after I've taken maybe a hundred steps I assume my guess is pretty good, and I'll take smaller steps to refine it. You might invent something called an adaptive learning rate; the learning rate refers to how large a step you take, so at first we take big ones and later smaller ones. There are fifty million different ways to do gradient descent like this, and a lot of them are built in.

So here's how we do this in TensorFlow, and this is the first time we do something actually useful. Gradient descent is an iterative process, so we're going to do this for some number of steps that I pulled out of a hat, with a step size that I also pulled out of a hat; you're starting to see that we have all these magic numbers, and you learn from experience which numbers are useful and which are not. Here's what we do (I'll come back to this gradient tape in a second). The first thing we do is make our predictions: given our training data, we predict the y values for all the x's, and we've seen how to do that. The next thing we do is get our loss, the squared error. And now we finally use TensorFlow to do something useful: we say, TensorFlow, please give me the gradient of the loss with respect to m and b. If you print that out, which you should, you'll see two numbers, and those two numbers together form a vector that points up the hill. What TensorFlow is doing is automatic differentiation: it's doing the calculus for you to get that gradient. What that literally tells you is that if you wiggle these numbers an infinitesimal amount in that direction, your loss will increase; wiggle them the opposite way, and your loss will decrease.
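And here's the training loop being described, sketched with tf.GradientTape (the step count and learning rate are "pulled out of a hat," just like in the talk):

    steps = 200          # number of steps, pulled out of a hat
    learning_rate = 0.1  # step size, also pulled out of a hat

    for step in range(steps):
        with tf.GradientTape() as tape:
            loss = squared_error(predict(xs), ys)
        # Autodiff: the gradient of the loss with respect to m and b.
        # These two numbers form a vector pointing up the hill.
        dm, db = tape.gradient(loss, [m, b])
        # Step downhill by subtracting a fraction of the gradient
        m.assign_sub(learning_rate * dm)
        b.assign_sub(learning_rate * db)

    print(m.numpy(), b.numpy())  # should land near the values used to make the data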
All deep learning libraries do exactly this, so this is huge. Here we're taking the gradient of something very simple, but the code works identically for something very complex, like a neural network. Then, since we literally want to take a step downhill, I take the gradient and just subtract it from m and b, and at the next iteration our loss will be a little bit lower. If we run this (and the code above it, which I may not have... yeah), you'll see we have this iterative process where our loss goes down, which is great. We can print out the final values for m and b, and if you compare those to the distribution we used to make the data, they're very similar. Then we can plot the best-fit line. So that's linear regression, the slow way, but that's cool.

The last example we're going to look at today is Deep Dream, and its code is actually simpler than this. I want to share this example with you because if you're new to gradient descent, it's useful to see things, to see every piece. Learning gradient descent with a neural network sucks, because you have like 10,000 weights in a layer, and if you print them out, who knows what you're looking at; but here you can actually visualize everything, so it's a good learning tool. Another good exercise you can do at home, which is not written here: if you wanted to understand gradient descent, you could pretend you didn't have this code and try to write a manual process to figure out the gradient for m and b. That would be a really good exercise, and if it's right, you'll get the same results, just very slowly, because there are different ways to calculate the gradient. TensorFlow does it the fast way, which is reverse-mode automatic differentiation, and that sounds fancier than it is.
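Here's a sketch of that at-home exercise: estimating each gradient numerically by wiggling the variable, no calculus required (assumes the m, b, predict, and squared_error from the sketches above; the result should roughly match tape.gradient):

    eps = 1e-4  # the "tiny amount" to wiggle each variable by

    def numerical_gradient(var):
        var.assign_add(eps)
        loss_up = squared_error(predict(xs), ys)
        var.assign_sub(2 * eps)
        loss_down = squared_error(predict(xs), ys)
        var.assign_add(eps)  # restore the original value
        return (loss_up - loss_down) / (2 * eps)  # central difference

    print(numerical_gradient(m), numerical_gradient(b))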
Alright, so let me show you some actual TensorFlow code. I'll basically talk until 7:15, then give you 15 minutes to work or take a break, and then we'll dive into cooler stuff. Let me show you exactly how you can try this at home. If you go to tensorflow.org, which is slow today (maybe they're updating it for the beta... is that me, or is that us? that's us), you should know that almost everything there is a trap and you should ignore it, because most of the website covers TensorFlow 1. I want to point you to the resources for TensorFlow 2, which right now are at tensorflow.org/alpha; that may very soon change to tensorflow.org/beta, like tonight. It's a somewhat disorganized collection of tutorials that we're building right now for TensorFlow 2, and what I want to do is walk you very slowly through the most basic tutorial that does deep learning to classify images. There are advanced tutorials there too, if you want to poke around those now and you already happen to know this, but I'm going to go into "ML basics" and click on "Classify images". All of the tutorials on tensorflow.org are Jupyter notebooks on GitHub (the way we make the website is we just convert the Jupyter notebooks to HTML and stick them on the website), but just for convenience I'm going to run this in Colab.

So here's where we're going with this. While the boring install process runs, let me introduce the data set. For people who've taken a machine learning class, you've definitely seen MNIST. MNIST is a data set from Yann LeCun from the 90s: 60,000 pictures of handwritten digits, zero through nine, like you might see in a zip code on a letter. There are 60,000 training digits and 10,000 testing digits; we often have large data sets in deep learning. MNIST became too easy: you'd get 99-plus percent accuracy on it today in like two seconds. So there's a drop-in replacement for MNIST called Fashion-MNIST, which is still very easy but a little bit more interesting. Here we have 60,000 images of clothing; each image is 28 by 28 pixels, so it's a little matrix of pixels, and we've got a bunch of them. There are ten different classes: shirts, dresses, trousers, who knows what. The goal is to train an image classifier: given a picture, predict which class it is. I'll walk you through this and show you how easy it is to write a neural network that does exactly this; it's surprising, it's bizarre. The tutorial has lots of detail that you can read through.

What we're doing here, which is not realistic, is using a data set that's built in; in real life, a lot of your own time is going to be spent collecting a data set. Since it's built in, we can just import it. The data set is divided into train and test, and in both you have two things: the images, and the labels, where the label is what you should predict when you see that image. When you're doing predictions on real data, you only get images; you never actually see the right answer, you just have to make a prediction and deploy it. Here we have the right answer, because this is the game, and we're just going to evaluate how well we did.

When you actually deploy a model, here's what a lot of people don't realize, and a common mistake you make when you're learning ML, because you always have both the training and testing data. Say you develop a model on the training data and try it on the testing data, and it's 90 percent accurate. Everyone wants higher accuracy, so you play with your model a little and try the testing data again: now it's 92. You play again: 95, 98, 99. It's very easy to get a model that works super well on the test data, and you feel really good about yourself, and then you deploy your model into production and it sucks, it's just horrible. That's because you've done something called overfitting: you've built a model that works extremely well on the data you have but doesn't work well on data it hasn't seen before. And that's the real goal; it's like a game that we play, and it's hard. The terminology here is weird: we need a way to simulate this process without actually cheating, because we never actually want to look at the test set. So you'll often see a third data set: you take your training images and training labels and randomly pull some of them out, and you call that a validation data set, or an evaluation data set, and as you're developing your model you use the validation set to see how well you're doing. That may or may not be shown here, but that's just FYI if you see it.

From here this is all NumPy, and you can read it. The class names aren't included with the data set, so we're just writing human-readable class names so we can see what we're predicting. A whole lot of deep learning is playing with the shapes of data, and it's a good skill to have.
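The import he's describing, plus the human-readable class names, roughly as they appear in the tutorial:

    import tensorflow as tf

    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    print(train_images.shape)   # (60000, 28, 28): 60,000 images of 28x28 pixels
    print(test_images.shape)    # (10000, 28, 28)

    # Labels are just the integers 0-9; names are not included in the data
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']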
By the way, the name TensorFlow: a tensor, in computer science land, is an n-dimensional array. A number is a tensor, a list is a tensor, a matrix is a tensor, a cube is a tensor. If you're an electrical engineering student, I know there's way more to tensors than that, but in machine learning land that's all we mean: an n-dimensional array. The "flow" is a data flow graph: behind the scenes, TensorFlow is a C++ data flow graph engine, and the reason we use data flow graphs is that most computation can be represented as a graph. Anyway, that's where the name comes from.

This notebook walks you through poking around with the data, and whenever I get a new data set, it doesn't matter if it's simple or hard, I spend a lot of time doing this. It's really important to understand exactly what you're looking at and get a feel for the data, because when your network doesn't work well, you need to have a hypothesis about why; you're never wasting your time by going really slow. We do things like plot it; you might want to pick random images and plot them. This can be less obvious than it seems, and I can give an example of that later.

Here you'll see this a lot: the pixels, as we've imported them, happen to range between 0 and 255, and what we're doing here is normalizing them to be between 0 and 1. We do that for consistency, and for a few reasons neural networks work better when the input data is all in a small range: basically, we're going to be multiplying each of these numbers by a weight behind the scenes, and if the number is very large and we multiply it by a large weight, we can get things like overflow, so small numbers are better. The rest here is just more plotting code, more investigating the data; there's code that plots some images from the data set, and none of it is TensorFlow.
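That normalization step, as a quick sketch (continuing from the loading sketch above):

    # Scale pixels from integers 0-255 down to floats between 0 and 1
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # Always worth plotting a few examples to get a feel for the data
    import matplotlib.pyplot as plt
    plt.imshow(train_images[0], cmap='binary')
    plt.title(class_names[train_labels[0]])
    plt.show()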
Here's where we actually do some deep learning, and let me show you why I was making those jokes about the YouTube videos. This is going to be our model: instead of y = mx + b, we're going to do a deep neural network. I'll explain the first part, which is not machine learning, and then I'll show you how to make it deep. There are many different types of layers in neural networks. The simplest type is called a dense layer, and that's what you saw on TensorFlow Playground: a dense layer is fully connected, which means that every neuron in the dense layer is connected to every neuron in the layer before it, which in this case is the input data. So you have N² weights. The way you write a layer like this in TensorFlow is layers.Dense. There are other layers that I'll try to get to, called convolutional and recurrent and whatever, but dense layers are the most basic.

Dense layers take a list of numbers as input, and our images are 2D, because they're pictures, so we need to unroll the images: we take all the pixels, unstack the image row by row, and line them up into one long list. This is written as a layer, but there's no machine learning here; it just unrolls the image so we have a single list of numbers. If we wrote our model like this (I need a diagram for this that I don't have on these slides), that happens to be a list of 784 numbers, because that's 28 times 28, and each of those numbers is connected to each of the ten output neurons. There are ten because there are ten classes we want to predict, and there are 7,840 weights in this model, because it's the number of inputs times the number of outputs. In linear regression we had two weights, two parameters; now we have 7,840, and we're going to find good values for all of them in exactly the same way. If we wrote the code this way it would work as is, and it would be a model called multi-class logistic regression: basically we're training ten linear classifiers, and you'd learn about this in a machine learning class. If we wanted to make this a neural network, we'd add this layer: now we have a neural network. If we wanted a deep neural network, we'd add this layer: now we have a deep neural network. If we wanted a deeper one... now we have an extremely deep neural network, and you've just watched your five-minute YouTube video, and hey man, here's how you do machine learning, bro, it's going to be super cool.

So now we have a deep neural network. Deep neural networks have two things: depth, which here is the number of layers, and width, which is this parameter here, how many neurons per layer. Roughly, you can think of the network as a pattern recognizer, though no one really has intuition for this at all. The depth of the network relates to the number of combinations of patterns it can learn, and we can actually see this directly; I'll try to get to it. The basic idea is that the first layer is looking at pixels. If you have features of pixels, combinations of pixels will give you edges or lines: maybe this layer recognizes lines, and its width means, roughly, there are 128 lines it can detect. Combinations of lines give you edges, combinations of edges give you shapes and textures, and if you have a very deep neural network, layer 20 might end up learning how to recognize eyes. We can actually show you this; it's really, really cool. The representation learning in deep learning means these layers automatically learn to recognize useful features in the image. But this network would be way too big. Why don't we use enormous networks? They take a long time to train, and they overfit the data: a deep neural network will just memorize your data set and then do badly on the test data. So here I'll just run this a couple of times; you can see we'll start with our linear classifier.

We can define the model... by the way, in TensorFlow 2 there are different styles of defining your neural networks. The most common style is Sequential: 90 percent of networks, beginner to advanced, fit exactly into this. A Sequential model is literally a stack of layers. You can also have a graph of layers, but a stack works for almost everything. In the beginner style, we're going to compile this model, and this is easier than the linear regression. I mentioned that there's a whole box of gradient descent algorithms: SGD is the most basic one, and it's exactly what we did in linear regression. There's a paper called Adam, and Adam proposes a gradient descent optimizer with adaptive learning rates and per-parameter adjustments and all this fancy stuff, but because it's so common we can just use Adam as a black box, and it's great. Obviously if you were using this for medicine or something really important, or to control a rocket, or if you really wanted to debug your model and make sure it worked well, you'd need to gradually learn more of the details, but we're just going to black-box Adam. The compile step looks scarier than it needs to.
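The model being assembled here, as a sketch (the 128s are just the illustrative widths from the talk):

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),   # just unrolls 28x28 -> 784
        tf.keras.layers.Dense(10, activation='softmax'), # 784*10 = 7,840 weights
    ])

    # Adding hidden layers between Flatten and the output turns this
    # multi-class logistic regression into a (deep) neural network:
    #     tf.keras.layers.Dense(128, activation='relu'),
    #     tf.keras.layers.Dense(128, activation='relu'),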
Our loss function for linear regression was squared error; our loss function here is called, let's just say, cross entropy. Cross entropy is a loss function used in classification problems. I don't have a slide for this handy, but the basic idea is: we have an article of clothing, and it could be one of ten things, and we're going to predict a probability distribution that says how confident we are that it's each of those things. So it's ten numbers, each between zero and one, and they sum to one. Say we're really confident it's a boot: we'll put 95 percent of our confidence on boot and distribute the remaining 5 percent across the other classes. We need to quantify how bad a job we did, so say it's actually a boot: the label will have a 1 on boot and a 0 on everything else, so the label has 100 percent confidence on boot, because that's the right answer. Cross entropy lines up these two distributions and compares them, and the farther apart the distributions are, the higher your error. So cross entropy is a loss function for classification. The rest of the stuff we're seeing is implementation details, which look scary, but there's a very limited number of options and you learn them as you go.

In linear regression we did the gradient descent steps manually, but here we'll use a built-in method. What we're saying is: TensorFlow, please train my model. It's called fit, a word from statistics: please fit my model using the training images and the training labels. A really key question that comes up is how long to train it for, which is epochs. An epoch is one sweep over the data set, using every training example once. The longer you train your model, the more accurate it will be on the training set, so you could train it for a bazillion epochs. Here's how you get 100 percent accuracy on the training set: make a very deep neural network with very wide layers and train it for a large number of epochs, and the accuracy should eventually go to 100. The reason we don't do this is that we don't care how accurate the model is on the training data; what we need is a model that's accurate on the testing data. So here we'll train it for just a few epochs (I'm not sure if I have a GPU here or not), and you can see the starting accuracy is low, and over time it should hopefully increase. It's not going to be super high, because this model is very simple: we're using a logistic regression model. When it's done, we have our trained model, and there are a couple of things you'd commonly do next. One is to evaluate the accuracy. Here we're cheating: we don't have a validation set. Never do this, but who cares here; we're just evaluating it on the test data, and we've built a model that's, say, 70 percent accurate. That's actually not so bad, because there are ten different classes, so our baseline would be 10 percent if we were making random guesses. It's not terrible.
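Compile, fit, and evaluate, sketched roughly as in the tutorial (the labels here are integers, hence the "sparse" flavor of cross entropy):

    model.compile(optimizer='adam',                        # black-boxed gradient descent
                  loss='sparse_categorical_crossentropy',  # cross entropy, integer labels
                  metrics=['accuracy'])

    model.fit(train_images, train_labels, epochs=5)        # 5 sweeps over the training set

    # Evaluate on data the model has never seen (cheating slightly:
    # the test set is standing in for a validation set here)
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print('test accuracy:', test_acc)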
Now let me show you how the modeling process goes when we want a more accurate model; actually, I'll let you do that. I'll stop talking for five minutes, and if you have a laptop, a very simple exercise is to increase the accuracy of this notebook. Here are the two parameters you get to adjust. We trained a very simple model for five epochs, so one thing you can do is increase the number of epochs: make it 10, and it might be more accurate. Another thing you can do is convert this into a neural network; I deleted the line, but you can get the line back so you have a hidden layer, and then you can actually make it a deep neural network: so here's a neural network, there's a DNN. There are other parameters you can adjust too; they don't have to be multiples of two, we just do that because we like multiples of two. Those will affect the accuracy as well. What you're doing right here is called modeling, and it's the smallest part of the entire machine learning picture, but it gets a lot of attention, because for whatever reason this part is cool. So I'll start again in five minutes; does anyone have any questions? I know I've been talking a lot. Thanks for listening. Yes?

I'll come back to dropout. The question was how effective dropout is at avoiding overfitting, which we probably won't cover today. Very. The other nice thing about dropout is that it's very simple, because your other option is, let's put regularization on the layers, and then it's "how should you regularize it," and there's math. Dropout just works well, so it's great. Yes, I'll just go around the room.

Awesome, so the comment was that it's working on an iPad: yes. I didn't develop Colab at all, but the team is phenomenally good, and you can literally run it on your phone, so if you want to work all the time, now you can. The important thing about Colab for students is that it means nobody needs to buy a GPU, because it has one built in. Here we don't need it, but you do need it for the fancier examples.

Oh, really great question: is there a mathematical proof that if you increase the depth or the capacity of the network, the accuracy will always be higher? I don't know; I'm not a theoretician, we'd have to sit down on arXiv and start reading. Intuitively, what we're doing is increasing the capacity of the network by adding more parameters, giving it more degrees of freedom to fit the data. Another intuition is that at a certain point you might get diminishing returns: it may be that after you have 12 layers, one of them is able to very accurately model what a boot looks like, and more layers just mean it takes longer to train. A lot of deep learning is empirical, which is a blessing and a curse. Thank you very much. I'll start again in like three minutes.

By the way, is this too fast, too slow? Don't be shy, just say "it's too fast." Oh, thank you, awesome. A couple say too slow, so maybe it's just right. I'll go a little faster in the second half, and the goal is not to understand everything, but to get some of the key ideas, and then you get links you can try at home. Every cool idea I'm going to show you that seems kind of research-y and magical has, no BS, a link to the complete code that does exactly what I show you.

Is there a concept of inference? Yeah. By the way, I realize that to statisticians a lot of what I'm saying is kind of hand-wavy, in the same way that what I said about tensors seems hand-wavy to electrical engineers. What machine learning and deep learning have done is take concepts that have existed for a long time, like linear regression and logistic regression, and basically rebrand them: "no, no, that's a machine learning thing," so we can get paid more money to do something everyone else already knows how to do. We're doing inference all the time here; the word we're using for it is "predict". You can also think of the output of one of those intermediate logistic regression units: I'm not sure you would technically call that inference, but it is part of the forward pass of the model.
Though I don't think it would correctly be called inference, because we're not actually using it to classify the data; what we're really using it for is to produce a feature that's useful to subsequent layers, and that's what those neurons ultimately train to do. We can visualize those features if we go a lot faster, which we will. No, there's no concept of p-values here at all; I didn't talk about it. The output layer of the neural network, and I didn't have time to go into this, has a softmax activation, and softmax squashes the output into a probability distribution. It's not a true probability distribution, it's a confidence score, but we call it a probability distribution because it ranges between 0 and 1 and sums to 1. I wouldn't go to Vegas and bet on those outputs being exact probabilities; I'd treat them as a very rough confidence score.

Yeah, you'd have to check papers; off the top of my head I'm not sure what the target accuracy is for Fashion-MNIST. I want to say mid-90s with a dense network, maybe higher with convolutions; I'm not sure. There might be a way to get 99, but I haven't played with this example too much.

The question was about the widths of the layers, the 128 and 256 you see: is there any sort of practical rule of thumb? Great question. Are there rules of thumb? Not really; if you're doing this in practice, you learn from experience, so I happen to know that this works reasonably well. There are lots of standard architectures, and an architecture is what we're looking at right here: standard designs for models that we know work well for different problems, and we reuse them all the time. If I were working with actual photographs, I might use an architecture called Inception. Inception is built into TensorFlow; it's also a paper; it's just a very deep neural network in a particular arrangement. You can search for "Inception v3" (whoops, not the movie), and here's a picture of what the network looks like: there are different types of layers in there and it's a complicated diagram, but TL;DR, you're looking at a very deep network with some fancy features. Our little MNIST network had, I forget, four or five layers; here we might have 20-ish, and there are other things too. But we can import this and reuse it out of the box, so we don't have to rewrite it from scratch every time.

Yeah: are there common resources for those precooked architectures? Oh yeah, great question, yes, definitely. I'll show you that when we get to the Deep Dream notebook, which I might just do next, just for fun. Oh yeah, bitly /i CSE III; hold on, let me just find my place. Alright, I'm going to talk for two minutes, then we'll take a 10-minute break, and then we'll keep going.

Basically, just to move a little faster: what we've seen is this style. There are different ways to define your networks, and here we've used the beginner style, which isn't really just for beginners; this works for research too, we're just bad at naming things. This is exactly what you just saw: a complete neural network in TensorFlow 2. It's called the beginner style because we're using the Sequential API, so we define our model as a stack of layers. Great. There's another style we can use too, and it's this. This is new in TensorFlow 2; it existed before in Chainer and PyTorch, and I believe Gluon.
What we're looking at here is object-oriented NumPy programming. All the libraries I mentioned provide a class, and they each call it something different: in TensorFlow it's called Model, and in Chainer I think it's called Network. The library gives you a class, and the way you implement your neural network is by extending it. So here we're extending the class to make MyModel. If you're from Java, this will feel great: you define your layers in the constructor, and then the forward pass of the model, the predict method, is called "call", and you write the forward pass imperatively. Here we take some input data, which could be an image, pass it through a dense layer, get a result, pass that through another dense layer, and return the result. If you're a student, this is great too, because maybe you're curious and want to know exactly what the inputs look like: since this is regular Python, as you'd expect, you just do print(inputs), and if you want to see exactly the output of a dense layer, you just print x. If you want to write your own layer, it's trivial; it's just Python. So this is super hackable.

But there's a huge con to this too, which is not obvious, and it's a software engineering thing. Say I'm debugging a student's code. If she writes it in the sequential style, there's a correct way to use that style: the model is a stack of layers, defined in a consistent way, and what you might not realize is that when you call model.compile, TensorFlow can help you debug. We're catching errors at compile time, before we start pushing data through the model, which is great. It means the bugs that do exist in your model are going to be conceptual bugs, meaning your layer was too small or you didn't use the right combination of layers, but not programming bugs, which is huge. In the sequential style, what your model is under the hood is a data structure, a stack, and we understand stacks very well. In the subclassing style, your model is a piece of Python bytecode, which means you can do whatever you want, which is great, but there's also nothing we can do to help you out. In this style of code you get programming errors at runtime and you get machine learning errors at runtime. There are also fewer standards for it, so maintainability is a huge issue too. Just like in software, code is written once and read and edited like ten times; same thing with machine learning models, so if your friend Mike at your company writes his model in his funky style, you have to learn Mike's style to debug his code, which is horrible. Anyway, it's great because it's fun to use, but there's a cost. By the way, both of these models can be trained with model.fit or with gradient tapes.
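The subclassing (expert) style being described, as a minimal sketch (layer sizes are illustrative):

    import tensorflow as tf

    class MyModel(tf.keras.Model):
        def __init__(self):
            super().__init__()
            # Layers are defined in the constructor...
            self.dense1 = tf.keras.layers.Dense(128, activation='relu')
            self.dense2 = tf.keras.layers.Dense(10, activation='softmax')

        def call(self, inputs):
            # ...and the forward pass is written imperatively in call().
            # Since this is regular Python, print(inputs) or print(x)
            # works here for debugging.
            x = self.dense1(inputs)
            return self.dense2(x)

    model = MyModel()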
Alright, I'll stop talking there and start again at 7:30, in eight minutes. ... Whoops, that was a lot of coffee. Alright, we start again in just 30 seconds.

So now we can do some more advanced and fun stuff. I mentioned that deep learning runs in JavaScript too, and this is a program you can try right now if you want: it's bitly slash pose - net, and I'll just show you. This is deep learning running in JavaScript, using something called TensorFlow.js, and it's running entirely client-side, so no data is being sent to the server, don't worry. (Oh, I think I popped the plug; one sec... can you plug it back in? Thank you, you rock, thanks.) Alright, I'm going to try this one more time and hopefully won't break it: carefully, carefully... alright, let's see. Yeah, it's cool, right? This is PoseNet. It's running entirely locally; none of this data is being saved or sent anywhere, so this is great. It's not meant for this many people, but here's what's really cool about running machine learning in JavaScript, which was not obvious to me. I'm a Python developer, so when I heard about JavaScript I thought, that seems silly, right? We already know Python is a little bit slow, which is why we have this C++ back end, and JavaScript is even slower, so I thought, why would we do this? (Let me fix this again.) But the answer is obvious if you're a JavaScript developer, and it should be obvious now: if you write it in JavaScript, it can run in a browser, and as it happens, browsers also run on phones, so now you also have a mobile app. And it can be fast; the computation is actually pretty quick now, and you don't have to upload images to a server every time, which also means it can be private. So machine learning in JavaScript is huge, and the reason I'm sharing this is that if you're a JavaScript developer, TensorFlow.js is really good, and you may never need to go to Python at all. What you can also do with TensorFlow.js is convert a model that you've trained in Python and run it in the browser, which is what happened here. So there are all sorts of possibilities, even if you're not a machine learning developer and you just want to reuse an existing model: there's all sorts of stuff you could use this for, games or whatever you can think of. To pull a humanitarian application out of a hat: it can do eye tracking too, so a no-brainer application is that for a quadriplegic user you can watch his or her eyes and use that to drive a mouse cursor, and presumably there are a million more cool things like that. Anyway, it's like a whole new thing.

So let's do something more advanced, and I will arbitrarily choose Deep Dream, which I have a crappy slide for because I made it at the last second, but here's code you can use, and I'll explain what Deep Dream is. The code for this is bitly / mini - dream, so it's "mini dream" with the dash. Let me show you what Deep Dream is, and then we'll talk about how it works. If I search for "deep dream," we get all sorts of crazy psychedelic images like this. Let's take the Mona Lisa. Deep Dream is a software program, now in TensorFlow, originally in Caffe I think, from 2015, and Deep Dream made pictures like this. So what do you see when you look at the Deep-Dream-ified Mona Lisa? What stands out? Dogs, yeah: there are dogs everywhere, bits of dogs. There's no right answer here, but when I look at this I see dogs, snouts, and maybe eyes. Snakes? Yes, good: there are definitely snakes. Fish, for sure. Peacocks. So why are we seeing all these things? I hadn't even noticed those snakes before; I'm going to show snakes later, actually. Holy crap. Anyway, this is super psychedelic. Now, has anyone randomly read the Deep Dream paper? Deep Dream was an experiment, and it happened to produce images like this.
So let's do something more advanced, and I'll arbitrarily choose deep dream, which I have a crappy slide for because I made it at the last second. Here's code you can use, and I'll explain what deep dream is. The code for this is bit.ly/mini-dream, mini-dream with the dash. Let me show you what deep dream is, and then we'll talk about how it works. If I search for "deep dream," we get all sorts of crazy psychedelic images like this. Let's take the Mona Lisa. Deep dream is a software program, now in TensorFlow, originally in Caffe I think, from 2015, and deep dream made pictures like this. So what do you see when you look at the deep-dreamified Mona Lisa? What stands out? Dogs, yeah, there are bits of dogs everywhere. There's no right answer here, but when I look at this I see dog snouts, and maybe eyes. Snakes, yes, good; there are definitely snakes. Fish, for sure. Peacocks. So why are we seeing all these things? This is super psychedelic. Has anyone randomly read the deep dream paper? Deep dream was an experiment, and it happened to produce images like this.

The goal of deep dream was not to produce psychedelic art; that was a happy accident. The goal of deep dream was to investigate. If you train a neural network, specifically a convolutional network, on a large dataset, it can classify images like cats and dogs with high accuracy. The question is: why is the network so accurate at classifying images? How does the network work? What is it doing? I've been really hand-wavy about things like "we have this layer that recognizes textures," but deep dream is a way to see exactly what a layer has learned to recognize, and we can actually visualize it. The reason we're seeing what we're seeing in deep dream is that we take an image of the Mona Lisa and we choose some layers from a neural network. I don't know exactly which layers they chose here by name or number, but you choose some combination of layers, and then you modify the image to increasingly excite those layers. The hypothesis is that if there's a neuron in a layer that detects edges, and you forward the image through the network (you classify it), record the output of that layer, and then find a way to modify the image to make that output stronger, you'll strengthen edges; and if there's a layer that recognizes eyes and you do this, you'll see eyes appear in the image.

Let me show you more concretely how this works; I should go to my link. This is the minimum amount of code you need for a deep dream. The paper that produces the really high-res, pretty images has a whole bag of tips and tricks, which I skipped; I just want to give you the minimum code that does the thing. While TensorFlow 2 installs: we're going to take an image like this and produce an image like that, using code that looks, surprisingly, almost identical to linear regression. I'll go fast here, but basically the first forty percent of this notebook is me being a bad Python programmer. I'm just downloading an image from a website, normalizing the pixels, and loading it into memory, and I wrote it this way so you can see exactly how it's normalized and what all the steps are. At this point we've just pulled some picture into RAM.

Earlier we talked about pre-trained neural networks, and I mentioned Inception v3. Here we're importing Inception v3; that's the monster neural network you saw earlier, and it's been pre-trained on a dataset from Stanford called ImageNet. ImageNet has about a million pictures in about a thousand different classes: cats, dogs, snakes, flowers, peacocks. All those trippy things you saw appear in the Mona Lisa exist in this dataset. This is a generic classifier; ImageNet is really popular because it's a standard dataset used in research to compare different models. If I had a picture of a cat on my laptop and just handed it to Inception v3, it would probably say "cat." So right now this is a complete TensorFlow model, and it's the first ingredient of a deep dream. What's interesting, though, is that we don't want to use this model for inference; I don't care about predicting cats. What I want to do is pass an image through the model. If I classify my cat, what really happens is the cat goes in one end, passes through all the layers, and out the other end comes a probability distribution.
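As a sketch, that plain classification use of Inception v3 looks something like this; the image path here is hypothetical, and any photo will do:

    import numpy as np
    import tensorflow as tf

    model = tf.keras.applications.InceptionV3(weights='imagenet')  # pre-trained on ImageNet

    # 'cat.jpg' is a placeholder path; Inception v3 expects 299x299 inputs.
    img = tf.keras.preprocessing.image.load_img('cat.jpg', target_size=(299, 299))
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = tf.keras.applications.inception_v3.preprocess_input(x)  # scale pixels to [-1, 1]
    x = np.expand_dims(x, axis=0)  # the model wants a batch of images

    preds = model.predict(x)  # 1000 probabilities that sum to one
    print(tf.keras.applications.inception_v3.decode_predictions(preds, top=3))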
Inception, since it's trained on ImageNet, knows a thousand different classes, so that output is a list of a thousand numbers that range between zero and one and sum to one. I don't care about that, though. What I care about in deep dream is, say, the output of this layer, and specifically maybe some neuron in this layer, and when I start this experiment I have no idea what that neuron has learned to detect. My goal is to find that out.

Great question: how do you choose which layer? The answer is experimentally; you choose it experimentally, and I'll show you the mechanics. When the authors wrote this model, they conveniently named all the layers (when you write a TensorFlow model, you can pass a parameter like name='joshs_favorite_layer'). Here they have particular names, and what I'm doing is writing a model to get the outputs of those layers. I'm choosing these layers because I saw in a different implementation that they work well; you can try other ones. If you want to see all the layers in the model, you can do something like inception_v3.summary(), and that will print them all out.

I'm writing a new model here, in a different style of TensorFlow that I don't want to go into detail on. It's called the functional style; you could write this using the subclassing style too. What I'm doing is, for each layer name, asking the model for that layer and then taking its output. So the new model takes an image, and instead of producing Inception's final output, it produces the outputs of the list of layers I just specified. That gives me a list of tensors: the activations coming out of those layers when I classify an image.

Great question: how do you do this with TF eager? "Eager" just means normal, imperative Python. TensorFlow 2 is eager by default, and the reason I didn't have to mention it is that it just is; this is already eager. Technically I'm using the functional style here, which is symbolic; you could rewrite it with model subclassing if you wanted. There are no more sessions. To get the outputs of a specific layer, you do exactly this, and to look at them you just print(x).

The reason we can reach into the network using the functional API (great question) is that this network was defined with Keras layers, and because it was built, I'm pretty sure, with the functional API, we have these nice layer names and it's easy to pull out specific layers. If it wasn't written this way, we'd have some class (these aren't my speaker notes; I stole this slide from somebody); the class would be called InceptionV3 or whatever, and it would be huge, but we'd have the actual code, so we could just modify it. Or you could write a new function, def f(input_image): return my_layer(...), and just write the forward pass in there. So those are your options. This is a good thing: TensorFlow is a big framework with lots of different ways to do things, which does give it a large learning curve, but at least the options are all really sensible.
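To make that functional-style extraction concrete, here's a minimal sketch; the two 'mixed' layer names are an assumption on my part (pick your own from summary()), not necessarily the ones used in the talk:

    import tensorflow as tf

    base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
    base.summary()  # prints every layer with its name

    # Hypothetical choice of layers; mid-level 'mixed' blocks tend to work well.
    names = ['mixed3', 'mixed5']
    outputs = [base.get_layer(name).output for name in names]

    # A new model: image in, activations of the chosen layers out.
    dream_model = tf.keras.Model(inputs=base.input, outputs=outputs)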
I'll show you a trick with graph mode in a second; I don't know why I wrote it this way, but anyway. So this is 30 percent of deep dream: we have an existing model that's been trained on a large dataset, we're forwarding an image through it, and we have code that gives us the activations of some layers. Great, that's 30 percent.

On the forward pass: I mentioned early in the talk that models usually work on lists of images; they take a batch of images as input. Here I just have a single image, and whenever you see code like this (you'll see it all over the place, written as np.expand_dims or tf.expand_dims), it means we're taking our image and sticking it inside a list of length one, just so we can forward it through the model. It's an implementation detail; this is simply the shape of data the model wants. If you want to understand exactly what it does, a really good thing to do is print(image_batch.shape) and you'll see exactly what you're looking at.

The second ingredient of deep dream, and this is where we get into the research, is a loss function. Our loss function is how excited those layers were by the image. We take all the activations and sum them up. There's some normalizing code here, which I found in the paper and copied, but what we're really doing is saying: for each layer we picked (they're called mixed2, mixed3, mixed4, whatever), the activations are a list of numbers, the output of all the neurons in that layer, and we sum them. So our loss is a single number: how excited, how activated, these layers are by our image.

Then there's the last piece of deep dream, and this is really cool. You might recognize this code from linear regression, but there's one trick. Almost always in a machine learning model, you adjust the parameters of the model; you never change the data. If your Fashion-MNIST classifier isn't very accurate, you wouldn't say, "no problem, let me just draw on the images until it gets them right." But that's exactly what we're going to do here. In deep dream the model never changes: our model is Inception and we don't adjust it. The secret of deep dream is that we take our input image and treat it as a variable. Here's our training loop: we take the image, forward it through Inception, get the activations of the chosen layers, and compute our loss, the sum of those activations. The magic of deep dream is this: we say, TensorFlow, please give me the gradients of the loss with respect to every pixel of the image. That returns a number for each pixel telling you, if you nudge its intensity up or down a little, how that will affect the loss, in other words how it will further excite the layers. Then (this part is from the paper; I'm just normalizing) we literally add those gradients directly to the image. Now we're modifying the input image, and if you repeat this a couple hundred times (you can display the image as you go if you like), you'll see these psychedelic features start to appear, because the layers we've chosen respond very strongly to them.
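Here's a minimal sketch of that loop, reusing the dream_model from the sketch above; the random image stands in for the normalized Mona Lisa, and the step size, iteration count, and clipping range are my guesses rather than the notebook's exact values:

    import tensorflow as tf

    img = tf.random.uniform((299, 299, 3)) * 2 - 1  # stand-in for the normalized image
    img_batch = tf.expand_dims(img, axis=0)         # a batch of one image
    img_var = tf.Variable(img_batch)                # the trick: the *image* is the variable

    for step in range(200):
        with tf.GradientTape() as tape:
            activations = dream_model(img_var)
            # How excited are the chosen layers? Sum the mean activation of each.
            loss = tf.add_n([tf.reduce_mean(act) for act in activations])
        grads = tape.gradient(loss, img_var)         # d(loss) / d(every pixel)
        grads /= tf.math.reduce_std(grads) + 1e-8    # normalize, as in the paper
        img_var.assign_add(0.01 * grads)             # gradient *ascent* on the image
        img_var.assign(tf.clip_by_value(img_var, -1.0, 1.0))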
This is actually a really big deal. Even though this isn't very high resolution, the fact that we can do it proves that deep learning is representation learning. What we've gotten for free is that inside that network there's some neuron, or in this case some layer, that gets excited by things that look like that. As a follow-up, instead of making a psychedelic image, we can start from random noise and try to generate an image that maximally excites a layer from scratch, so we can ballpark-understand what these layers recognize. Linked at the bottom here is a paper, or rather an implementation, done by one of the deep dream authors, and this person published directories of images that strongly excited different filters in each of the layers. The first layer is looking at pixels, so: simple colors, simple edges. These are images that made different neurons in that layer very excited, and they're simple; the takeaway is simple shapes, simple colors. As we go deeper into the network, a layer is looking at combinations of combinations of combinations, and there are neurons responding strongly to those, and some are quite beautiful. The takeaway is that none of these were manually engineered. They were all found incidentally along the way to training an image classifier; the network did this as an artifact of the job we gave it, which is super cool. As you go deeper in the network, the patterns get more abstract, and you start to see things that look deep-dreamy. The reason we see shapes like this is that ImageNet happens to have tons of peacocks, for whatever reason, so this network responds strongly to peacocks. It also has snakes, and if you scroll through this long enough (don't do it when you're not sober) you'll find some really, really interesting things; you can spend a lot of time on this. And if you go really high level, you can almost start recognizing things: maybe these are ants; maybe this one came from a chimp, and we're starting to see faces. This is called representation learning, and deep learning is representation learning, which is a big deal. TL;DR: the code is in that little notebook, and the paper is linked at the end of it.

Great question: if a neuron is excited by one image, is it excited by others? Usually it's excited by many different things, just to different degrees.

Okay, I need to stop presenting for one second and plug this right back in. I'm going to go into my drive; we don't have time to cover convolution properly, but I want to pull up some slides and explain exactly what those layers are, because they're not dense layers, they're convolutional layers. I'm just going to guesstimate for one second... right. These are convolutional layers, and you can write a convolutional layer in TensorFlow by saying, instead of layers.Dense, layers.Conv2D.
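For the record, that swap looks something like this (the sizes here are arbitrary):

    from tensorflow.keras import layers

    dense = layers.Dense(64, activation='relu')          # fully connected
    conv = layers.Conv2D(64, (3, 3), activation='relu')  # 64 filters, each 3x3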
Convolution is not a machine learning concept, but I want to explain briefly what it is and why this works. This slide uses a library called SciPy, and the point was just to show you that you can do convolution without TensorFlow. And here's a question, by the way: when I wanted to teach convolution, I wanted a picture, so I imported the built-in picture of an astronaut, which I thought was super cool. Does anyone know who this is? She's famous enough to be a built-in test image. Very close; Sally Ride is a great guess, but this is Eileen Collins, the first woman to command the Space Shuttle (Columbia), which is a really big deal. So now she's a built-in.

Anyway, we're doing edge detection on Eileen Collins, using convolution, which is what convolutional networks are built from. A dense layer, I mentioned, is fully connected; it has tons of weights, roughly N squared: the number of input neurons times the number of neurons in the dense layer. So dense layers are super inefficient. Convolutional layers have a very small number of weights. This particular one has nine; it's this little thing we call a kernel, or a filter, and with just nine weights it can do edge detection. If we convolve the image with it, we get an output picture like this, with all the edges of the image. What convolution means is that we slide the filter across the image and take the dot product of the filter and the pixels underneath it, which literally just means multiplying negative one by whatever pixel intensity is beneath it, plus the next term, and so on, summing as we go. The reason this gives us an edge detector is that of the nine values in this filter, one is 8 and the rest are negative 1. If all the pixels beneath the filter are the same intensity, the filter returns zero, because there's no edge; if there's a big difference, it returns a large number. So it's a very simple edge detector, and that's how it works. That's convolution, at least in the machine learning sense; if you're an electrical engineer, I know there's a lot more to it. We have an image and a filter, we slide the filter across the image taking the dot product as we go, and out comes an output image.

The really important takeaway is that a small filter can do really sophisticated things. This is an edge detector I wrote manually; if you use Photoshop, it uses convolution for sharpen, blur, all of that, and those are really nice filters written by engineers who put a lot of thought into them. Convolutional layers have filters too, and what we're seeing in deep dream is that we're creating images that excite some filter. So what we see in deep dream really means there's some layer with a filter that responds to something that looks like that, and that's also one reason we see lots of repeating patterns: the patterns repeat again and again because these filters are literally sliding across the image. Anyway, these are convolutional layers, and they work very well for image classification.
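Here's a small sketch of that edge-detection experiment. One caveat: the astronaut image actually ships with scikit-image rather than with SciPy itself, so this sketch uses both libraries; the kernel is the 8-surrounded-by-minus-ones detector just described:

    import numpy as np
    from scipy.signal import convolve2d
    from skimage import data  # the Eileen Collins astronaut image lives here

    image = data.astronaut().mean(axis=2)  # average the RGB channels to grayscale

    # Nine weights: uniform patches sum to zero, edges produce large values.
    kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])

    edges = convolve2d(image, kernel, mode='same')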
All right, other cool things. Here's another crappy slide that I made at the last second; I don't have much to say about it except that I wanted to give you a link. Let me make this less crappy so you can actually see it. This is a minimal example of image colorization with deep learning, and as you might guess, it's bit.ly/mini-color. It's the world's simplest, but also worst, example of image colorization: worst in the sense that the training set is one picture, so it just memorizes it. This won't generalize at all, but it's the minimum amount of code you need to do the thing, and it runs in about two minutes, so you train very fast. The goal, obviously, is to take a black-and-white picture and color it in.

The one thing I want to share is not the machine learning chunk but how you represent the images. Normally when we think of images we have three color channels, red, green, and blue, so an image isn't a matrix but actually a cube: the red intensity, the green intensity, the blue intensity. But there's a trick: we represent the image in Lab space (L, a, b). We convert the RGB image to a Lab image, and there are still three channels, but the first one basically represents the lightness, the grayscale version of the image, and the other two channels represent the color (there's a sketch of this split below). A common question when people learn about colorization is: are we predicting the entire image from the grayscale one? The answer is no; we're only predicting the color channels. This is a problem called regression: for each pixel, regression means predict a number, and we're predicting the numbers that correspond to the color intensities of that pixel in the output image. So we literally copy the grayscale channel to the output and then color it in; we don't generate an image from scratch.

Oh, bit.ly/mini-color... that's not it. Let me see if I messed this up in my haste. Well, that's actually really sad: it tells me that either Bitly screwed this up somehow, or no one tried this link at the last talk I gave. Let me fix it really quickly so you can see the magic and excitement of creating Bitly links. All right, so it's mini-color-2, and mini-color-2 will get you to GitHub. Let me show you the tricks for getting from GitHub into Colab. One way is to download the notebook and upload it to Colab through the UI. Another is a browser extension, probably called the "Open in Colab" browser extension, which gives you a little button you can click to open it directly, which is really sweet. Or you can change the URL: delete the github.com part and put colab.research.google.com/github/ in front of the rest of the path, and it will open straight from GitHub. This works for any notebook on GitHub, which is helpful, because GitHub's built-in notebook viewer is awesome but doesn't always work for enormous notebooks.

I don't have time to go into this in depth, but I want to point out a surprising thing about deep learning: this is the complete model, it's using those convolutional layers I mentioned earlier, and that's it. This is all you need to do image colorization, and this type of model is trained in exactly the same way as the other ones we've seen. That's the takeaway. I'm going to keep moving, but you can play with that.
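Going back to the color representation for a second, here's a sketch of that Lab split using scikit-image's converter; the random image is just a stand-in for a training photo:

    import numpy as np
    from skimage import color

    rgb = np.random.rand(256, 256, 3)  # stand-in photo, floats in [0, 1]
    lab = color.rgb2lab(rgb)

    L = lab[:, :, 0]    # lightness: the grayscale input the model sees
    ab = lab[:, :, 1:]  # the two color channels: the regression targets

    # At inference you predict ab from L, stack them back, and convert to RGB.
    recolored = color.lab2rgb(np.dstack([L, ab]))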
Let's talk about GANs, I guess; I'm going super fast now. The takeaway here is not to try to understand these models in real time, but to have the links, so if you're curious you can dig in; we always have complete code, so you can go through it and play with how it works. To really understand GANs, which are an amazing idea, would take a few months of your time, but let me give you the basic idea. GAN stands for generative adversarial network. Aside from deep dream, most of the examples we've talked about are classification or regression, and that's the bread and butter of deep learning: the number one most common skill is "I have some data, I need to make a prediction." Generating data is much harder. It's relatively easy, given a picture, to say "this is a cat"; it's much harder if I ask you to generate a picture of a cat. That's what generative models address, and GANs are a deep learning solution to generating data. The data can be of any form: an image, a sound, whatever.

Here's how it works. In deep learning, one of the ingredients is a loss function: we need a number that says how bad a job our model does, so we can adjust the parameters to make it do a better job. The question is, for a generative model, how do you get a number that describes how fake your image is? The solution the author of GANs came up with in 2014 is that we already have one: we have neural networks that can classify images. GANs were invented by Ian Goodfellow, a PhD student at the time who's now super famous. Does anyone know what Ian was doing when he came up with the idea? He was drinking. The reason I mention this is that you should take a break from your research sometimes.

So here's the basic idea. There are two networks, and we train them simultaneously. We have a generator, which is responsible for generating images; that's the new part. And we have a second network, a discriminator, which is just an image classifier, something we already know how to build. If we were doing cats, the discriminator would be trained on pictures of cats, and its prediction task is: real cat or fake cat? What's interesting is the generator never actually sees a picture of a cat, ever. It starts by generating an image, which begins life as random noise, and all it ever learns is whether the discriminator judged that image real or fake. If the discriminator says fake, we use the gradient trick the same way we did in deep dream: we take the gradient of the discriminator's loss with respect to the generator's variables, and that tells us how to adjust the generator's variables very slightly so it fools the discriminator slightly more at the next iteration. So the next time the generator generates an image, it looks slightly more cat-like, even if the discriminator still says it's fake. If we train these models in parallel for a long time, they eventually stabilize, and we reach the point where the discriminator can no longer tell whether a generated cat is real or fake. At that point, we have an image generator.

We use MNIST a lot in deep learning; this is the original MNIST, not Fashion-MNIST, a dataset of handwritten digits, and this is an example of a GAN on MNIST. You can see we're seeding it with the same random vector at every step so it generates the same digits, and the generator learns over time to produce numbers that look real; by the end it's doing a pretty good job. We always start life with MNIST, which is great because it runs fast.
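Here's a minimal sketch of one training step of that loop. The generator, discriminator, and the two optimizers are assumed to be Keras models and optimizers you've already built, and the loss wiring follows the standard DCGAN recipe rather than anything specific to this talk:

    import tensorflow as tf

    cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    NOISE_DIM = 100

    @tf.function
    def train_step(real_images, generator, discriminator, gen_opt, disc_opt):
        noise = tf.random.normal([tf.shape(real_images)[0], NOISE_DIM])
        with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
            fakes = generator(noise, training=True)
            real_out = discriminator(real_images, training=True)
            fake_out = discriminator(fakes, training=True)
            # The generator wants the discriminator to answer "real" (1) for its fakes.
            g_loss = cross_entropy(tf.ones_like(fake_out), fake_out)
            # The discriminator wants real -> 1 and fake -> 0.
            d_loss = (cross_entropy(tf.ones_like(real_out), real_out) +
                      cross_entropy(tf.zeros_like(fake_out), fake_out))
        gen_opt.apply_gradients(zip(
            g_tape.gradient(g_loss, generator.trainable_variables),
            generator.trainable_variables))
        disc_opt.apply_gradients(zip(
            d_tape.gradient(d_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))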
MNIST is also the world's most boring dataset, so what happens is the next year, the next PhD student publishes the next paper using a much cooler dataset, and I'll show you a couple of the really seminal, awesome ones. There was a question I totally missed: does the generator only see the yes/no signal? It basically has access to the discriminator's weights; yes, exactly, it sees the weights. With just a yes/no and no weights, we'd have no way to get the gradient.

How many cycles does it take to get from noise to a cat? Let me show you how to answer that question. All these tutorials run end to end, by the way, and we like that because of the expression "trust but verify": they actually work, and you can find out if they don't. The output might or might not have been saved in the notebook; here it wasn't, but you can run it and find out exactly, or I can scan through and find the number of epochs they trained this model for. Here they happened to train it for 50 epochs. For training GANs you have to read the papers; they can get into weird states. These datasets can also be surprisingly small, by the way: in the next paper I'll show you, you can train a GAN with about 200 images, give or take; you don't need the many thousands you might expect.

For GANs, let me show you the two neural networks at warp speed. Here's a generator, and it uses more complicated layers; we've only really talked about Dense, and you can see there are different types of layers here. But even though this is clearly from a research paper, notice it's still implemented using what we call the beginner's API, which is super powerful: it's a stack of layers, it's one neural network, and it takes a vector in and produces an image. The second network: I talked about convolution very briefly; this is an image classifier built from convolutional layers. You now know what some of these numbers are: we talked about filters, and on the Eileen Collins image I showed you filters that were 3x3. These filters are 5x5, so they're slightly fancier, and this layer has 64 of them, so it's going to learn to detect 64 things that can be represented as 5x5 grids of numbers.

The reason convolution is really powerful, by the way (I stole these slides from a friend of mine at work, Martin Görner; they're epic, so I didn't want to redraw them), is this. Here's an RGB image, and while I showed you convolution in 2D, it also works in 3D: here we have a three-dimensional filter, and in the same way we convolved over Eileen, we can convolve over the image with the 3D filter and get an output image. What's cool is that with a second filter we get a second output image; the first could be one kind of edge, the second a different kind the filter learned to detect. The insight is that with n filters we get n output images. So when you see Conv2D with 64 filters, that also means there are going to be 64 output images.
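As a sketch, a discriminator in that beginner's style might look like this; the shapes assume 28x28 grayscale MNIST images, and the exact architecture in the tutorial may differ:

    import tensorflow as tf

    discriminator = tf.keras.Sequential([
        # 64 filters, each 5x5: this layer learns to detect 64 low-level patterns.
        tf.keras.layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1),  # one logit: real or fake
    ])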
The really interesting thing about convolution is what happens at the next layer. This is a 5x5 filter, but where the first filter looked through a three-dimensional space, the filters at the next layer look through a higher-dimensional one: the filters in this code, as written, look through a 64-dimensional space. That means these filters are 5x5x64, so they're extremely powerful, and they're not something we could ever write by hand; who knows what you can detect by looking at 64 other feature images at once? They're extremely powerful, but also very efficient, and that's why this network doesn't have to be very deep to be very capable.

A common mistake people make when starting with deep learning: we learn that deeper networks are better, so you build a 20-layer network. Some types of networks, like convolutional networks, can benefit from being very deep; a production image classifier might have a hundred layers, give or take. There are other types of layers we use for text, called RNNs, and one kind of RNN layer is the LSTM. There's no way to know the answer to this question, but let me throw it out anyway: if I told you that a toy network like Inception has 20 or 30 layers, how many layers do you think Google Translate has? Its layers are LSTMs, and Google Translate is trained on one of the world's largest datasets, so how big do you think the model is in terms of layers? 64? 16? You're actually extremely close. Surprisingly, it's almost nothing: as of the 2014 paper, it was two models, an encoder with seven LSTMs and a decoder with another seven, so two seven-layer-deep networks. That means you could literally write the thing in eight or nine lines of code, not counting the pre-processing. Small models can be extremely powerful. How is the text stored? Great question; I'll show you in the next example, where I give you the code for a mini version of Google Translate you can play with.
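Just to make "eight or nine lines" concrete, here's a hedged sketch of what a seven-layer LSTM encoder might look like in Keras; the vocabulary and layer sizes are made up, and the real Google system has plenty of machinery this omits:

    import tensorflow as tf

    VOCAB, EMBED, UNITS = 32000, 256, 512  # made-up sizes

    encoder = tf.keras.Sequential(
        [tf.keras.layers.Embedding(VOCAB, EMBED)]
        + [tf.keras.layers.LSTM(UNITS, return_sequences=True) for _ in range(6)]
        + [tf.keras.layers.LSTM(UNITS)]  # final layer returns one summary vector
    )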
Anyway, that was a simple GAN. There's another, more complicated one we have code for that you can try: it's from Berkeley and it's called pix2pix, and here the GAN actually takes an input image. With the plain generator we just said, "generator, please generate me an image, have a good time"; here we say, "pix2pix generator, here's an input image, generate me the corresponding output image." This demo uses a dataset called Facades, from the Czech Technical University in Prague (I always forget the name). The input image in pix2pix is a building facade layout; here's the image from the training set, and here's the output it manages, which is really beautiful. You can run this code and do exactly this. What's cool about pix2pix is that it just takes paired images as input, so a common thing you can do: in Google Maps you can click "satellite" and see the same region as satellite imagery, so you can train pix2pix to go from map to satellite, which is really cool. So pix2pix is beautiful and amazing, and there's the complete code; again, if you want the code, ignore anything that isn't on the alpha (maybe beta) site, because the older versions are much harder.

Then the latest model, which we don't quite have code for yet but will in a couple of weeks, is called CycleGAN, also from Berkeley, and CycleGAN does something nuts. This is a really big deal, because there's no way to collect a dataset that transforms images of horses into images of zebras: that dataset doesn't exist, it's not found in nature. But CycleGAN can do it. The trick is that while pix2pix is paired, meaning you need matching input and output images in your training set, CycleGAN just takes a directory of input images and a directory of output images: a set of horses and a set of zebras. It can exploit supervision at the level of sets. The paper is great. It can also do really trippy things like summer to winter, or day to night; it's nuts. You can also convert between different artists' styles, and again, there's no way to collect a paired dataset of photograph-to-artist. CycleGAN is beautiful; in a couple of weeks we should have the code done.

Continuing the warp-speed tour of amazing things: let's do warp-speed machine translation. Hopefully this link doesn't break: bit.ly/minimal-nmt, where NMT stands for neural machine translation. This is meant to be the fastest possible way to train a translation model using something that isn't terrible; the code should run in about a minute. To answer your earlier question, here's the input data to our machine translation model, and the only input is a corpus, paired: a sentence in English and a sentence in Spanish. Nothing in this code knows anything about English or Spanish, so you can modify it for English to French or anything else. I'm representing it in ASCII; you could handle non-ASCII characters too, but my pre-processing code is ASCII-only just to keep it fast. Is this a public dataset? Yes; the code is Apache 2 on my GitHub repo, and these are just sentence pairs from a public source. This won't generalize; the goal of this example is to train fast and prove the point, and we have fancier examples on the website that train in a couple of hours and are much better.

Here's the insight (I don't have a diagram, so I'll just tell you; I'm also not a linguist, and if anyone here is, you'll know far more about this than me, because it's really interesting). One old approach to machine translation was: take your sentence in English, map it to a logical form, some representation you invent that captures the meaning of the sentence, then write a separate program that takes the meaning and emits text in whatever language you want. This is called an interlingual representation: logic that captures meaning. It's super hard, and it didn't work; it's not understood how to manually write code that extracts the meaning of a sentence, but it would be great if we could. As it turns out, deep learning models do roughly that. The way machine translation works here is with two models. There's a model called an encoder, which takes a sentence in English (or whatever's in your training set) and maps it to a vector, a list of numbers that roughly captures the meaning of the sentence. Then there's a second neural network called a decoder, which takes that list of numbers and emits a sentence in the target language.
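As a rough sketch, the two halves might look like this; the vocabulary sizes are placeholders, and a real decoder needs teacher forcing during training plus a token-by-token loop at inference, both of which I'm omitting:

    import tensorflow as tf

    EN_VOCAB, ES_VOCAB, UNITS = 8000, 8000, 512  # placeholder sizes

    # Encoder: English token ids -> one vector that roughly captures the meaning.
    encoder = tf.keras.Sequential([
        tf.keras.layers.Embedding(EN_VOCAB, 256),
        tf.keras.layers.GRU(UNITS),
    ])

    # Decoder pieces: consume the meaning vector as initial state, emit Spanish logits.
    decoder_embedding = tf.keras.layers.Embedding(ES_VOCAB, 256)
    decoder_gru = tf.keras.layers.GRU(UNITS, return_sequences=True)
    decoder_logits = tf.keras.layers.Dense(ES_VOCAB)

    def decode_step(tokens, meaning_vector):
        x = decoder_embedding(tokens)
        x = decoder_gru(x, initial_state=meaning_vector)
        return decoder_logits(x)  # per-position distribution over Spanish tokens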
And in the same way that training an image classifier gave us, as an artifact (we saw this in deep dream), all those amazing filters that detect really abstract things, training this encoder-decoder model gives us an interlingual representation for free. The reason I mention this is that if you happen to be a linguist, beyond the fact that we can train a machine translation model, which is awesome, there might be real value in investigating how this thing compresses sentences into a vector, because that's also really, really cool; now we're doing research, which may or may not be useful, but it could be a great topic to look into. You can read the papers; I failed to include the link here, but if you search for the Google Translate paper, that's roughly how it works. Great question: is the vector we compress the sentence into larger or smaller than the input? It's almost always smaller, but sadly, it's a parameter. Unfortunately, deep learning is super experimental right now, so the right size for the embedding, like how big your layers should be, is something you find out experimentally. Deep learning is really hacky at the moment, and it's hacky because it's new; we're figuring out best practices as we go.

Let me show two more awesome things and then I'll stop. I'm not going to say much about this one except to give you the code. Another cool thing you can do with encoder-decoder models is, instead of going text to text, you can go image to text (you could also go text to image if you wanted). Here we have a picture, and the task is to learn to caption it. It's still an encoder-decoder model; the difference is the encoder is a CNN, looking almost identical to an image classifier (it's written in a strange way here, but the encoder is basically an image classifier), and the decoder is basically an RNN. I'm just showing you this so you know it's a thing; image captioning is one of these amazing capabilities.

And the last thing I want to show you, so I can stop a little early: the only thing in TensorFlow 2 that isn't standard, Pythonic code is this, and it's just one line. Here's some code written in Python: I have some layers I just wrote, it doesn't matter which; we make some data and call the layer on the data. I don't care about the output; I just want to know how long this takes to run. If I run it in this (whoops) non-repeatable benchmark, it takes about three one-hundredths of a second. Great. So how do we make this faster, and why do we care about speed? TensorFlow has a C++ backend and a Python front-end. Python is easy to write but super slow; it can be about a hundred times slower than C when multiplying matrices. So when you use TensorFlow normally, it takes your Python code, does the math in C, and returns the result; NumPy does exactly the same thing. In addition, TensorFlow works on GPUs, which is great, but it can do something else. The problem is that in the NumPy style you're doing a round trip: Python to C, back to Python, back to C, ping-ponging back and forth, which is slow. But we can actually compile this code so the whole thing runs in C and returns the result once, eliminating the round trip.
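Here's a rough version of that benchmark; the layer size is illustrative, and your numbers will differ:

    import timeit
    import tensorflow as tf

    layer = tf.keras.layers.Dense(1000)
    x = tf.random.uniform((1000, 1000))

    def run(x):
        return layer(x)

    compiled_run = tf.function(run)  # same as decorating run with @tf.function
    compiled_run(x)                  # call once first so tracing isn't timed

    print('eager:   ', timeit.timeit(lambda: run(x), number=100))
    print('compiled:', timeit.timeit(lambda: compiled_run(x), number=100))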
And because we're compiling it (I'm not a compilers engineer, but if you are, you know there are all sorts of things you can do to make compiled code go faster), we get additional speedups. Here's how we compile the code in TensorFlow; I'll flip back and forth so you can see it. We just add an annotation at the top, @tf.function, and what it does is tell the C++ backend to analyze this code, compile it, run it, and give us the result. If you never write this in your TensorFlow life, your code will work identically, and that's fine; you don't have to care about it unless you feel like it. If you want to play with making your code faster, you can, and by adding this one annotation, this particular code runs about nine times faster, which is really cool.

Why is it not the default? Excellent question: when the code is compiled, it's harder to debug; if we get an error, it's going to be funky. So what you typically do is write and run your code, train your model, make sure it works, and when you're happy with it, add tf.function. When you're debugging without it, you can use the Python debugger; with it, you get weird stack traces. And you don't always get speedups; it can actually make your code slower, because it's still being developed, and it depends on the type of code you're running. Can you put Python control flow inside it? Yes, you can; here's some Python control flow (I stole this slide from somebody; this is out of my area). To explain, ballpark, how this works under the hood: we have a giant guide on it, and it uses something called AutoGraph. I mentioned that TensorFlow has a graph execution engine; AutoGraph is a Python source-to-source transformer that rewrites your Python code (you can see what it generates, but you never have to care) into a form the backend can compile and accelerate. That's the magic underneath. In practice you don't put this on every function; what you usually do, and I can show you in pix2pix how this works in practice, is wrap your entire training loop. Here's the training loop for that model: after you debug it, you can just put @tf.function on top of it. Your mileage may vary, but it's super cool. That's tf.function.

All right, I'll stop there. Let me give you a slide with learning resources. I went fast, but all the links work at home, and the best way to learn this stuff is to play with the code and modify it. The slide is all screwed up, so: learning resources. All the tutorials are at tensorflow.org/alpha, possibly /beta as of tomorrow, who knows. If you want a non-academic book that teaches you how to do things, the best one in my opinion is Deep Learning with Python; it's about 40 bucks. It's written for a library called Keras, but TensorFlow 2 is based on Keras, so all of that code will work in TensorFlow 2 just by changing an import, no other changes, which is great. And then there are two courses, one from MIT and one from Stanford. So, thanks very much.
[Applause]
Info
Channel: GDGAnnArbor
Views: 6,618
Rating: 4.9210525 out of 5
Keywords: TensorFlow, Machine Learning, Ai, Google
Id: eysk3Keduxk
Length: 107min 24sec (6444 seconds)
Published: Sat Jun 15 2019