Irene Chen A Beginner's Guide to Deep Learning PyCon 2016

Captions
(host) I'm excited about hearing your talk. (Irene Chen) Oh yeah? It's going to be very beginner, which I think... I've been to a lot of these talks and I think sometimes you just, like, dive right in and people get very overwhelmed, so... (host) That's OK by me. I'm a developer who works with a bunch of data scientists, so I need the beginner level to understand what they're saying, so that's great. All right, I guess it's 4:30. We can begin. (Irene Chen) Why don't we wait for that crowd to -- (host) Oh, yeah. (Irene Chen) I just don't want people coming in and out. (host) Good afternoon, everyone. I'd like to introduce Irene Chen, who will be talking about a beginner's guide to deep learning. (Irene Chen) All right, hi. [applause] Good afternoon, everyone. Oh, that is loud. OK. Hi, my name is Irene. I am currently a data scientist at Dropbox, and today we're going to be talking about deep learning, specifically a beginner's guide to deep learning -- that is, emphasis on the "beginner," and obviously emphasis on the "deep learning" part. So this looks like a pretty bright crowd. Raise your hand if you've ever Googled or Binged or DuckDuckGo'd deep learning in an effort to teach yourself more about it. So, oh wow, that looks like almost everyone in this room. Great. Good work, team. If you've taken a look on the internet, you may have seen things about convolutional nets, back propagation, image recognition, restricted Boltzmann machines -- very technical jargon, which can be intimidating. Or if you follow tech news you may have come across such headlines as DeepMind's AlphaGo beating professional go player Lee Sedol recently, NVIDIA and its latest GPU architecture, Toyota's $1 billion AI investment, and I recently read an article about how Facebook is building AI that builds AI, so that's pretty cool. Or if you come from a more academic setting, maybe you've read works by the so-called deep learning pioneers, sort of the people who are really setting the tone for what we're doing right now -- people like Geoff Hinton, Yann LeCun, Andrew Ng, and Yoshua Bengio. So there's no doubt that the information on the internet runs very deep, but -- ha ha! -- at times it can be very overwhelming. A simple search for deep learning can yield over 13 million pages, and when I was first starting out, I read a lot of these guides myself. And having browsed many of these guides, I have found that some of them can have too much math and some of them can have too much code. So that's just to set the scene. What are we going to do today, though? Well, today we will have some math and we will have some code. The goal, however, is to give you a foundation to better equip you, so that if you want to dive into the technical side of deep learning on your own, if you want to leave this 30-minute talk and figure out more things on your own, you will have at least seen the bird's-eye view. So we'll start with the basic question of, why now? Neural networks have actually been in existence since around the 1970s, so why the resurgence now? Then we'll hone in on what exactly a neural network is, how it relates to deep learning, and how it works -- what are these circles and arrows you might have seen? What do they represent? And the last thing we'll do is get our feet wet a little bit with some IPython Notebook coding with Caffe, which is a popular computer vision library. All right, let's jump right in.
So this is a cartoon I drew for a Pictionary game. The term was "machine learning" and this is what I came up with. So, fundamentally, machine learning is all about making computers as smart as humans. We are marching towards leveraging the efficient computation of a computer into something that can actually learn and make decisions about the world. Deep learning is a relatively new branch of machine learning that has been able to achieve superior results using much, much more data. Andrew Ng, one of the people I flashed on the slide earlier, has a great analogy that I like, where he compares deep learning to a rocket ship. In order to get very far, a rocket ship needs two things. One, it needs a very powerful, very large engine, and two, it needs a massive amount of fuel. So in this extended metaphor, the more and more sophisticated neural networks that we have today could be considered the engine, and the massive amount of data we have access to, on the order of terabytes and above, is the fuel. If you have a big engine but no fuel, the rocket will not go very far. And if you have a large amount of fuel and a very, very small engine or no engine, you'll end up grounded as well. So today we will actually be focusing on the first one, the engine. But it's worth noting that most advances in deep learning are equally a result of massive training data sets as well as the neural networks. As a recent example, AlphaGo recently beat one of the strongest professional go players in the history of the game by analyzing and training on a data set of tens of millions of games by expert go players. For those of you that don't know, go is a board game involving black and white stones. For a long time, people thought it would be the final frontier, that no computer would ever beat a human at go; it just was not possible. And it turns out, by cranking through a lot of data and having a lot of GPUs, you can in fact do it. So that's -- what an exciting time we live in right now. The fact is that we are seeing so many advances in deep learning right now because we have this perfect storm. We have the ability to capture and store a lot of data, and we are making increased advances in the technology of neural networks. So let's keep it simple. One of the most common types of machine learning algorithms is called a classifier. For the purposes of right now, we can think of it as a black box -- or maybe an orange box, because it shows up a little bit better on the screen. As the name would suggest, a classifier takes in some input and gives scores so that the output is one of the two or more classes it's been asked to label this input with. So for example, let's look at an avocado. I live in San Francisco. Over the course of a week I eat maybe ten avocados. I really like avocados. One problem with avocados is that it's actually kind of hard to tell if an avocado is perfectly ripe. So, what if we could build a classifier for this? Given an avocado with a certain size, maybe a measure of the squishiness of the avocado and the RGB value for the color of the skin, can we predict if the avocado is in fact ripe? Furthermore, if we had data from many, many more avocados, could we learn even better? Could we have an even more accurate model? Classifiers use all this training data to train themselves -- that is, to become better and more accurate.
So, once they have all this extra data, they can make even better predictions about the model's task -- here, again, deciding if an avocado is ripe. Traditionally, machine learning has a lot of different tools that we could use to solve this problem that are not deep learning. You may have heard of things like logistic regression, naive Bayes, support vector machines, k-nearest neighbors, random forests. These are all excellent tools that are not deep learning. And they all work in a very similar way. You take in the input data -- so maybe the last 1,000 avocados that I've bought, whether or not they were ripe, how much they weighed, what color they were, how squishy they were. We train the classifier, and then for any new avocado from now on, we can predict if it is in fact ripe. Compared to some of the other classifiers, deep learning takes a very long time to train. So for a simple example like my avocado problem, it might not be the best tool. As the dimensions of the input data grow, though, and the complexity of the patterns that we're trying to detect increases, deep nets become more and more important. So imagine we have a face, and we want to figure out who it is. All of a sudden the input data is actually the RGB values of each pixel of this image, and your training set is probably millions of photos of various people. Moreover, the number of classes that you're trying to predict has also increased. Instead of predicting if an avocado is ripe or not -- so that's two classes -- you're trying to predict if this face is actually me, Irene Chen, or if it's Ellen DeGeneres or if it's Jennifer Lawrence or someone else altogether. That's a lot of data. Deep nets quickly become the only tool that can handle such large data sets in a reasonable fashion. So that brings us to our first takeaway. This is the best time in history for deep learning because of a massive amount of data, a massive amount of processing power, and these robust neural networks. All right. But let's take a step back. Our goal is to make computers as smart as humans. So what makes humans so smart? In humans, the cells that make us think are called neurons. They talk to each other using electrical impulses through links called synapses. This entire nervous system allows us to reason, have consciousness, allows you to sit in your seat and listen to me or not listen to me, and it allows me to talk at you. Computer scientists have been trying to model this complex nervous system using a more simplistic model, and we call that a neural network. It's a form of a graph. Deep learning is all about neural networks. A neural network's function, similar to the classifiers we learned about, is to process a set of inputs, perform a series of increasingly complex calculations, and then use the outputs to solve the desired problem. We can use a concept called nodes to represent the neurons, and we can use edges to represent the synapses in the brain. So, here we have a graph. If we trigger a node -- we give it some sort of input -- it then triggers the nodes that it's connected to, and so forth, until it affects the entire graph. Because we are computer scientists, we want to organize things and give it a little structure, so we can create dedicated input and output nodes. We throw some extra nodes in the middle for fun, and we go ahead and add some directed edges to control the flow of information.
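As an illustration (this is not code from the talk), here is a minimal sketch of what one of those traditional, non-deep classifiers looks like for the avocado problem, using scikit-learn. All the measurements and labels below are made-up numbers, just to show the fit/predict workflow.

    # A minimal sketch, not from the talk: a traditional (non-deep) classifier
    # for the avocado example using scikit-learn. All numbers are made up.
    from sklearn.linear_model import LogisticRegression

    # Each row: [size in grams, squishiness 0-10, skin R, G, B]
    X_train = [
        [200, 7, 60, 80, 40],   # ripe
        [180, 2, 30, 120, 50],  # not ripe
        [220, 8, 50, 70, 35],   # ripe
        [170, 1, 25, 130, 60],  # not ripe
    ]
    y_train = [1, 0, 1, 0]      # 1 = ripe, 0 = not ripe

    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    # Predict for a brand-new avocado
    print(clf.predict([[210, 6, 55, 75, 45]]))        # e.g. [1]
    print(clf.predict_proba([[210, 6, 55, 75, 45]]))  # class probabilities

With only a handful of features and a thousand or so avocados, a model like this trains in well under a second -- which is exactly why deep nets only become worth the cost once the inputs and classes get much bigger.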
So, each connection can have a different weight in order to decrease or increase the importance of the connection between two nodes. If the weight is very low, maybe there is no edge at all. Here we represent the weights using the thickness of the arrows. I'm not quite sure how well it rendered, but some arrows are much thicker and some arrows are very, very thin. If we look at node A, which is red, and node B, which is blue, we can see that they're both feeding into node C, which is now purple. Because the weight of the edge between B and C is heavier than the weight of the edge between A and C, we could say that C is maybe a little more blue than red, but it's still purple as a combination of the inputs. Mathematically, we represent this by using a decision function in order to weigh the inputs and decide which value to output, using what's called a sigmoid function. But back to the graph: once a node computes its own value, it feeds that information forward to the next layer, and so forth, until the output nodes have their own values. So here we have C, and C will influence D, and if you notice, D also draws from this other node that I did not give a letter. The layers, you'll notice, include the input nodes and the output nodes and everything in between, which we call the hidden layers -- which is a shame, because they do most of the work and receive none of the glory. Neural networks are traditionally used for classification problems. So, a reminder again, classification problems: inputs feed forward to outputs. Let's bring back my handy-dandy avocado example. Given, again, the size, the squishiness, and the RGB value of the skin of the avocado, is it ripe? The neural network in this case would read the input data -- I've made up some fake numbers representing the variables I just outlined, measurements from this avocado -- and it would calculate the values based on the weights of the directed edges, layer by layer, until we get output values. This process is known as forward propagation. This is the algorithm for feeding in input data and calculating the values along the weights until we get to the output values. This allows us to calculate the likelihood, interpreting the output values to determine if the avocado is indeed ripe. It's worth noting that no node fires randomly. It's all deterministic. That is, if you give the exact same input again, you'll get the exact same output. So, no randomness. But here we have a bigger question. How did we get those weights in the first place? We sort of just took them for granted. So we've got our neural network and we've got these input values. We want the weights of the edges and the biases of the whole neural network to be such that the accuracy is very high. And what do we mean by accuracy? This is where training data comes in. Or, should I say, training avocados? Training data is simply the existing labeled data of previous avocados that we have examined. Some are big, some are small, some are ripe, some are not ripe. The more data the better. Remember the rocket ship example? The weights are then decided using an algorithm called backwards propagation, or back propagation if you like tongue twisters. This selects the weights for the neural network that minimize the error in the network given the existing data. So, given all the avocados that we have access to, how can we minimize the error? And what do we mean by error? Let's take one avocado.
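To make forward propagation concrete, here is a rough sketch (my illustration, not the talk's code) of a tiny network with one hidden layer and a sigmoid decision function. The avocado measurements, layer sizes, and weights are all made-up; the point is just the "weigh the inputs, squash with a sigmoid, pass forward" pattern.

    # A minimal sketch of forward propagation, not from the talk.
    # Weights, biases, and inputs are made-up numbers for illustration.
    import numpy as np

    def sigmoid(z):
        # Squashes any value into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    # Fake avocado measurements: [size, squishiness, R, G, B]
    x = np.array([200.0, 7.0, 60.0, 80.0, 40.0])

    # One hidden layer with 3 nodes, one output layer with 2 nodes
    W1 = np.random.randn(3, 5) * 0.01   # weights: input -> hidden
    b1 = np.zeros(3)                    # biases for the hidden layer
    W2 = np.random.randn(2, 3) * 0.01   # weights: hidden -> output
    b2 = np.zeros(2)

    hidden = sigmoid(W1 @ x + b1)       # each hidden node weighs its inputs
    output = sigmoid(W2 @ hidden + b2)  # output values, e.g. scores for ripe / not ripe
    print(output)

Note that the randomness here is only in how the weights start out; once the weights are fixed, the same input always produces the same output, which is the determinism mentioned above.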
And we start with some random weights in the network. Let's say they are all 1 because we're lazy. And we add the input avocado measurements for this one avocado. Again, we forward propagate using our randomly selected weights -- forward, forward, forward -- and we get some output values. Remember that these are based on random weights, so they're not necessarily correct. But we actually know what the output should be, since this is a known avocado, so we can compare the results. And it turns out our output is wrong. Our network calculated 4 and 20 -- these numbers are made up, don't read too much into them -- but it should actually be 5 and 19. So it was 4 and 20, and it should be 5 and 19. Our error values are then how far off our model's predictions are from the examples that we have already observed. We use these formulas to backwards propagate the errors. We want to adjust our weights based on what we've learned, but only by a small amount. So this is a lot of math, and I think it's actually quite small text. But the big picture is that the formulas depend on the values of the nodes, the amount of the error, the weights of the edges, and the learning rate. The learning rate determines how much we adjust based on each error -- this is essentially our step size. If we have too big of a step size, we might sail past the right answer, the optimal weights, and land on the wrong answer. If we have too small of a learning rate, we might never get there. We will just get stuck and progress very, very slowly until we all get bored. So we have to decide how much to adjust our weights so that we can learn from these errors. And we push the errors back from the output nodes, layer by layer, until we arrive back at the input layer, updating the weights as we go. Now that we have new weights, we can begin again with a forward propagation, and we continue like this with all of our known avocados for as many iterations as we can stand, or until we've met other criteria for stopping. So here is a graph that means very little except that it's going down. The x-axis is the number of iterations and the y-axis is error, so down is better. As you can see, the more iterations we have, the more the error goes down. But it doesn't necessarily go down smoothly, it's not necessarily linear, and it's not necessarily the case that more iterations are always better. In fact, at a certain point we could probably say, "That's enough. It seems like that's about as good as we're going to get." And we call this convergence. We can define convergence as when the error differs from its previous value by less than a certain amount. Or we can set a maximum number of iterations, because life is only so short and we don't want to sit there waiting for our thing to train forever. Then we say the algorithm is complete and our neural network has been trained on the data. This might take a while -- a very long time. And one last thing -- we can actually vary some other settings. The learning rate, as I mentioned before, can be tuned: we can try bigger learning rates, smaller learning rates. We can also vary the number of nodes we have, or the number of hidden layers, given that our input and output layers are still fixed. This is called tuning the parameters of our neural network. But ultimately, we will arrive at a neural network, trained with our weights and biases, that minimizes error on the entire training data set.
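For readers who want to see roughly what that training loop looks like, here is a simplified sketch (again mine, not the talk's, and with made-up data) of forward propagation, an error measurement, back propagation with a learning rate, and a convergence check. Real libraries compute the gradients for you; this is only to show the shape of the loop.

    # A simplified sketch of training with back propagation, not the talk's code.
    # One hidden layer, squared error, made-up data; constant factors omitted.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Fake training avocados: rows of [size, squishiness] scaled to roughly 0-1,
    # labels: 1 = ripe, 0 = not ripe
    X = np.array([[0.9, 0.8], [0.4, 0.1], [0.8, 0.9], [0.3, 0.2]])
    y = np.array([[1.0], [0.0], [1.0], [0.0]])

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output

    learning_rate = 0.5        # the "step size" for each adjustment
    prev_error = np.inf

    for iteration in range(10000):
        # Forward propagation
        hidden = sigmoid(X @ W1 + b1)
        output = sigmoid(hidden @ W2 + b2)

        # How far off are we from the known labels?
        error = np.mean((output - y) ** 2)

        # Back propagation: push the error back, layer by layer
        d_output = (output - y) * output * (1 - output)
        d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

        # Adjust weights and biases by a small amount
        W2 -= learning_rate * hidden.T @ d_output
        b2 -= learning_rate * d_output.sum(axis=0)
        W1 -= learning_rate * X.T @ d_hidden
        b1 -= learning_rate * d_hidden.sum(axis=0)

        # Stop when the error barely changes between iterations (convergence)
        if abs(prev_error - error) < 1e-9:
            break
        prev_error = error

    print(iteration, error)

Changing learning_rate, the hidden-layer size, or the stopping threshold here is exactly the "tuning the parameters" described above.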
Time to test it out on some real avocados! So our big takeaway from here is that neural networks can be trained on labeled data and then classify new avocados. They can be applied to other things besides avocados as well. For example, given a person's height, weight, and temperature, can we predict if they are sick? Or, given the weather outside and the current stock price, can we predict if the stock price will go up or down tomorrow? There are a lot of applications. Why don't you figure out what you can try them on? All right, so that brings us to the third section. I'm going to get some water. All right, so now we know what neural networks look like as colored circles on a page; let's see what they look like in code. This is, after all, PyCon. So, we have computers. You do not have to hand-compute weights and errors for back propagation. Additionally, even though deep learning is still a very young field, there are quite a few helpful Python libraries, so you don't have to reinvent the neural network. Specifically, we're going to talk about four Python libraries that I have found very helpful. Some of you may be familiar with scikit-learn, Caffe, Theano, or IPython Notebook. Scikit-learn is a very well-documented machine learning library for all of your ML needs -- ML meaning machine learning. It's a great place to get started. There are implementations for almost any ML algorithm you can think of. It's very beginner-friendly, there are a lot of examples, and there are also functions for things that aren't strictly machine learning -- data cleaning, graphing -- with very good support. Caffe is a computer vision deep learning library, meaning that it's focused on images. It comes out of the UC Berkeley vision group, and there are wrappers for Python and C++. And the best thing about Caffe is actually a thing called the model zoo, which is a collection of pre-trained models. Instead of training them yourself, which could take forever, especially for computer vision deep nets, you can use pre-trained models. Theano covers efficient GPU-powered math. So if you want to implement your own functions, if you want to code up your own neural network, Theano provides a way to make your algorithms as efficient as possible. And lastly, IPython Notebook is great for interactive coding. I think there are some other talks about IPython Notebook later at this conference, but I'll go ahead and plug the notebook as a great way to show your work. It's also a great way to profile your code, since a lot of machine learning algorithms can take a long time on a few steps and you don't want to rerun everything. I think it's known as Jupyter now, and it handles other languages, but old habits die hard, so I still refer to it as IPython Notebook. All right, so, all four libraries -- I would highly encourage you to check them out. But we're here to load a pre-trained network into Caffe and use it to classify a picture. We have only 30 minutes for this talk. We could have watched me train a network for all 30 minutes, but that would be less fun. A word of warning that Caffe takes a while to install. There are a few tricky things. When I was younger and more foolish, it took me about a week to install Caffe. I was debugging the whole time. And so, thankfully, Caffe has these pre-trained nets. So between the week I spent debugging the Caffe installation and the time I saved not having to retrain a net, I'd say it's about a break-even.
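As a tiny example of the "efficient GPU-powered math" idea (this follows the standard Theano tutorial pattern, not anything shown in the talk): you describe the math symbolically and Theano compiles it into a fast function that can also run on a GPU.

    # A minimal Theano sketch, not from the talk: define the sigmoid from
    # earlier symbolically and let Theano compile it into a callable function.
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')                    # a symbolic matrix of doubles
    s = 1 / (1 + T.exp(-x))               # element-wise sigmoid, written symbolically
    sigmoid = theano.function([x], s)     # compiled function (GPU-capable if configured)

    print(sigmoid([[0, 1], [-1, -2]]))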
So the model we're looking at today is trained on a subset of the ImageNet database. One of the first things I learned about machine learning researchers is that they love contests. Raise your hand if you've heard of Kaggle. OK, so maybe half, maybe a third of the audience. Kaggle is a sort of cooperative, casual machine learning platform. There are some contests with cash prizes, and the forums are very active. The computer vision folks have a similar type of challenge called, let's see, the ILSVRC, which stands for the ImageNet Large Scale Visual Recognition Challenge, where they see who can classify images most accurately and efficiently. The dataset they're working from is the ImageNet dataset, and it has 10 million images and 10,000 object classes -- cats, dogs, foods, anything you can think of. And the net that we're working with has already been pre-trained for 310,000 iterations. So first we'll import the essential packages from Caffe. I think it's fairly small on the screen. Next we'll load our pre-trained network from disk. When I say we load our network, I mean that our network consists of the weights that we talked about earlier, the number of nodes, the structure of the layers, and other parameters that have already been tuned. And we have selected this picture to classify. So just to make sure there are no robots in the room, raise your hand if you think this is a mailbox. [laughter] Ooh, I see a few people in the back who think this is a mailbox. Uhh. And raise your hand if you think this is a cat. So most people think this is a cat. Great. So we're wondering if our pre-trained deep net can correctly identify this as a cat. It's worth pointing out that this image is not in the training set. This is a completely new image. So we run it, and we determine that the label it comes out with -- I just sort of increased the size of the text -- is "tabby cat." I am actually not a cat expert, so I'm not quite sure if it is a tabby cat, but it does seem plausible. And just as a check, we went in and got the top five most likely labels in addition to the tabby cat, so we have tiger cat, Egyptian cat, red fox, and a lynx, which are all some sort of feline -- maybe, I think the fox is not feline, but it's related. And I think it checks out. Unfortunately, now that I know it's so easy for a computer to label a cat, I actually have no idea if you all are humans, because robots can do it so quickly now. From here, we can add additional training to this pre-trained net and train it further depending on what we're looking for, or we can use it ready-made. The analogy I think of is when you buy premade pie crust and use it to make your own pie because you didn't want to make the pie crust yourself. Yeah, so it's just that simple. We can load and use models in Caffe to jumpstart our learning. Note that "learning" here can refer to the deep net learning, or it can also refer to your own personal edification. So, three lessons today, just to rehash. Deep learning is super hot right now because of the access to data, the processing power, and these robust neural networks. Neural networks themselves, if we zoom in, can be trained on data to classify avocados and other things. And Caffe is one way to load pre-trained models to jumpstart your learning. But that's enough about me. Where do YOU go from here?
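For reference, the steps described above -- import Caffe, load the pre-trained network, classify an image, look at the top five labels -- map roughly onto Caffe's standard Python classification workflow. The sketch below is illustrative, not the slides' exact code, and the file paths are placeholders for wherever your downloaded model zoo files (deploy prototxt, caffemodel weights, ImageNet mean, and synset label list) actually live.

    # A rough sketch of classifying an image with a pre-trained Caffe model.
    # Not the talk's exact code; the file paths below are placeholders.
    import numpy as np
    import caffe

    caffe.set_mode_cpu()

    net = caffe.Classifier(
        'deploy.prototxt',                      # network structure (layers, nodes)
        'bvlc_reference_caffenet.caffemodel',   # pre-trained weights
        mean=np.load('ilsvrc_2012_mean.npy').mean(1).mean(1),
        channel_swap=(2, 1, 0),                 # RGB -> BGR, the order Caffe expects
        raw_scale=255,
        image_dims=(256, 256))

    image = caffe.io.load_image('cat.jpg')
    probs = net.predict([image])[0]             # forward propagation: one score per class

    labels = np.loadtxt('synset_words.txt', str, delimiter='\t')
    top5 = probs.argsort()[::-1][:5]            # five most likely labels
    for i in top5:
        print(labels[i], probs[i])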
So for this presentation we've abstracted away a lot of the extra details to get to the heart of deep learning. If any of you want to go further, you have to pick where you want to dive a little deeper. So, of the three lessons, raise your hand if you are most interested in the things we talked about in the first lesson. No one. One person over -- two people over there. Great. So it's possible that you're more of a systems person. There's some great work going on about how to scale existing databases, how to handle not only the massive amount of data needed to train these deep nets, but also how to perform the computation efficiently and effectively, for both processing time and engineer time. I would recommend you check out the CUDA implementations for neural networks, and also some of the other packages I mentioned, including Theano, Google's open source library TensorFlow, and similar packages. Now raise your hand if you're most interested in the second lesson, the neural network part. Ooh, much more. Maybe half, 60%. Great. So maybe you're a more theoretical person. Maybe you even wanted to see more math. If you want to learn how deep learning is adapted for different kinds of problems, you should check out the different ways neural networks can be stacked on top of each other or twisted to fit different problems. So in addition to the classification problem that we talked about, there are unlabeled deep learning problems as well. For that you should check out what a restricted Boltzmann machine is -- that's for unlabeled data and detecting patterns there. For text processing you could look at recurrent networks. And for image processing, so anything with pictures, you could check out convolutional networks. And then last but not least, if you liked lesson three the most, with Caffe, raise your hand. Ooh, a fair amount, maybe 20 hands I saw. So if you want to hack something together, maybe because you learn best by just coding it up in Python or whatever your language of choice would be -- hopefully Python -- Caffe is a great place to start loading pre-trained nets. There's a whole series of IPython Notebooks on there. Feel free to grab a net, and maybe also compete in a Kaggle competition. They actually have a really good one about how to tell the difference between cats and dogs. It seems like most of you can detect a cat, so maybe we can figure out how to train a similar net to detect a dog. The forums are also very supportive, and some of the competitions even have cash prizes. So the bottom line is there's a lot to dig into. Beyond the more headline-grabbing news items, deep learning has applications that could change, and already are changing, the world. A big area of impact already is accessibility. For the visually impaired to be able to read a sign, go grocery shopping, or complete everyday tasks, deep learning has been invaluable. Handwriting recognition has allowed us to digitize a lot of historical documents and learn from the past. And even the, you know, very exciting self-driving cars could dramatically reduce the number of fatal accidents -- traffic accidents at least. So deep learning is not new -- at least the ideas are not new -- but it's a very young field as it stands now. The people I've found are very supportive, and it's a growing field with plenty of opportunities, so don't be afraid to jump right in. Thank you so much.
My email address is irenetrampoline@gmail.com, so feel free to email me with any questions. Thank you. [applause] I think we are out of time so I'll just be outside if you have any questions. [applause]
Info
Channel: PyCon 2016
Views: 21,295
Id: nCPf8zDJ0d0
Length: 30min 54sec (1854 seconds)
Published: Fri Jun 17 2016