(host)
I'm excited about hearing your talk. (Irene Chen)
Oh yeah? It's going to be very beginner, which I think...
I've been to a lot of these talks and I think sometimes you just, like, dive right in
and people get very overwhelmed, so... (host)
That's OK by me. I'm a developer that works with a bunch of data scientists, so I need the beginner level to understand
what they're saying, so that's great. All right, I guess it's 4:30. We can begin. (Irene Chen)
Why don't we wait for that crowd to -- (host)
Oh, yeah. (Irene Chen)
I just don't want people coming in and out. (host)
Good afternoon, everyone. I'd like to introduce Irene Chen, who will be talking about
a beginner's guide to deep learning. (Irene Chen)
All right, hi. [applause] Good afternoon, everyone.
Oh that is loud. OK. Hi, my name is Irene. I am currently a data scientist at Dropbox, and today we're going to be
talking about deep learning, specifically a beginner's guide
to deep learning, that is, emphasis on the "beginner," and obviously emphasis on
the "deep learning" part. So this looks like a pretty bright crowd. Raise your hand if you've ever Googled
or Binged or DuckDuckGo'd deep learning in an effort to teach yourself
more about deep learning. So, oh wow, that looks like
almost everyone in this room. Great. Good work, team. If you've taken a look on the internet,
you may have seen things about convolutional nets,
back propagation, image recognition,
restricted Boltzmann machines, so, very technical jargon,
which may be intimidating. Or if you follow tech news
you may have come across such headlines as DeepMind's AlphaGo beating
professional go player Lee Sedol recently, NVIDIA and its latest GPU architecture, Toyota had a $1 billion AI investment, and I recently read an article about how
Facebook is building AI that builds AI, so that's pretty cool. Or if you come from
a more academic setting, maybe you've read works by these
so-called deep learning pioneers, sort of the people who are really laying the groundwork for what we're doing right now -- people like Geoff Hinton, Yann LeCun,
Andrew Ng, and Yoshua Bengio. So there's no doubt that the information
on the internet runs very deep -- ha ha! -- but at times
it can be very overwhelming. So a simple search for deep learning
can yield over 13 million pages, and when I was first starting out,
I read a lot of these guides myself. And having browsed many of these guides, I have found that some of the guides
can have too much math and some of the guides
can have too much code. So that just sets the scene. So what are we going to do today? Well, today we will have some math
and we will have some code. The goal, however, is to give you
a foundation to better equip you so that if you want to dive into the technical
side of deep learning on your own, if you want to leave this 30-minute talk
and figure out more things on your own, you'll have -- you will have at least
seen the bird's-eye view. So we'll start with
the basic question of, why now? Neural networks have actually been
in existence since around the 1970s, so why the resurgence now? Then we'll home in on what exactly
a neural network is, how does it relate to deep learning,
and how does it work? What are these circles and arrows
you might have seen? What do they represent? And the last thing we'll do is
we'll get our feet wet a little bit with some IPython Notebook
coding with Caffe, which is a popular
computer vision library. All right, let's jump right in. So this is a cartoon I drew
for a Pictionary game. The term was "machine learning"
and this is what I came up with. So, fundamentally,
machine learning is all about making computers
as smart as humans. We are marching towards turning the efficient computation of a computer into something that can actually learn
and make decisions about the world. Deep learning is a relatively new
branch of machine learning that has been able to achieve
superior results using much, much more data. Andrew Ng, one of the people
that I flashed on the slide earlier, has a great analogy that I like, where he compares deep learning
to a rocket ship. So in order to get very far,
a rocket ship needs two things. One, it needs a very powerful,
very large engine, and two, it needs
a massive amount of fuel. So in this extended metaphor, the more and more sophisticated
neural networks that we have today could be considered the engine, and the massive amount of data
we have access to, on the order of terabytes
and above, are the fuel. So if you have a big engine but no fuel,
the rocket will not go very far. And if you have a large amount of fuel
and a very, very small engine or no engine, you'll end up grounded as well. So today we will actually be
focusing on the first one, the engine. But it's worth noting
that most advances in deep learning are equally a result
of massive training data sets as well as the neural networks. As a recent example, AlphaGo recently beat
one of the strongest professional go players in the history of the game
by analyzing and training on a data set of tens of millions of games
by expert go players. So for those of you that don't know,
go is a board game involving black and white stones. For a long time, people thought
it would be the final frontier, that no computer would ever
beat a human at go; it just was not possible. And it turns out,
by cranking through a lot of data and having a lot of GPUs,
you can in fact do it. So that's -- what an exciting time
we live in right now. The fact is that we are seeing
so many advances in deep learning right now
because we have this perfect storm. We have the ability to capture
and store a lot of data, and we are making increased advances
in the technology of neural networks. So let's keep it simple. One of the most common types of machine
learning algorithms is called a classifier. For the purposes of right now,
we can think of it as a black box or maybe an orange box, because it shows up
a little bit better on the screen. As the name would suggest,
a classifier takes in some input and produces scores, so that the output is one of the two or more classes it's been asked to label the input with. So for example,
let's look at an avocado. I live in San Francisco. Over the course of a week
I eat maybe ten avocados. I really like avocados. One problem with avocados is that
it's actually kind of hard to tell when one is perfectly ripe. So, what if we could build
a classifier for this? So, given an avocado
with a certain size, maybe a measure of the
squishiness of an avocado and the RGB value
for the color of the skin, can we predict if an avocado
is in fact ripe? Furthermore, if we had data
from many, many more avocados, could we learn even better? Could we have
an even more accurate model? Classifiers use all this training data
to train themselves -- that is, to make themselves
better and more accurate. So, once they have all this extra data, they can make even better predictions
about the model's task -- here, again, deciding
if it's a ripe avocado. Traditionally, machine learning
has a lot of different tools that we would use to solve this problem
that are not deep learning. So you may have heard of things
like a logistic regression, naive Bayes, support vector machine,
k-nearest neighbors, random forests. These are all excellent tools
that are not deep learning. And they all work in a very similar way. You take in the input data, so maybe the last 1,000 avocados
that I bought, whether or not they were ripe,
how much they weighed, what color they were,
how squishy they were. We train the classifier, and then
for any new avocado from now on, we can predict if it is in fact ripe.
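[To make that workflow concrete, here is a rough sketch in Python using scikit-learn. The avocado measurements and labels below are invented purely for illustration:]

```python
# A rough sketch of the classic classifier workflow with scikit-learn.
# Each avocado is described by made-up measurements: weight in grams,
# squishiness on a 0-10 scale, and the average R, G, B of the skin.
from sklearn.linear_model import LogisticRegression

X_train = [[170, 7, 60, 80, 40],   # features for avocados we've already eaten
           [150, 2, 90, 140, 60],
           [200, 8, 50, 70, 35],
           [160, 3, 100, 150, 70]]
y_train = [1, 0, 1, 0]             # 1 = ripe, 0 = not ripe

clf = LogisticRegression()
clf.fit(X_train, y_train)          # train on the labeled avocados

new_avocado = [[180, 6, 65, 85, 45]]
print(clf.predict(new_avocado))    # e.g. [1] means the model thinks it's ripe
```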
Compared to some of the other classifiers, deep learning takes
a very long time to train. So for a simple example
like my avocado problem, it might not be the best tool. As the dimensions of the
input data grow, though, and the complexity of the patterns
that we're trying to detect increase, deep nets become
more and more important. So imagine we have a face,
and we want to figure out who it is. All of a sudden the input data
is actually the RGB values of each pixel of this image, and your training set is probably
millions of photos of various people. Moreover, the number of classes that you're
trying to predict has also increased. Instead of predicting
if an avocado is ripe or not -- so that's just two classes -- you're trying to predict if this face
is actually me, Irene Chen, or if it's Ellen Degeneres
or if it's Jennifer Lawrence or someone else altogether. That's a lot of data. Deep nets quickly become the only tool that can handle such large data sets
in a reasonable fashion. So that brings us
to our first takeaway. This is the best time ever
for deep learning yet, because of a massive amount of data, a massive amount of processing power, and these robust neural networks. All right. But let's take a step back. Our goal is to make computers
as smart as humans. So what makes computers --
what makes humans so smart? In humans, the cells that make us think
are called neurons. They talk to each other
using electrical impulses through links called synapses. This entire nervous system
allows us to reason, have consciousness, allows you to sit in your seat
and listen to me or not listen to me, and it allows me to talk at you. Computer scientists have been trying
to model this complex nervous system using a more simplistic model,
and we call that a neural network. It's a form of a graph. Deep learning is all about
neural networks. A neural network's function, similar to
the classifiers we learned about, is to process a set of inputs, perform a series of increasingly complex
calculations, and then use the outputs
to solve the desired problem. We can use a concept called nodes
to represent the neurons and we can use edges to represent
the synapses in the brain. So, here we have a graph. If we trigger a node, so we sort of
give it some sort of input, it then triggers the nodes
that it's connected to and so forth until it
affects the entire graph. Because we are computer scientists, we want to organize things
and give it a little structure, so we can create dedicated
input and output nodes. We throw some extra nodes
in the middle for fun, and we go ahead and add
some directed edges to control the flow
of information. So, each connection
can have a different weight in order to decrease
or increase the importance of the connection
between these two nodes. If the weight is too low,
maybe there is no edge at all. So here we represent the weights
using thickness of arrows. I'm not quite sure
how well it rendered but some arrows are much thicker
and some arrows are very, very thin. If we look at node A, which is red,
and node B, which is blue, we can see that they're both feeding
into node C, which is now purple. Because the weight of the edge
between B and C is heavier than the weight
of the edge between A and C, we could say that maybe
it's a little more blue than red but it's still purple
as a combination of the inputs. Mathematically, we represent this
by using a decision function in order to weigh the inputs
and decide which value to output using what's called
a sigmoid function.
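[A rough sketch of what a single node computes: a weighted sum of its incoming values, squashed through the sigmoid function. The inputs, weights, and bias below are made up:]

```python
import math

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    # weighted sum of the incoming values, plus a bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# node C receives values from nodes A and B; B's edge carries the heavier weight
print(node_output(inputs=[0.9, 0.4], weights=[0.2, 0.8], bias=0.1))
```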
But back to the graph: once a node computes its own value, it feeds information forward
to the next layer and so forth until the output nodes
have their own values. So here we have C,
and C will influence D, and if you notice,
D has also been drawn from this other node
that I did not give a letter. The layers, you'll notice, include
the input nodes and the output nodes and everything in between,
which we call the hidden layers, which is a shame,
because they do most of the work and receive none of the glory. Neural networks are traditionally used
for classification problems. So a reminder again, classification problems:
inputs feed forward to outputs. So let's bring back
my handy-dandy avocado example. So, given, again, the height,
the squishiness, and the RGB value of the skin
of the avocado, is it ripe? So the neural network in this case
would read the input data. I've made up some fake numbers
representing the variables I just outlined,
measurements from this avocado. And it would calculate
the values based on the weights of the directed edges, layer by layer,
until we get output values. This process is known as
forward propagation. This is the algorithm
for feeding input data, calculating the values,
going along the weights until we get to the output values. This allows us to calculate
the likelihood, interpreting the output values to determine if the avocado
is indeed ripe. It's worth noting that no node
fires randomly. It's all deterministic. That is, if you give
the exact same input again, you'll get the exact same output. So, no randomness.
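[A rough sketch of forward propagation through a tiny, fully connected network. The layer sizes, weights, and avocado measurements are all invented, and a real library would vectorize this with NumPy, but it shows the idea -- and that the whole thing is deterministic:]

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    # layers is a list of (weights, biases) pairs, one pair per layer;
    # weights[j] holds the incoming edge weights for node j of that layer
    values = inputs
    for weights, biases in layers:
        values = [sigmoid(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values

# 3 inputs (size, squishiness, skin colour) -> 2 hidden nodes -> 1 output node
network = [
    ([[0.4, -0.2, 0.7], [0.1, 0.5, -0.3]], [0.0, 0.1]),   # hidden layer
    ([[0.6, -0.8]],                        [0.2]),         # output layer
]
print(forward([0.8, 0.6, 0.3], network))  # the same input always gives the same output
```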
But here we have a bigger question: how did we get those weights in the first place? So far, we've just taken them for granted. So we've got our neural network
and we've got these input values. We want the weights of the edges
and biases of the whole neural network to be such that
the accuracy is very high. And what do we mean by accuracy? So this is where
training data comes in. Or, should I say,
training avocados? Training data is simply
the existing labeled data of previous avocados
that we have examined. So some are big, some are small,
some are ripe, some are not ripe. The more data the better.
Remember the rocket ship example? The weights are then decided using
an algorithm called backwards propagation or back propagation
if you like tongue twisters. This selects the weight
for the neural network that minimizes the error in the network
given the existing data. So, given all the avocados
that we have access to, how can we minimize the error? So, what do we mean by error? So let's take one avocado. And we start with some
random weights in the network. So we said they are all 1
because we're lazy. And we add the input avocado
measurements for this one avocado. Again, we forward propagate
using our randomly selected weights, forward forward forward,
and we get some output nodes. So remember that these are
based on a random weight, so they're not necessarily correct. But we actually know
what the output should be since this is a known avocado,
so we can compare the results. And it turns out
our output is wrong. So we calculated
4 and 20. These numbers are made up.
Don't read too much into them. But it should actually
be 5 and 19. So it was 4 and 20,
now it's 5 and 19. Our error values are then
how far off our model's predictions are from the examples that we have already observed.
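[In code, the error values are just the gap between what the network predicted and what we already know is true for this training avocado, using the made-up numbers from the slide:]

```python
predicted = [4, 20]   # what the network computed with its random weights
expected  = [5, 19]   # what we already know this avocado's outputs should be

errors = [e - p for p, e in zip(predicted, expected)]
print(errors)         # -> [1, -1]: how far off we were at each output node
```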
We use these formulas to backwards propagate the errors. So we want to adjust our weights based on what we've learned,
but only by a small amount. So this is a lot of math, and I think
it's actually quite small text. But the big picture is actually
that the formulas depend on the values of the nodes,
the amount of the error, the weights of the edges,
and the learning rate. The learning rate determines
how much we adjust based on each error. So this is essentially our step size. If we have too big of a step size,
we might sail past the right answer, the optimal weights,
and arrive at the wrong answer. If we have too small of a learning rate,
we actually might never get there. We will just get stuck
and progress very, very slowly until we all get bored. So we have to decide
how much to adjust our weights so that we can learn
from these errors. And we actually push them back
from the output nodes layer by layer by layer until we arrive
back at the input layer, updating the weights as we go.
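[A simplified sketch of the update at the heart of back propagation: nudge each weight against its error gradient, scaled by the learning rate. The gradient values below are stand-ins; a real implementation derives them layer by layer with the chain rule from the errors at the output nodes:]

```python
def update_weights(weights, gradients, learning_rate=0.01):
    # too large a learning rate overshoots the optimal weights;
    # too small and training crawls along forever
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

weights   = [1.0, 1.0, 1.0]    # the lazy "all ones" starting point
gradients = [0.3, -0.5, 0.1]   # made-up gradients from one training avocado
print(update_weights(weights, gradients))
```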
Now that we have new weights, we can begin again with forward propagation,
and we continue this way with all of our known avocados
for as many iterations as we can stand or until we've determined
other criteria for stopping. So here is a graph
that means very little except that it's going down. So the x-axis
is the number of iterations and the y-axis is error,
so down is better. As you can see,
the more iterations we have, the more the error goes down. But it doesn't necessarily
go down smoothly, it's not necessarily linear, and it's not necessarily, you know,
the more iterations the better. In fact, at a certain point
we could probably say, "That's enough. It seems like that's about
as good as we're going to get." And we call this convergence. So we can define convergence
as when the error is no longer differing from its previous value by more than a certain amount. Or we can say that there is
a certain number of max iterations, because life is only so short
and we don't want to sit there waiting for our thing
to train forever. We say the algorithm is complete and our neural network
has been trained on the data. This might take a while -- a very long time.
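[A rough sketch of the outer training loop: run a full forward-plus-backward pass over the training avocados, track the error, and stop either when the error has converged or after a maximum number of iterations. The train_one_pass function here is only a placeholder so the sketch runs:]

```python
def train_one_pass(network, data, iteration):
    # placeholder: a real version would forward propagate every example,
    # compute the errors, back propagate the updates, and return total error
    return 1.0 / (iteration + 1)

def train(network, data, max_iterations=1000, tolerance=1e-4):
    previous_error = float("inf")
    for iteration in range(max_iterations):
        error = train_one_pass(network, data, iteration)
        if abs(previous_error - error) < tolerance:
            break              # converged: the error barely changed this round
        previous_error = error
    return network

train(network=None, data=[], max_iterations=50)
```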
And one last thing -- we can actually vary some other settings. The learning rate,
as I mentioned before, can be tuned. We can try things with bigger
learning rates, smaller learning rates. We can also vary
the number of nodes we have or the number of layers,
the number of hidden layers, given that our input and output
layers are still fixed. This is called tuning the parameters
of our neural network.
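[A rough sketch of what tuning the parameters might look like in practice: try a few learning rates and hidden-layer sizes and keep whichever combination gives the lowest error. The train_and_score helper is hypothetical, standing in for the training loop above plus an error measurement on held-back avocados:]

```python
def train_and_score(learning_rate, hidden_nodes):
    # placeholder score so the sketch runs; a real version would train a
    # network with these settings and return its measured error
    return abs(learning_rate - 0.01) + 0.001 * abs(hidden_nodes - 10)

best_error, best_settings = float("inf"), None
for learning_rate in (0.001, 0.01, 0.1):
    for hidden_nodes in (5, 10, 20):
        error = train_and_score(learning_rate, hidden_nodes)
        if error < best_error:
            best_error, best_settings = error, (learning_rate, hidden_nodes)

print("best settings:", best_settings)
```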
But ultimately, we will arrive at a neural network that is trained
with our weights and biases that minimizes error
in the entire training data set. Time to test it out
on some real avocados! So our big takeaway from here
is that neural networks can be trained on labeled data
and then classify new avocados. They can be applied to other things
besides avocados as well. For example, given a person's
height, weight, temperature, can we predict
if they are sick? Or, given the temperature --
the weather outside and the current stock price,
can we predict if the stock price
will go up or down tomorrow? There's a lot of applications. Why don't you figure out
what you can try them on? All right, so that brings us
to the third section. I'm going to get some water. All right, so now we know
what neural networks look like as colored circles on a page;
let's see what they look like in code. This is, after all, PyCon. So, we have computers. You do not have to hand-compute
weights and errors for back propagation. Additionally, even though deep learning
is a very young field still, there are quite a few
helpful Python libraries, so you don't have to reinvent
the neural network. Specifically we're going to talk
about four Python libraries that I have found very helpful. Some of you may be familiar
with Scikit-learn, Caffe, Theano, or IPython Notebook. Scikit-learn is a very well-documented
machine learning library for all of your ML needs.
It's a great place to get started. There are implementations for almost
any ML algorithm you can think of, ML meaning machine learning. It's very beginner-friendly,
there are a lot of examples, and there are also some functions
for things that aren't specifically machine learning, like data cleaning and graphing, and very good support. Caffe is a computer vision
deep learning library, meaning that it's
related to images. It comes out of the
UC Berkeley Vision Group, and there are wrappers
for Python and C++. And the best thing about Caffe
is actually a thing called the model zoo, which is a collection
of pre-trained models. Instead of training them yourself,
which could take forever, especially for computer vision,
deep nets, you can use pre-trained models. Theano, meanwhile, covers
efficient GPU-powered math. So if you want to implement
your own functions, if you want to code up
your own neural network, Theano provides you a way
so that you can make your algorithms as efficient as possible.
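[A tiny taste of what that looks like: in Theano you define a symbolic expression once, compile it, and Theano generates efficient, optionally GPU-backed code for it:]

```python
import theano
import theano.tensor as T

x = T.dvector('x')                  # a symbolic vector of doubles
y = 1.0 / (1.0 + T.exp(-x))         # our old friend, the sigmoid
sigmoid = theano.function([x], y)   # compile the expression into a callable

print(sigmoid([0.0, 2.0, -2.0]))    # runs on whatever backend Theano is configured for
```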
And lastly, IPython Notebook is great for interactive coding. I think there are some other talks
about IPython Notebook later at this conference,
but I'll go ahead and plug the notebook as a great way to show your work. It's a great way to profile your code since a lot of machine
learning algorithms can take a long time on a few steps
so you don't want to rerun everything. I think it's known as Jupyter now,
and it handles other languages, but old habits die hard so I always
refer to it as IPython Notebook still. All right, so, all four libraries, I would highly
encourage you to check them out. But we're here to load
a pre-trained network into Caffe and use it to classify a picture. We have only 30 minutes
for this talk. So we could have watched me
train a network for all 30 minutes, but that would be less fun. Word of warning that Caffe
takes a while to install. There's a few tricky things. When I was younger and more foolish,
it took me about a week to install Caffe. I was debugging the whole time. And so thankfully Caffe has
these pre-trained nets. So between the week I spent
debugging the Caffe installation and the time I saved
not having to retrain a net, I'd probably say
it's about a break even. So the model we're looking at today is trained off a subset
of the ImageNet database. So one of the first things I learned
about machine learning researchers is that they love contests. Raise your hand
if you've heard of Kaggle. So, OK, so about maybe half,
maybe a third of the audience. So Kaggle is a sort of casual, cooperative machine learning platform. There are some contests with cash prizes. There are forums that are very active. The computer vision folks
have a similar type of challenge
ILSVRC Challenge, which stands for the ImageNet
Large Scale Visual Recognition Challenge, where they see who can classify images most efficiently. The dataset they're going off of
is the ImageNet data set, and it has 10 million images,
10,000 object classes, so cats, dogs, foods,
anything you can think of. This net
that we're working with has already been pre-trained
for 310,000 iterations. So first we'll import
the essential packages from Caffe. I think it's fairly small
on the screen. Next we'll load our pre-trained
network from disk. So when I say we load our network, I mean that our network consists of the weights that we talked about earlier, the number of nodes, the structure of the layers, and other parameters that have already been tuned.
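[A rough sketch of loading a pre-trained model with pycaffe, along the lines of Caffe's own classification example. The file names are placeholders for whichever deploy definition and weights you download from the model zoo:]

```python
import caffe

caffe.set_mode_cpu()   # or caffe.set_mode_gpu() if CUDA is set up

net = caffe.Classifier(
    'deploy.prototxt',              # the network structure: layers, nodes, connections
    'bvlc_reference.caffemodel',    # the pre-trained weights from the model zoo
    channel_swap=(2, 1, 0),         # Caffe expects BGR; images load as RGB
    raw_scale=255,                  # images load in [0, 1]; the model wants [0, 255]
    image_dims=(256, 256))          # resize inputs to the dimensions the net expects
```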
And we have selected this picture to classify. So just to make sure
there are no robots in the room, raise your hand if you think
this is a mailbox. [laughter] Ooh, I see a few people in the back
who think this is a mailbox. Uhh. And raise your hand
if you think this is a cat. So most people think
this is a cat. Great. So we're wondering
if our pre-trained deep net can correctly identify this as a cat. It's worth pointing out that this image
is not in the training set. This is a completely new image. So we run it and we determine
that the label it comes out with -- I just sort of increased the size of the text -- is "tabby cat." I am actually not a cat expert
so I'm not quite sure if it is a tabby cat,
but it does seem plausible. And so just as a check, we went in
and got the top five most likely labels in addition to the tabby cat,
so we have tiger cat, Egyptian cat, red fox and a lynx, which are all
some sort of feline -- well, I think the fox is not actually a feline, but it's a related animal. So I think it checks out.
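[Continuing that sketch: load the cat photo, forward propagate it through the pre-trained net, and look at the five most likely labels. 'cat.jpg' and 'synset_words.txt' (the human-readable ImageNet labels) are placeholders for whichever files you actually have:]

```python
import numpy as np
import caffe

image = caffe.io.load_image('cat.jpg')      # a brand-new image, not in the training set
probabilities = net.predict([image])[0]     # one forward pass -> a score per class

labels = np.loadtxt('synset_words.txt', str, delimiter='\t')
top_five = probabilities.argsort()[::-1][:5]  # indices of the five highest scores
for index in top_five:
    print(labels[index], probabilities[index])
```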
Unfortunately, now that I know it's so easy for a computer to label a cat, I actually have no idea
if you all are humans because robots can do it
so quickly now. From here, we can add
additional training to this pre-trained net, training it further depending on what we're looking for, or we can use it ready-made. The analogy I think of is when you buy premade pie crust
and you use it to make your own pie because you didn't want to make
the pie crust yourself. Yeah, so it's just that simple. We can load and use models using Caffe
to jumpstart our learning. Note that "learning" here
can refer to the deep net learning or it also can refer
to your own personal edification. So, three lessons today,
just to rehash. Deep learning is super hot right now
because of the access to the data, the processing power,
and these robust neural networks. Neural networks themselves,
if we zoom in, can be trained on data to classify avocados and other things. And Caffe is one way
to load pre-trained models to jumpstart your learning. But that's enough about me. Where do YOU go from here? So for this presentation
we've abstracted away a lot of the extra details
to get to the heart of deep learning. If any of you want to go further, you have to pick where you want
to dive a little deeper. So of the three lessons,
raise your hand if you are most interested in the things
we talked about in the first lesson. No one. One person over --
two people over there. Great. So it's possible that you're
more of a systems person. There's some great work going on
about how to scale existing databases, how to handle not only
the massive amount of data needed to train these deep nets, but also how to perform
the computation efficiently and effectively for both
processing time and engineer time. I would recommend you check out
the CUDA implementations for neural networks, and also some of the other packages
I mentioned, including Theano, Google's open source library TensorFlow, and similar packages. So raise your hand if you're
most interested in the second lesson, the neural network part. Ooh, much more.
Maybe like half, 60%. Great. So maybe it seems like you might be
a more theoretical person. Maybe you even wanted
to see more math. If you want to learn
how deep learning is adapted for different kinds of problems,
you should check out the different ways neural networks can be
stacked on top of each other or twisted to fit
different problems. So in addition to the classification
problem that we talked about, there are also problems with unlabeled data -- sort of unlabeled deep learning problems. So for that you should check out
what a restricted Boltzmann machine is. So that's for unlabeled data
and detecting patterns there. For text processing
you could look at recurrent networks. And for image processing,
so anything with pictures, you could check out
convolutional networks. And then last but not least,
if you liked lesson three the most with Caffe, raise your hand. Ooh, a fair amount,
maybe like 20 hands I saw. So if you want to hack something together,
or maybe because you learn best by just
coding it up in Python or whatever your language of choice
would be, hopefully Python, Caffe is a great place to start
loading pre-trained nets. There's a whole series of
IPython Notebooks on there. Feel free to grab a net, and maybe also
compete in a Kaggle competition. They have actually a really good one
about how to tell the difference between cats and dogs. It seems like most of you
can detect a cat, so maybe we can figure out how to train
a similar net to detect a dog. The forums are also very supportive and some of the competitions
even have a cash prize. So the bottom line is
there's a lot to dig into. Beyond the more headline-grabbing news items, deep learning has applications that could and are already
changing the world. A big area of impact already
is accessibility. So for the visually impaired
to be able to read a sign, go grocery shopping,
or complete everyday tasks, deep learning has been invaluable. Handwriting recognition
has allowed us to digitize a lot of historical
documents to learn from the past. And even the, you know,
very exciting self-driving cars could dramatically reduce
the number of fatal accidents, traffic accidents at least. So deep learning is not new,
at least the ideas are not new, but it's a very young field
as it is now. The people I've found
are very supportive, and it's a growing field
with plenty of opportunities, so don't be afraid to jump right in. Thank you so much. My email address is
irenetrampoline@gmail.com, so feel free to email me
with any questions. Thank you. [applause] I think we are out of time so I'll just be outside
if you have any questions. [applause]