Welcome to Practical Deep Learning for Coders,
lesson one. This is version five of this course, and it's the first new one we've done
in two years. So, we've got a lot of cool things to cover! It's amazing how much has
changed. Here is an xkcd from the end of 2015. Who here has seen xkcd comics before?
…Pretty much everybody. Not surprising. So the basic joke here is… I'll let you
read it, and then I'll come back to it. So, it can be hard to tell what's easy and what's
nearly impossible, and in 2015 or at the end of 2015 the idea of checking whether something
is a photo of a bird was considered nearly impossible. So impossible, it was the basic
idea of a joke. Because everybody knows that that's nearly impossible. We're now going to build
exactly that system for free in about two minutes! So, let's build an "is it a bird" system. We're going to use Python, and I'm
going to run through this really quickly. You're not expected to run through it with me
because we're going to come back to it. But let's go ahead and run that cell. Okay, so what we're doing is searching DuckDuckGo for images of bird photos, and we're just going to grab one, and here is the URL of the bird that we grabbed. Okay, we'll download it… and there it is. So we've grabbed a bird; we've now got something that can download pictures of birds.
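For reference, that cell looks roughly like this. (`search_images` is a small helper defined in the lesson notebook on top of the duckduckgo_search package, so treat it as an assumption here; download_url comes from fastdownload.)

```python
from fastdownload import download_url
from fastai.vision.all import *  # also patches PIL's Image with .to_thumb()

# search_images is the notebook's helper around duckduckgo_search (an assumption here)
urls = search_images('bird photos', max_images=1)
download_url(urls[0], 'bird.jpg', show_progress=False)
Image.open('bird.jpg').to_thumb(256, 256)  # display a small version of the download
```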
Now we're going to need to build a system that can recognize things that are birds versus things that aren't birds, from photos. Now, of course, computers need numbers to work
with, but luckily images are made of numbers. I actually found this really nice website called PixSpy where I can grab a bird, and if I hover over it
(let's pick its beak) you'll see here that this part of the beak has a red brightness of 251, 48 of green, and 21 of blue. So that's RGB. And you can see, as I wave around, those colors (those numbers) are changing. So this picture, the thing that we recognize as a picture, is actually 256 × 171 × 3 numbers between 0 and 255, representing the amount of red, green, and blue in each pixel. That's going to be the input to our program, which is going to try to figure out whether this is a picture of a bird or not.
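You can see that for yourself with a minimal sketch like this (the file name assumes the bird we downloaded above; the exact shape will depend on the image):

```python
from PIL import Image
import numpy as np

im = np.array(Image.open('bird.jpg'))
print(im.shape)      # e.g. (171, 256, 3): height x width x 3 colour channels
print(im[100, 50])   # one pixel's red, green, blue values, each between 0 and 255
```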
Okay, so let's go ahead and run this cell, which is going to go through… (I needed bird and non-bird, but you can't really search Google Images or DuckDuckGo images for "not a bird"; it just doesn't work that way. So I decided to use forest: I thought pictures of forest versus pictures of birds sounds like a good starting point.) So I go through each of forest and bird, search for "forest photo" and "bird photo", download the images, and then resize them to be no bigger than 400 pixels on a side, just because we don't need particularly big ones, and it takes a surprisingly large amount of time for a computer just to open a big image. Okay, so we've now got 200 of each.
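Roughly what that cell does, using the same hypothetical `search_images` helper:

```python
from fastai.vision.all import *

searches = 'forest', 'bird'
path = Path('bird_or_not')

for o in searches:
    dest = path/o
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))  # grab ~200 images each
    resize_images(dest, max_size=400, dest=dest)             # no bigger than 400px a side
```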
I find when I download images I often get a few broken ones, and if you try to train a model with broken images, it will not work. So here's something which just verifies each image and unlinks (that is, deletes) the ones that don't work.
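In code, that's something like this (verify_images and get_image_files are fastai functions):

```python
failed = verify_images(get_image_files(path))  # find files that won't open as images
failed.map(Path.unlink)                        # delete each broken file
len(failed)
```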
Okay, so now we can create what's called a data block. I'll go through the details of this later, but a data block gives fast.ai (the library) all the information it needs to create a computer vision model. And so in this case we're basically telling it: get all the image files that we just downloaded. And then we say: show me a few, up to six. And let's see… yeah, we've got some birds: forest, bird, bird, forest. One of the nice things about doing computer vision models is that it's really easy to check your data, because you can just look at it, which is not the case for a lot of kinds of models.
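That data block cell looks roughly like this:

```python
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, labels are categories
    get_items=get_image_files,                        # how to find the items to train from
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # hold out 20% as a validation set
    get_y=parent_label,                               # the label is the parent folder's name
    item_tfms=[Resize(192, method='squish')],         # squish every image to 192x192
).dataloaders(path)

dls.show_batch(max_n=6)  # eyeball a few labelled examples
```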
Okay, so we've now downloaded 200 pictures of birds and 200 pictures of forests, so we'll now press run. And this model
is actually running on my laptop, so this is not using a vast data center.
It's running on my presentation laptop. And it's doing it at the same time as my laptop is
streaming video, which is possibly a bad idea. And so what it's going to do is run through every photo of those 400: for the ones that are forest, it's going to learn a bit more about what forest looks like, and for the ones that are bird, it'll
learn a bit more about what bird looks like. So overall it took under 30 seconds, and believe it or not, that's enough to finish doing the thing which was in that xkcd comic. Let's check by passing in that bird that we downloaded at the start. This is a bird. Probability it's a bird: 1.0000 (rounded to four decimal places).
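For reference, the training and prediction cells look roughly like this:

```python
# Train: start from an ImageNet-pretrained resnet18 and fine-tune it on our 400 images
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

# Predict: is the photo we grabbed at the start a bird?
is_bird, _, probs = learn.predict(PILImage.create('bird.jpg'))
print(f"This is a: {is_bird}.")
print(f"Probability it's a bird: {probs[0]:.4f}")
```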
So something pretty extraordinary has happened since late 2015: something has gone from so impossible it's a joke to so easy that I can run it on my laptop
computer in (I don't know how long it was) about two minutes. And so hopefully that gives you
a sense that creating really interesting, real working programs with deep learning doesn't take a lot of code, doesn't take any math, and doesn't take more than my laptop computer. It's pretty accessible, in fact. So that's really what we're going to be
learning about over the next seven weeks. So where have we got to now
with deep learning? Well it moves so fast, but even in the last few weeks
we've taken it up another notch as a community. You might have seen that something called DALL·E 2 has been released, which uses deep learning to generate new pictures. And I thought this was an amazing thing that this guy Nick did, where he took his friends' Twitter bios and typed them into the DALL·E 2 input, and it generated these pictures. So for this guy, he typed in "commitment, sympathetic, psychedelic, philosophical", and it generated these pictures. So I'll just show you
a few of these. I'll let you read them… I love that. That one's pretty amazing, I reckon. And I love this: Happy Sisyphus has actually got a happy rock to move around. Yeah, I don't know. When
I look at these I still get pretty blown away that this is a computer algorithm using nothing
but this text input to generate these arbitrary pictures. In this case of fairly, you know,
complex and creative things. The guy who made those points out that he spends about two minutes or so creating each of these: he tries a few different prompts, and he tries a few different pictures. So he's given an example here of ten different things he gets back when he puts in "expressive painting of a man shining rays of justice and transparency, on a bluebird Twitter logo." So it's not just DALL·E 2, to be
clear. There are a lot of different systems doing something like this now. There's something called Midjourney, where this Twitter account posted a "female scientist with a laptop writing code, in a symbolic, meaningful, and vibrant style." This one here is "an HD photo of a rare psychedelic pink elephant." And this one's pretty cool (I never know how to actually pronounce the account name): "a blind bat with big sunglasses holding a walking stick in his hand." And these aren't from actual artists: this guy, for example, said he knows nothing about art; he's got no artistic talent; it's just something he threw together. This guy, on the other hand, is an artist who actually writes his own software
based on deep learning and spends, you know, months on building stuff, and as you can see,
you can really take it to the next level. It's been really great actually to see how
a lot of fast.ai alumni with backgrounds as artists have gone on to bring deep learning and
art together, and it's a very exciting direction. And it's not just images, to be clear. Another interesting thing that's popped up in the last couple of weeks is Google's Pathways Language Model (PaLM), which can take any arbitrary English text, a question, and create an answer which not only answers the question but also explains its thinking (whatever it means for a language model to be thinking). One of the ones I found pretty amazing was that
it can explain a joke. I'll let you read this… So, this is actually a joke that probably needs explaining to anybody who's not familiar with TPUs. This model just took the text as input and created this text as output. And so you can see, again, deep learning models are doing things which I think very few, if any of us, would have believed would ever be possible for computers to do, even in our lifetime. This means there are a lot of practical and ethical considerations. We will touch on them during this course
but can't possibly hope to do them justice. So I would certainly encourage you to check
out ethics.fast.ai to see our whole data ethics course, taught by my co-founder Dr Rachel Thomas,
which goes into these issues in a lot more detail. All right, so as well as being an AI researcher
at the University of Queensland and fast.ai, I am also a homeschooling primary school teacher
and for that reason I study education a lot. One of the people who I love in education is a
guy named Dylan Wiliam, and he has this great approach in his classrooms of figuring
out how his students are getting along, which is to put a coloured cup on their desk
- green to mean that they're doing fine, yellow cup to mean I'm not quite sure, and a red
cup to mean I have no idea what's going on. Now, since most of you are watching this remotely, I can't look at your cups, and I don't think anybody brought coloured cups with them today,
so instead we have an online version of this. So what I want you to do is go to
cups.fast.ai/fast - that's cups.fast.ai/fast - and don't do this if you're, like, a fast.ai
expert who's done the course five times - because if you're following along that doesn't really mean
much obviously. This is really for people who are, you know, not already fast.ai experts.
And so click one of these colored buttons. And what I will do, is I will go to the teacher
version and see what buttons you're pressing. All right! So, so far people are feeling we're
not going too fast on the whole. We've got one… nope, not one; a brief red. Okay! So, hey Nick, this URL, the same thing with "teacher" on the end: can you keep that open as well and let me know if it suddenly gets covered in red? If you are somebody who's red, I'm not going
to come to you now because there's not enough of you to stop the class. So it's up to you to
ask on the forum or on the YouTube live chat, and there are a lot of folks, luckily,
who will be able to help you, I hope. All right! I wanted to do a big shout out
to Radek. Radek created cups.fast.ai for me. I said to him last week I need a way
of seeing coloured cups on the internet and he wrote it in one evening. And
I also wanted to shout out that Radek just announced today that he got a job at
Nvidia AI and I wanted to say, you know, that fast.ai alumni around the world very
very frequently, like every day or two, email me to say that they've got their dream job.
Yeah… If you're looking for inspiration on how to get into the field, nothing would be better than checking out Radek's work. He's actually written a book about his journey. It's got a lot of tips, in particular about how to take advantage of fast.ai and make the most of these lessons. So I would certainly recommend checking that out as well. And
if you're here live he's one of our TAs as well so you can say hello to him afterwards.
He looks exactly like this picture here. So I mentioned I spent a lot of time studying
education both for my home schooling duties and also for my courses, and you'll see
that there's something a bit different, very different, about this course… which is that
we started by training a model. We didn't start by doing an in-depth review of linear algebra
and calculus. That's because two of my favorite writers and researchers on education, Paul Lockhart and David Perkins, and many others, talk about how much better people learn when they learn with a context in place. So the way we learn math at school, where we do counting and then adding
and then fractions and then decimals and then blah blah blah and you know, 15 years later
we start doing the really interesting stuff at grad school. That is not the way most people
learn effectively. The way most people learn effectively is the way we teach sports, for example, where we show you the whole game. We show you how much fun it is. You go and start playing sport, simple versions of it; you're not very good at first, and then you
gradually put more and more pieces together. So that's how we do deep learning. You will go
into as much depth as the most sophisticated, technically detailed classes you'll find - later,
right! But first you'll learn to be very very good at actually building and deploying models. And you
will learn why and how things work as you need to, to get to the next level. Those of you who have spent a lot of time in technical education (like if you've done a PhD or something) will find this deeply uncomfortable, because you'll be wanting to understand why everything works from the start.
Just do your best to go along with it. Those of you who haven't will find this very natural.
Oh! And this is Dylan Wiliam, who I mentioned before - the guy who came up with the really cool cups thing. There'll be a lot of tricks that have come out of the educational research literature
scattered through this course. On the whole I won't call them out, they'll just be there, but
maybe from time to time we'll talk about them. All right! So before we start talking about how
we actually built that model and how it works, I guess I should convince you that I'm worth
listening to. I'll try to do that reasonably quickly, because I don't like tooting my own
horn, but I know it's important. So the first thing to mention about me is that my friend Sylvain and I wrote this extremely popular book, "Deep Learning for Coders", and that book is what this course is quite heavily based on. We're not going to be using any material from the book
directly, and you might be surprised by that, but the reason actually is that the educational
research literature shows that people learn things best when they hear the same thing in multiple
different ways. So I want you to read the book and you'll also see the same information presented
in a different way, in these videos. So one of the bits of homework after each lesson will be to read a chapter of the book. A lot of people like the book. Peter Norvig, Director of Research at Google, loves the book; in fact, his quote is here: "one of the best sources for a programmer to become proficient in deep learning." Eric Topol loves the book. Hal Varian, Emeritus Professor at Berkeley and Chief Economist at Google, likes the book. Jerome Pesenti, who is the head of AI at Facebook,
likes the book. A lot of people like the book, so hopefully you'll find that you like this material
as well. I've spent about 30 years of my life working in and around machine learning including
building a number of companies that relied on it. I became the highest-ranked competitor in the world in machine learning competitions on Kaggle. My company Enlitic, which I founded, was the first company to specialize in deep learning for medicine, and MIT Tech Review voted it one of the 50 smartest companies in 2016, just above Facebook and SpaceX. I started fast.ai with Rachel Thomas, and that
was quite a few years ago now, but it's had a big impact on the world already. Work we've done with our students has been globally recognized, such as our win in the DAWNBench competition, which showed how we could train big neural networks faster and cheaper than anybody else in the world. And so that was a really big step in 2018,
which actually made a big difference. Google started using our special approaches in
their models. Nvidia started optimizing their stuff using our approaches. So
it made quite a big difference there. I'm the inventor of the ULMFiT algorithm
which according to the Transformers book was one of the two key foundations behind
the modern NLP revolution. This is the paper here. And actually, an interesting point about that: it was invented for a fast.ai course. So the first time it appeared was not in a journal; it was in lesson four of the course. I think it was the 2016 course, if I remember correctly. And, you know, most importantly of course, I've
been teaching this course since version one. And this is, I think, the very first version of it (which even back then was getting HBR's attention). A lot of people have been watching the course, and it's been really widely used. YouTube doesn't show likes anymore, so I have to show you our likes myself. You know, it's been amazing to see how many
alumni have gone from this to, you know, to really doing amazing things, you know.
And so, for example, Andrej Karpathy told me that at Tesla, I think he said, pretty much everybody who joins Tesla in AI is meant to do this course. I believe at OpenAI, they told me that all the residents joining there first do this course. So this course is really widely used in industry and research, and people have a lot of success with it. Okay, so that's a bit of brief information about
why you should hopefully keep going with this. All right so let's get back to what's
happened here. Why are we able to create a bird recognizer in a minute or two?
And why couldn't we do it before? So I'm going to go back to 2012 and in 2012
this was how image recognition was done. This is the computational pathologist - it was
a project done at Stanford. A very successful, very famous project that was looking at the
five-year survival of breast cancer patients by looking at their histopathology image slides.
Now, so this is, like, what I would call a classic machine learning approach. And I spoke to
the senior author of this, Daphne Koller, and I asked her why they didn't use deep
learning and she said “well it just, you know, it wasn't really on the radar at that
point.” So this is like a pre-deep-learning approach. And so the way they did this was they
got a big team of mathematicians and computer scientists and pathologists and so forth to get
together and build these ideas for features, like relationships between epithelial nuclear neighbors. They actually created thousands and thousands of features, and each one required a lot of expertise from a cross-disciplinary group of experts at Stanford. So this project took years,
and a lot of people, and a lot of code, and a lot of math. And then once they had all these features
they then fed them into a machine learning model - in this case, logistic regression, to
predict survival. As I say it's very successful, right, but it's not something that I could create
for you in a minute at the start of a course. The difference with neural networks is neural
networks don't require us to build these features. They build them for us! And so what actually happened was, in 2013, Matt Zeiler and Rob Fergus took a trained neural network and looked inside it to see what it had learned. So we don't give it features; we ask it to learn features. So when Zeiler and Fergus looked inside a
neural network, they looked at the actual weights in the model and they drew a picture of
them. And these were nine of the sets of weights they found. And this set of weights, for example,
finds diagonal edges. This set of weights finds yellow to blue gradients. And this set of weights
finds red to green gradients, and so forth, right. And then down here are examples of some bits of
photos which closely matched, for example, this feature detector. And deep learning is "deep" because we can then take these features and combine them to create more advanced features. So these are some layer-two features. There's
a feature, for example, that finds corners. And a feature that finds curves.
And a feature that finds circles. And here are some examples of bits of pictures
that the circle finder found. And so remember, with a neural net, which is the basic function used in deep learning, we don't have to hand-code any of these or come up with any of these ideas. You just start with a random neural network, feed it examples, and have it learn to recognize things, and it turns out that these are the things that it creates for
itself. So you can then combine these features. And when you combine these features it creates a
feature detector, for example, that finds kind of repeating geometric shapes. And it creates a
feature detector, for example, that finds kind of frilly little things, which it looks like is
finding the edges of flowers. And this feature detector here seems to be finding words. And so
the deeper you get the more sophisticated the features it can find are. And so you can imagine
that trying to code these things by hand would be insanely difficult, and you wouldn't even know what to code by hand, right? So what we're going to learn is how neural
networks do this automatically, right, but this is the key difference of why we can now do things
that previously we just didn't even conceive of as possible, because now we don't have to hand code
the features we look for; they can all be learned. It's important to recognize that we're going to be spending some time learning about building image-based algorithms, and image-based algorithms are not just for images. In fact, this is going to be a general theme: we're going to show you some foundational techniques, but with creativity these foundational techniques can be used very widely. For example, an image recognizer can also be used to classify sounds. This was an example from one of our students, who posted on the forum that for their project they would try classifying sounds: they basically took sounds, created pictures from their waveforms, and then used an image recognizer on that, and they got a state-of-the-art result, by the way. Another of our students on the forum
said that they did something very similar to take time series and turn them into
pictures and then use image classifiers. Another of our students created pictures from the mouse movements of users of a computer system: the clicks became dots, the movements became lines, and the speed of the movement became colors; then they used that to create an image classifier. So you can see, with some creativity, there are a lot of things you can do with images. There's something else I wanted to point out, which
is that as you saw when we trained a real working bird recognizer image model we didn't need lots
of math; there wasn't any. We didn't need lots of data. We had 200 pictures. We didn't need lots of
expensive computers; we just used my laptop. This is generally the case for the vast majority of deep learning that you'll need in real life. There will be some math that pops up during this course, but we will teach it to you as needed, or we'll refer you to external resources as needed, but it'll just be the little bits that you actually need. You know, the myth that
deep learning needs lots of data, I think, is mainly passed along by big companies that want to
sell you computers to store lots of data and to process it. We find that most real world projects
don't need extraordinary amounts of data at all, and as you'll see, there are actually a lot of fantastic places where you can do state-of-the-art work for free nowadays, which is great news. One of the key reasons for this is something called transfer learning, which we'll be learning about a lot during this course, and whose payoff very few people are aware of. In this course we'll be using PyTorch. For
those of you who are not particularly close to the deep learning world, you might have
heard of TensorFlow and not of PyTorch. You might be surprised to hear that TensorFlow has been dying in popularity in recent years, and PyTorch is actually growing rapidly: in research repositories, amongst the top papers, TensorFlow is a tiny minority now compared to PyTorch. This is from some great research that's come out from Ryan O'Connor. He also discovered that the majority of researchers who were using TensorFlow in 2018 have now shifted to PyTorch. And I mention this because what people use in research is a very strong leading indicator of what's going to happen in industry, because this is where all the new algorithms are going to come out, and this is what all the papers are going to be written about. It's going to be increasingly difficult to use TensorFlow. We've been using PyTorch since before its initial release, because
we knew just from technical fundamentals, it was far better. So this course has
been using PyTorch for a long time. I will say, however, that PyTorch requires a lot of hairy code for relatively simple things. This is the code required to implement a particular optimizer called AdamW in plain PyTorch; I actually copied this code from the PyTorch repository, so as you can see, there's a lot of it. This gray bit here is the code required to do the same thing with fast.ai. fast.ai is a library we built on top of PyTorch. This huge difference is not because PyTorch is bad; it's because PyTorch is designed to be a strong foundation to build things on top of, like fast.ai. When you use fast.ai, the library, you get access to all the power of PyTorch as well, but you shouldn't be writing all this code if you only
need to write this much code, right? The problem of writing lots of code is that that's lots of
things to make mistakes with, lots of things to, you know, not have best practices in, lots of
things to maintain. In general we've found, particularly with deep learning, that less code is better. Particularly with fast.ai: the code you don't write is code that we've basically found the best practices for, for you. So when you use the code that we've provided, you'll generally find you get better results. So fast.ai has been a really popular library, and it's very widely used in industry, in academia, and in teaching, and as we go through this course we'll be seeing more and more pure PyTorch as we get deeper and deeper underneath, to see exactly how things work. The fast.ai library won the 2020 best paper award for the paper about it in the journal Information, so again you can see it's a very well-regarded library. Okay, we're still green; that's good. So you may have noticed something interesting,
which is that I'm actually running code in these slides. That's because these slides
are not in PowerPoint. These slides are in a Jupyter notebook. Jupyter notebook is the
environment in which you will be doing most of your computing.
It's a web-based application which is extremely popular and widely used in industry, in academia, and in teaching, and it is a very, very powerful way to experiment, explore, and build. Nowadays I would say most people, at least most students, run Jupyter notebooks not on their own computers, particularly for data science, but on a cloud server, of which there are quite a few, and as I
mentioned earlier if you go to course.fast.ai you can see how to use various
different cloud servers. One I'm going to show an example of is Kaggle.
So Kaggle doesn't just have competitions but it also has a cloud notebook server and
I've got quite a few examples there. So let me give you a quick example of
how we use Jupyter notebooks to build stuff, to experiment, to explore. So, on Kaggle, if you start with somebody else's notebook (why don't you start with this one, "Jupyter Notebook 101"): if it's your own notebook you'll see a button called "Edit"; if it's somebody else's, that button will say "Copy & Edit". If you use somebody's notebook that
you like, make sure you click the upvote button to encourage them
and to help other people find it, before you go ahead and copy and edit. And once we're in edit mode we can now use this notebook. To use it, we can type in any arbitrary Python expression and click run, and the very first time we do that it says "session is starting": it's basically launching a virtual computer for us to run our code. This is all free. In a sense, it's like the world's most powerful calculator. It's a calculator where you have all of the capabilities of, I think, the world's most popular programming language (certainly it and JavaScript would be the top two) directly at your disposal. So, Python does know how to do one plus one, and you can see here it spits out the answer. I hate clicking; I always use keyboard shortcuts, so instead of clicking this
little arrow, you just press Shift+Enter to do the same thing. But as you can see, there are not just calculations here; there's also prose. Jupyter notebooks are great for explaining to the version of yourself in six months' time what on earth you were doing, or to your co-workers, or the people in the open source community, or the people you're blogging for, etc. You just type prose, and as you can see, when we create a new cell, you can create a code cell, which is a cell that lets you type calculations, or a markdown cell, which is a cell that lets you create prose. And the prose uses formatting in a little mini-
language called “markdown”. There's so many tutorials around I won't explain it to you but
it lets you do things like links and so forth. So I'll let you follow through the tutorial in
your own time because it really explains to you what to do. One thing to point out is that
sometimes you'll see me use cells with an exclamation mark at the start. That's
not Python; that's a bash shell command. So that's what the exclamation mark means. As you can see, you can put images into notebooks,
and so the image I popped in here was the one showing that Jupyter won the 2017 software
system award which is pretty much the biggest award there is for this kind of software. Okay,
so that's the basic idea of how we use notebooks. So let's have a look at how we
do our bird-or-not-bird model. One thing I always like to do when I'm using something like Colab or Kaggle (cloud platforms that I'm not controlling) is make sure that I'm using the most recent version of any software. So my first cell here is pip install with -U (that means upgrade) and -q (for quiet) and then fastai. That makes sure we have the latest version of fastai, and if you always have that at the start of your notebooks, you're never going to have those awkward forum threads where you say "why isn't this working?" and somebody says to you "oh, you're using an old version of some software!"
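The cell itself is just:

```python
# First notebook cell: make sure we're on the latest fastai (-U upgrades, -q is quiet)
!pip install -Uq fastai
```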
So, you'll see here this notebook is the exact thing that I was showing you at the start of this lesson. If you haven't done much Python, you might be surprised by how little code there is here: Python is a concise but not too concise language. You'll see that there's less boilerplate than in some other languages you might be familiar with, and I'm also taking advantage of a lot of libraries; fast.ai provides a lot of convenient things for you. Oh, I forgot to import… So, to use an external library we use
import, to import a symbol from a library. fast.ai has a lot of libraries we provide; they generally start with "fast-something". So, for example, to make it easy to download a URL, fastdownload has download_url(). To make it easy to create a thumbnail, we have Image.to_thumb(), and so forth. So, I always like to view – as I'm building a
model – my data at every step. So that's why I first of all grab one bird, and then I
grab one forest photo, and I look at them to make sure they look reasonable. And once I think, “okay they look
okay”, then I go ahead and download. And, so, you can see fast.ai has
a download_images() where you just provide a list of urls so that's how easy it is,
and it does that in parallel. So it does that, you know, surprisingly quickly. One other fast.ai
thing I'm using here is resize_images(). You'll find that for computer vision algorithms you don't need particularly big images, so I'm resizing these to a maximum side length of 400. It's actually much faster this way: because GPUs are so quick, for big images most of the time can be taken up just opening them; the neural net itself often takes less time. So that's another good reason to make them smaller. Okay, so the main thing I wanted to tell you about
was this data block command. The data block is the key thing that you're going to want to get familiar with as a deep learning practitioner at the start of your journey, because the main thing you're going to be trying to figure out is: how do I get this data into my model? Now, that might
surprise you. You might be thinking we should be spending all of our time talking about neural network architectures, matrix multiplication, gradients, and stuff like that. The truth is, very little of that comes up in practice, and the reason is that, at this point, the deep learning
community has found a reasonably small number of types of model that work for nearly
all the main applications you'll need; and fast.ai will create the right type of
model for you the vast majority of the time. So all of that stuff about tweaking neural
network architectures and stuff… I mean we'll get to it eventually in this course
but you might be surprised to discover that it almost never comes up. Kind of like if you ever
did like a computer science course or something and they spent all this time on the details of
compilers and operating systems, and then you get to the real world and you never use it again.
So this course is called practical deep learning, and so we're going to focus on the
stuff that is practically important. Okay, so our images have finished downloading, and
two of them were broken so we just deleted them. Another thing you'll note by the way if you're a keen software engineer is, I tend
to use a lot of functional style in my programs I find for kind of the kind of work I do
that a functional style works very well if you're, you know a lot of people in python are less
familiar with, that it's more, maybe comes, more from other things. So yeah that's why you'll
see me using stuff like map and stuff quite a lot. Alright so a data block is the
key thing you need to know about if you're going to know how to
use different kinds of data sets, and so these are all of the things,
basically, that you'll be providing. And so what we did when we designed the data block was we actually looked and said: okay, over hundreds of projects, what are all the things that change from project to project to get the data into the right shape? And we realized we could basically split it down into these five things. So the first thing that we tell fast.ai is: what kind of input do we have? There are lots of blocks in fast.ai for different kinds of inputs, and we said: ah, the input is an image. What kind of output is there; what kind of label? The output's a category, so that means it's one of a number of possibilities. So that's enough for fast.ai to know
what kind of model to build for you. So, what are the items in this model? What am I actually going to be looking at to train from? This is a function. In fact, you might have noticed, if you were looking carefully, that we used this function here. It's a function which returns a list of all of
the image files in a path based on extension, so every time it's going to try and find out
what things to train from, it's going to use that function. In this case we'll get a list of
image files. Something we'll talk about shortly is that it's critical that you put aside some data for testing the accuracy of your model; that's called a "validation set". It's so critical that
fast.ai won't let you train a model without one, so you actually have to tell it
how to create a validation set, how to set aside some data, and in this case
we say "randomly set aside 20% of the data". Okay, the next question you have to answer for fast.ai is: how do we know the correct label of a photo? How do we know if it's a bird photo or a forest photo? This is another function, and this function simply returns the parent folder of a path; in this case we saved our images into either "forest" or "bird", so that's where the labels are going to come from. And then finally, most computer vision architectures need all of your inputs, as you train, to be the same size, so "item transforms" are all of the bits of code that are going to run on every item (on every image, in this case), and we're saying: okay, we want you to resize each of them to 192 by 192 pixels. There are two ways you can resize: you can either crop out a piece in the middle, or you can squish it, and we're saying "squish it". So that's the data block; that's all you need.
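As a tiny illustration of that labelling function (the file path here is hypothetical):

```python
from fastai.vision.all import parent_label

parent_label('bird_or_not/bird/0001.jpg')  # -> 'bird': just the parent folder's name
```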
From there we create an important class called data loaders. Data loaders are the things that PyTorch actually iterates through to grab a bunch of your data at a time. The way it can do it so fast is by using a GPU, which is something that can do thousands of things at the same time, and that means it needs thousands of things to do at the same time. So a data loader will feed the training algorithm with a "bunch" of your images at once; in fact, we don't call it a bunch, we call it a "batch" or a "mini-batch".
a very specific word in deep learning. It's saying show me an example of a batch of data
that you would be passing into the model. And so you can see show_batch tells you two things: the input, which is the picture, and the label. And remember, the label came from calling that function. So, when you come to building your own models,
you'll be wanting to know what kinds of splitters there are, what kinds of labelling functions there are, and so forth, and docs.fast.ai is where you go to get that information. Often the best place to go is the tutorials. So,
for example here's a whole data block tutorial, and there's lots and lots of examples, so
hopefully you can start out by finding something that's similar to what you want to do and see
how we did it, but then of course there's also the underlying api information
so here's data blocks! Okay. How are we doing? Still doing good! Alright. So at the end of all this we've got an object called "dls"; that stands for "data loaders", and it contains iterators that PyTorch can run through to grab batches of randomly split-out training images to train the model with, and validation images to test the model with. So now we need a model. The critical concept here in fast.ai is called
a "learner". A learner is something which combines a model (that is, the actual neural network function we'll be training) and the data we use to train it with; and that's why you have to pass in two things: the data, which is the data loaders object, and a model. And so the model is going to be the actual neural network function that you want to pass in, and as I said, there's a relatively small number that basically work for the vast majority of things you do. If you pass in just a bare symbol
like this it's going to be one of fast.ai's built-in models but
what's particularly interesting is that we integrate a wonderful library by Ross
Wightman called "timm" (PyTorch Image Models), which is the largest collection of computer vision models in the world, and at this point fast.ai is the first and only framework to integrate it. So you can use any one of the PyTorch image models, and one of our students, Aman Arora, was kind enough to create this fantastic documentation where you can find out all about the different models. And if we click on here, you can get lots and lots
of information about all the different models that Ross has provided. Having said that, the model family called "resnet" is probably going to be fine for nearly all the things you want to do, but it is fun to try different models out, so you can type in any string here to use any one of those other models.
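For example, a sketch assuming a recent fastai with the timm integration, where the architecture can be named as a string ('convnext_tiny' is just one of timm's many models):

```python
learn = vision_learner(dls, 'convnext_tiny', metrics=error_rate)  # timm model by name
learn.fine_tune(3)
```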
Okay, so if we run that, let's see what happens. So this is interesting: remember, on Kaggle it's creating a new virtual computer for us, so it doesn't really have anything ready to go, and when I ran this the first thing it did was say "downloading resnet18.pth". What's that? Well, the reason we can do this so
fast is because somebody else has already trained this model to recognize over 1 million images of over 1,000 different categories: something called the "ImageNet" dataset. And they then made those weights, those parameters, available on the internet for anybody to download. By default, when you ask fast.ai for a model, it will download those weights for you, so that you don't start with a random network that can't do anything. You actually start with a network that can do an awful lot. And so, then, something that fast.ai has
that's unique is this fine_tune method. What it does is take those pre-trained weights we downloaded for you and adjust them in a really carefully controlled way, to just teach the model the differences between your dataset and what it was originally trained for. That's called "fine tuning"; hence the name. So that's why you'll see this downloading happen first. And as you can see at the end of it (this is the error right here), after a few seconds it's 100% accurate.
has been, has started with, a pre-trained model that's been fine-tuned for the purpose of
recognizing bird pictures from forest pictures. So you can now call dot predict on it, and dot
predict you pass in an image, and so this is how you would then deploy your model. So in
the code you have whatever it needs to do. So in this particular case this
person had some reason that he needs the app to check whether they're in the national
park, and whether it's a photo of a bird, so at the vet where they need to know if
it's a photo of a bird it would just call this one line of code learn.predict();
and so that's going to return whether it's a bird or not as a string,
whether it's a bird or not as an integer, and the probability that it's a non-bird or a bird;
and so that's why we can print these things out. Okay so that's how we can
create a computer vision model. What about other kinds of models? There's a lot more in the world than just computer vision; a lot more than just image recognition. In fact, even within computer vision there's a lot more than just image recognition. For example, there's segmentation! So, segmentation: maybe the best way to explain
segmentation is to show you the result of this model. Segmentation is where we take photos –
in this case of road scenes – and we color in every pixel according to what it is a pixel of. So in this case we've got brown for cars, blue for fences (I guess?), red for buildings, and so on. And so on the left here are some photos that
somebody has already gone through and classified every pixel of every one of these images according to what that pixel is a pixel of, and then on the right is what our model is guessing. As you can see, it's getting a lot of the pixels correct, and some of them it's getting wrong. It's actually amazing how many it's getting
correct because this particular model I trained in about 20 seconds using a tiny, tiny,
tiny, amount of data. So you know, again, like, you would think this would be a particularly
challenging problem to solve, but it took about 20 seconds of training to solve it, not amazingly
well, but pretty well. If I trained it for another two minutes it'd probably be close to perfect. So
this is called "segmentation". Now, you'll see that there's very, very little code required, and the steps are actually going to look quite familiar. In fact,
in this case we're using an even simpler approach. Earlier on, we used data blocks. Data blocks are a kind of intermediate-level, very flexible approach that you can take to handling almost any kind of data, but for the kinds of data that occur a lot you can use these special data loaders classes, which let you use even less code. So, in this case, to create
data loaders for segmentation, you can just say: “okay I'm going to pass you
in a function for labeling”; and you can see here it's got pretty similar things that we pass
in, to what we passed in for data blocks before. So our file names come from get_image_files() again, and then our label function is something that grabs a particular path; and the codes, that is, what each code means, are going to come from this text file. But you can see the basic information we're providing is very, very similar regardless of whether we're doing segmentation or object recognition; and then the next steps are pretty much the same. For segmentation we create a UNet learner, which we'll learn
about later, and then again we call fine_tune(). So that is it! And that's how we create a segmentation model!
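Those cells look roughly like this, using the tiny CamVid road-scene sample that fast.ai provides:

```python
from fastai.vision.all import *

path = untar_data(URLs.CAMVID_TINY)   # download and decompress a tiny CamVid sample
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/'images'),
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',  # mask file per image
    codes=np.loadtxt(path/'codes.txt', dtype=str))               # what each code means

learn = unet_learner(dls, resnet34)   # a UNet learner, covered later in the course
learn.fine_tune(8)
learn.show_results(max_n=3)
```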
What about stepping away from computer vision? Perhaps the kind of model most widely used in industry is tabular analysis: taking things like spreadsheets and database tables, and trying to predict columns of those. Tabular analysis really looks very similar to what we've seen already. We grab some data, and you'll see I call untar_data(); this is the thing in fast.ai that downloads some data and decompresses it for you, and there's a whole lot of URLs provided by fast.ai for all the common datasets you might want to use: all the ones that are in the book, and lots of datasets that are widely used in learning and research. So it makes life nice and easy for you. So,
again, we're going to create data loaders, but this time it's tabular data loaders, and we provide pretty similar information to before. There are a couple of new things we have to tell it: which of the columns are categorical, so they can only take one of a few values, and which ones are continuous, so they can take basically any real number. And then again, we can use the exact same show_batch() that we've seen before to
see the data. fast.ai uses a lot of something called "type dispatch", which is a system that's particularly popular in a language called Julia, to basically automatically do the right thing for your data, regardless of what kind of data it is. So, if you call show_batch() on something, you should get back something useful regardless of what kind of information you provided. For a table, it shows you the information in that table. This particular dataset is a dataset of whether people have less than fifty thousand dollars or more than fifty thousand dollars in salary, for different districts, based on demographic information in each district. So, to build a model for that
data loader, we do as always: "something"-underscore-"learner"; in this case it's tabular_learner. Now, this time we don't say fine_tune(); we say fit, specifically fit_one_cycle(). That's because for tabular models there's not generally going to be a pre-trained model that already does something like what you want, because every table of data is very different, whereas pictures often have a similar theme: they're all pictures; they all share the same general idea of what pictures are. So that's why it generally doesn't make too much sense to fine-tune a tabular model; instead, you just fit. So there's one difference there.
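Here's roughly what those cells look like, for the adult census sample dataset:

```python
from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)   # census data: predict who earns >= $50k
dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, y_names='salary',
    cat_names=['workclass', 'education', 'marital-status', 'occupation',
               'relationship', 'race'],                     # categorical columns
    cont_names=['age', 'fnlwgt', 'education-num'],          # continuous columns
    procs=[Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(2)   # fit, rather than fine_tune: no pre-trained tabular model
```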
I'll show another example: collaborative filtering. Collaborative filtering is the basis of most recommendation systems today. It's a system where we basically take a dataset that says which users liked which products, or which users used which products, and then we use that to guess what other products those users might like, based on finding similar users and
what those similar users like. The interesting thing about collaborative filtering is that when we say "similar users", we're not referring to demographically similar users, but users who are similar in the sense of liking the same kinds of products. So, for example, if you use any of
the music systems like Spotify, or Apple music, or whatever, it'll ask you first, like, “what's
a few pieces of music you like,” and you tell it, and then it says “okay well maybe let's start
playing this music for you," and that's how it works: it uses collaborative filtering. So we can create collaborative filtering data loaders in exactly the same way we're used to: by downloading and decompressing some data and creating our collab data loaders. In this case we can just use from_csv() and pass in a CSV, and this is what collaborative filtering data looks like. It's going to have, generally speaking, a user ID, some kind of product ID (in this case, a movie), and a rating. So, in this case this user gave this movie a rating of 3.5 out of 5. And again, you can use show_batch: you should get back some useful visualization of your data
regardless of what kind of data it is, and so again, we create a “learner.” This time the “collaborative filtering”
learner, and you pass in your data. In this case we give it one extra piece
of information, which is – because this is not predicting a category, but it's
predicting a real number – we tell it: “what's the possible range.”
The actual range is one to five, but for reasons you'll learn about later, it's
a good idea to actually go from a little bit lower than the possible minimum, to a little
bit higher. That's why I say 0.5 to 5.5, and then fine-tune. Now, again, you know we don't
really need to fine-tune here because there's not really such a thing as a pre-trained collaborative
filtering model. We could just say “fit” or “fit_one_cycle”, but actually fine_tune() works
fine as well. So, after we train it for a while, this here is the "mean squared error": basically, on average, how far off are we for the validation set? And you can see as we train (and it's literally so fast: less than a second per epoch) that error goes down and down. And for any kind of
fast.ai model you can always call show_results() and get something sensible. So in this case it's going to show a few examples of users and movies: here's the actual rating that user gave that movie, and here's the rating that the model predicted.
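Those cells look roughly like this, with the small MovieLens ratings sample:

```python
from fastai.collab import *

path = untar_data(URLs.ML_SAMPLE)                     # a small MovieLens ratings sample
dls = CollabDataLoaders.from_csv(path/'ratings.csv')  # user, movie, rating columns
learn = collab_learner(dls, y_range=(0.5, 5.5))       # predict slightly beyond 1-5
learn.fine_tune(10)
learn.show_results()
```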
Okay, so apparently a lot of people on the forum are asking how I'm turning this notebook into a presentation. I'll be delighted to show you, because I'm
very pleased that these people made this thing for free for us to use. It's called RISE, and it's a notebook extension; in your notebook it gives you an extra little thing on the side where you say which things are slides and which things are fragments. So, this is a slide, that's a fragment, and if I do that you'll see it starts with a slide and then the fragment gets added in. Yeah, that's about all there is to it!
Actually it's pretty great, and it's very well documented. You know I'll just mention,
like, what do I make with Jupyter notebooks? This entire book was written entirely in Jupyter notebooks. Here are the notebooks: if you go to the fastai/fastbook repo, you can read the whole book, and
because it's all in notebooks, every time we say: “here's how you create this plot,” or “here's
how you train this model,” you can actually create the plot, or you can actually train
the model, because it's all notebooks. The entire fast.ai library is
actually written in notebooks. So you might be surprised to discover that if you go to fastai/fastai, the source code for the entire library is notebooks, and so the nice thing about this
is that, you know, the source code for the fast.ai library has actual pictures of
the actual things that we're building, for example. What else have we done with notebooks? Oh, blogging! I love blogging with notebooks, because
when I want to explain something I just write the code, and you can just
see the outputs, and it all just works. Another thing you might be surprised by, is
all of our tests and continuous integration are also all in notebooks. So every
time we change one of our notebooks… every time we change one of our notebooks,
hundreds of tests get run automatically, in parallel, and if there's any issues
we will find out about it. So, yeah, notebooks are great! and rise is… is a
really nice way to do slides in notebooks. Alright. So, what can deep learning do at present? We're still scratching the tip of the iceberg, even though it's a pretty well-hyped, heavily marketed technology. When we started in 2014 or so, not many people were talking about deep learning, and there was really no accessible way to get started with it. There were no pre-trained models you could download; some of the first open source software that would run on GPUs was only just starting to appear. But despite the fact that today there are a lot of people talking about deep learning, we're just scratching the surface.
Pretty much every time somebody says to me "I work in domain X, and I thought I might try deep learning out to see if it can help", and I see them a few months later and ask "how did it go?", they nearly always say "well, we just broke the state-of-the-art results in our field." So, when I say these are things that it's currently state of the art for, these are just the ones that people have tried so far; most things haven't been tried. In NLP, deep learning is the state-of-the-art method in all these kinds of things and a
lot more: computer vision, medicine, biology, recommendation systems, playing games, robotics…
I've tried elsewhere to make bigger lists, and I just end up with pages and pages! So, generally speaking: if it's something that a human can do reasonably quickly, like look at a Go board and decide if it looks like a good board or not, even if it needs to be an expert human, then that's probably something that deep learning will be pretty good at. If it's something that takes a lot of logical
thought processes over an extended period of time, you know, particularly if it's not based
on much data, maybe not: you know, like "who's going to win the next election", or something like that. That's broadly how I try to decide whether your thing is a good fit for deep learning or not. You know, it's been a long
time to get to this point. Yes, deep learning is incredibly powerful now,
but it's taken decades of work. This was the first neural network. Remember, neural networks are the
basis of deep learning. So this was back in 1957. The basic ideas have not changed much at all, but we do have things like GPUs now, and solid-state drives, and stuff like that, and of course much more data is simply available now. But it has been decades of really hard
work by a lot of people to get to this point. So let's take a step back and talk about what's going on in these models. I'm going to describe the basic idea of machine learning largely as it was described by Arthur Samuel in the late 50s, when it was invented, and I'm going to do it with these charts, which, by the way, you might find fun: these charts are themselves created with Jupyter notebooks. They're Graphviz descriptions that get turned into these diagrams, so there's a little sneak peek behind the scenes for you. So, let's start with a chart of
what a normal program looks like. In the pre-deep-learning and machine learning days, you'd have inputs and you'd have results, and then you'd code a program in the middle, which is a bunch of conditionals and loops and setting variables and so forth. A machine learning model
doesn't look that different… but… The program has been replaced with something
called “a model,” and we don't just have inputs now, we now also have weights, which are also
called “parameters,” and the key thing is this: the model is not any more a bunch of
In the case of a neural network, it's a mathematical function that takes the inputs, multiplies them by one set of weights, and adds them up; then it does that again for a second set of weights and adds them up, and again for a third set of weights, and so forth. It then takes all the negative numbers and replaces them with zeros, and takes those as inputs to the next layer, which does the same thing: multiplies them by weights a bunch of times and adds them up. Do that a few times, and that's called "a neural network."
Now, the model isn't going to do anything useful unless these weights are very carefully chosen. The way it works is that we actually start out with the weights as random, so initially this thing doesn't do anything useful at all!
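To make that concrete, here's a minimal sketch of that kind of function in plain NumPy; this is just an illustration of the idea, not code from the course or the fastai library:

```python
import numpy as np

rng = np.random.default_rng(0)

def neural_net(x, layers):
    """Multiply by weights and add up; replace negatives with zeros; repeat."""
    for i, w in enumerate(layers):
        x = x @ w                    # multiply inputs by this set of weights and sum
        if i < len(layers) - 1:
            x = np.maximum(x, 0)     # replace the negative numbers with zeros
    return x

# Two layers of random weights: the output is meaningless until
# the weights are carefully chosen (i.e. trained).
layers = [rng.standard_normal((4, 8)), rng.standard_normal((8, 1))]
print(neural_net(rng.standard_normal((1, 4)), layers))
```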
So what we do, the way Arthur Samuel (the inventor of machine learning) described it back in the late 50s, is this: he said, okay, let's take the inputs and the weights and put them through our model. (He wasn't talking particularly about neural networks; it's whatever model you like.) Get the results, and then decide how "good" they are. So if, for example, we're trying to decide "is this a picture of a bird?", and the model, which initially is random, says "this isn't a bird," and actually it IS a bird, we would say: "oh, you're wrong!" So we then calculate "the loss": the loss is a number that says how good the results were. That's all pretty straightforward. We could, for example, use accuracy: look at 100 photos and see what percentage of them it got right. No worries.
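As a toy illustration of that kind of scoring (the values here are made up, not real model output):

```python
# Accuracy: the fraction of predictions that match the labels.
preds  = ["bird", "forest", "bird",   "bird"]
labels = ["bird", "forest", "forest", "bird"]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(accuracy)  # 0.75
```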
Now, the critical step is this arrow: we need a way of updating the weights, that is, of coming up with a new set of weights that are a bit better than the previous set. And by "a bit better" we mean the loss should get a little bit better. So we've got this number that says how good our model is, and initially it's terrible: it's random. We just need some mechanism for making it a little bit better. If we can do that one thing, then we just need to iterate it a few times: each time, we put in some more inputs, put in our weights, get our loss, and use it to make the weights a little bit better. If we make them a little bit better enough times, eventually it's going to get good, assuming our model is flexible enough to represent the thing we want to do. And remember what I told you earlier about what a neural network is: basically multiplying things together, adding them up, and replacing the negatives with zeros, done a few times over. That is provably an infinitely flexible function.
So it actually turns out that this incredibly simple sequence of steps, if you repeat it enough times, can solve any computable function. And something like "generate an artwork based off somebody's Twitter bio" is an example of a computable function, right? Or "translate English to Chinese". They're not the kinds of functions you do in year eight math, but they are computable functions. So if we can just create this update step and use a neural network as our model, then we're good to go: in theory we can solve anything, given enough time and enough data. And that's exactly what we do.
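Here's a minimal sketch of that whole loop in code. To keep it self-contained, I'm using a deliberately dumb update mechanism (keep a random tweak to the weights if it improves the loss); in practice the update is done far more cleverly, with gradient descent. The model, data, and names here are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def model(x, w):
    # the simplest possible "model": a line, result = w[0]*x + w[1]
    return w[0] * x + w[1]

def loss_fn(w, x, y):
    # the loss: one number saying how good the results are (lower is better)
    return float(((model(x, w) - y) ** 2).mean())

# made-up data that follows y = 3x + 1
x = np.linspace(0, 1, 20)
y = 3 * x + 1

w = rng.standard_normal(2)              # start with random weights: useless at first
for step in range(10_000):              # iterate the loop enough times...
    candidate = w + 0.1 * rng.standard_normal(2)
    if loss_fn(candidate, x, y) < loss_fn(w, x, y):
        w = candidate                   # ...keeping each change that makes the loss a bit better
print(w)                                # ends up close to [3, 1]
```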
And once we've finished that training procedure, we don't need the loss anymore, and even the weights themselves we can integrate into the model: we've finished changing them, so we can just say they're now fixed. Once we've done that, we have something which takes inputs, puts them through a model, and gives us results. It looks exactly like our original idea of a program, and that's why we can do what I described earlier: once we've got that learn.predict for our bird recognizer, we can insert it into any piece of computer code. Once we've got a trained model, it's just another piece of code we can call with some inputs and get some outputs.
Deploying machine learning models in practice can come with a lot of little tricky details, but the basic idea is that in your code you're just going to have a line that says learn.predict, and then you fit it in with all the rest of your code in the usual way. And this is why: a trained model is just another thing that maps inputs to results.
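For example, using a trained and exported fastai model from ordinary Python code looks roughly like this; the file names here are placeholders:

```python
from fastai.vision.all import load_learner, PILImage

# load a model that was previously trained and exported (placeholder file name)
learn = load_learner('model.pkl')

# predicting is just another function call we can drop into any program
img = PILImage.create('bird.jpg')      # placeholder image file
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, float(probs[pred_idx]))
```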
All right. As we come to wrap up this first lesson: for those of you who are already familiar with notebooks and Python, this has got to be pretty easy for you; you're just going to be using some stuff you're already familiar with, plus some slightly new libraries. For those of you who are not familiar with Python, you're biting into a big thing here. There's obviously a lot you're going to have to learn, and to be clear, I'm not going to be teaching Python in this course, but we do have links to great Python resources in the forum, so check out that thread.
Regardless of where you're at, the most important thing is to experiment. Experimenting could be as simple as just running those Kaggle notebooks I've shown you, just to see them run. You could try changing things a little bit. I'd really love you to try doing the bird-or-forest exercise, but come up with something else: maybe try three or four categories rather than two. Have a think about something you think would be fun to try. Depending on where you're at, push yourself a little bit, but not too much; make sure you get something finished before the next lesson. Most importantly, read chapter one of the book. It's got much the same stuff we've seen today, but presented in a slightly different way. Then come back to the forums and present what you've done in the "Share your work here" thread. After the first time we did this, in year one of the course, we got over a thousand replies, and it's amazing how many of those replies ended up turning into new startups, scientific papers, and job offers. It's been really cool to watch people's journeys,
and some of them are just plain fun. This person classified different types of Trinidad and Tobago people; people do stuff based on where they live and what their interests are. I don't know if this person is particularly interested in zucchini and cucumber, but they made a zucchini and cucumber classifier. I thought this was a really interesting one: classifying satellite imagery by what city it's probably a picture of. Amazingly accurate, actually: 85 percent with 110 classes. A Panama City bus classifier. A batik cloth classifier. This one is very practically important: recognizing the state of buildings. We've had quite a few students actually move into disaster resilience based on satellite imagery using exactly this kind of work. We've already seen this example: Ethan Sutin's sound classifier, which I mentioned was state of the art. He actually checked the dataset's website and found that he'd beaten the state of the art for it. Alena Harley did tumor-normal sequencing; she was at Human Longevity International, and she actually did three different really interesting pieces of cancer work during that first course, if I remember correctly.
And I showed you this picture before. What I didn't mention is that this student, Gleb, was a software developer at Splunk, a big NASDAQ-listed company, and the student project he did turned into a new patented product at Splunk and a big blog post; the whole thing turned out to be really cool. It was basically something to identify fraudsters using image recognition, with these pictures we discussed. One of our students built this startup called Envision. Anyway, there have been lots and lots of examples.
All of this is to say: have a go at starting something. Create something you think would be fun or interesting and share it in the forum. If you're a total beginner with Python, then start with something simple; but I think you'll find people very encouraging, and if you've done this a few times before, try to push yourself a little bit further. And don't forget to look at the quiz questions at the end of the book and see if you can answer them all correctly. All right, thanks so much for coming, everybody. Bye.