So hello, and welcome to Lesson 3 of Practical
Deep Learning for Coders. We were looking at getting our model into production last
week, and so we're going to finish off that today, and then we're going to start to look
behind the scenes at what actually goes on when we train a neural network. We're going
to look at the math of what's going on, and we're going to learn about SGD and important
stuff like that. The order is slightly different from the book: there's a part in
the book which says, “Hey, you can either go to chapter 4 or chapter 3 now and then go
back to the other one afterwards,” so we're doing chapter 4 and then chapter 3.
You can do them in whichever order you're interested
in. Chapter 4 is the more technical chapter about the foundations of how deep learning
really works, whereas Chapter 3 is all about ethics, and so in the lessons we'll do that
next week. So we're looking at the 02_production notebook, and we're going to look at the fastbook
version (in fact, everything I'm looking at today will be in the fastbook
version). And remember, last week we had a look at our bears, and we created this DataLoaders
object by using the DataBlock API, which I hope everybody's had a chance to experiment
with this week--if you haven't, now's a good time to do it! We kind of skipped over one
of the lines a little, which is this item_tfms. So what this is doing here, when we said “Resize”:
the images we downloaded from the internet were lots of different sizes and lots of different
aspect ratios: some are tall, some are wide, some are square, some are big, some are
small. When you say Resize for an item transform, it means each item (an item in this case
is one image) is going to be resized to 128x128 by squishing or stretching it. And so we
had a look at it: you can always say show_batch to see a few examples, and this is what they
look like.
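For reference, the DataBlock cell from last lesson looks roughly like this (it assumes `from fastai.vision.all import *` and that `path` points at the downloaded bear images):

    bears = DataBlock(
        blocks=(ImageBlock, CategoryBlock),              # inputs are images, labels are categories
        get_items=get_image_files,                       # find the image files under `path`
        splitter=RandomSplitter(valid_pct=0.2, seed=42), # hold out 20% for validation
        get_y=parent_label,                              # the label is the parent folder name
        item_tfms=Resize(128))                           # the item transform we're discussing here
    dls = bears.dataloaders(path)
    dls.valid.show_batch(max_n=4, nrows=1)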
Squishing and stretching isn't the only way we can resize. Remember, we have to make everything into a square before we get it into our model; by the time it gets to our model, everything has to be the same size in each mini-batch. Making it a square is not the only way to do that, but it's the easiest way and it's by far the most common way. Another way to do this is to create another DataBlock object. We can make a DataBlock object that's an identical copy of an existing DataBlock object, where we then change just some pieces, and we do that by calling the “new” method, which is super handy. So let's create another DataBlock object, and this time with different item_tfms, where we resize using the “Squish” method.
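As a sketch, that looks something like this (following the notebook's naming):

    bears = bears.new(item_tfms=Resize(128, ResizeMethod.Squish))  # same DataBlock, different item transform
    dls = bears.dataloaders(path)
    dls.valid.show_batch(max_n=4, nrows=1)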
We have a question: what are the advantages of having square images versus rectangular ones? That's a great question.
Really, it's simplicity. If you know all of your images are rectangular, with a particular
aspect ratio to start with, you may as well just keep them that way. But if you've got
some which are tall and some which are wide, making them all square is kind of the easiest.
Otherwise you would have to organize them such that all of the tall ones ended up in one
mini-batch and all of the wide ones ended up in another, and then you'd have to figure
out what the best aspect ratio for each mini-batch is, and we actually have some research
that does that in fastai2, but it's still a bit clunky. I should mention... Okay, I
just lied to you--the default is not actually to squish or stretch. The default when we say Resize, I should have said, is actually just to grab the center. So actually all we're doing is grabbing the center of each image. If we want to squish or stretch, we can add the ResizeMethod.Squish argument to Resize, and you can now see that this black bear is looking much thinner, but we have got the leaves that are around on each side, for instance. Another question: when you use the dls.new method, what can and cannot be changed? Is it just the transforms? So it's not dls.new, it's bears.new, right? We're not creating a new DataLoaders object; we're creating a new DataBlock object. I don't remember off the top of my head, so check the documentation, and I'm sure somebody can pop the answer into the forum. So you
can see when we use Squish that this grizzly bear has got pretty wide and weird-looking, and this black bear has got pretty weird and thin-looking. It's easiest to see what's going on if we use ResizeMethod.Pad, and what Pad does, as you can see, is just add some black bars on each side. So you can see the grizzly bear was tall, so when we stretched it (squishing and stretching are opposites of each other) it ended up wide, and the black bear was originally a wide rectangle, so it ended up looking kind of thin. You don't have to use zeros. Zeros means pad it with black; you can also say reflect, and the pixels will tend to look a bit better that way.
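A quick sketch of the padded version, again following the notebook:

    bears = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))  # pad with black bars
    dls = bears.dataloaders(path)
    dls.valid.show_batch(max_n=4, nrows=1)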
All of these different methods have their own problems. The pad method is kind of the cleanest: you end up with the correct size and you end up with all of the pixels, but you also end up with wasted pixels, so you end up with wasted computation. The squish method is the most efficient, because you get all of the information and nothing's wasted, but on the downside your neural net is going to have to learn to recognize when something's been squished or stretched. And in some cases it wouldn't even be able to tell: if there are two objects you're trying to recognize, one of which tends to be thin and one of which tends to be wide but which otherwise look the same, they could actually be impossible to distinguish. And then the default cropping approach actually removes some information. In this case, with this grizzly bear here we actually lost a lot of its legs, so if figuring out what kind of bear it was required looking at its feet, well, we don't have its feet anymore. So they all have downsides. So there's something else that you can do, a different
approach, which is instead of saying Resize, you can say RandomResizedCrop. And actually this is the most common approach. What RandomResizedCrop does is, each time, it grabs a different part of the image and kind of zooms into it, right? So these are all the same image; we're just grabbing a batch of four different versions of it, and you can see they're all squished in different ways and we've selected different subsets and so forth. Now this kind of seems worse than any of the previous approaches, because I'm losing information. Like this one here: I've actually lost a whole lot of its back, right? But the cool thing about this is that, remember, we want to avoid overfitting, and when you see a different part of the animal each time, it's much less likely to overfit, because you're not seeing the same image on each epoch that you go around. Does that make sense? So this RandomResizedCrop approach is actually super popular, and min_scale=0.3 means we're going to pick at least 30% of the pixels of the original size each time, and then we're going to zoom into that square.
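Concretely, that looks something like this (as in the notebook; unique=True shows the same image repeated with different random crops):

    bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))  # keep at least 30% of the image
    dls = bears.dataloaders(path)
    dls.train.show_batch(max_n=4, nrows=1, unique=True)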
So this idea of doing something so that each time the model sees the image it looks a bit different from last time is called data augmentation. And this is one type of data augmentation. It's probably the most common, but there are others, and one of the best ways to do data augmentation is to use this aug_transforms function. What aug_transforms does is return a list of different augmentations. So there are augmentations which change contrast, which change brightness, which warp the perspective (you can see in this one here it looks like this bit's much closer to you and this bit's further away, because it's been perspective warped); it rotates them (see, this one's actually been rotated), and this one's been made really dark, right? These are batch transforms, not item transforms. The difference is that item transforms happen one image at a time, so the thing that resizes them all to the same size has to be an item transform. Then we pop it all into a mini-batch, put it on the GPU, and a batch transform happens to a whole mini-batch at a time. And by putting these as batch transforms, the augmentation happens super fast, because it happens on the GPU. And I don't know of any other library, as we speak, which allows you to write your own GPU-accelerated transformations that run in this way. So this is a super handy thing in fastai2.
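In the notebook that looks roughly like this (mult=2 just exaggerates the augmentations so they're easier to see):

    bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))  # item transform + batch transforms
    dls = bears.dataloaders(path)
    dls.train.show_batch(max_n=8, nrows=2, unique=True)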
So you can check out the documentation for aug_transforms, and when you do you'll find the documentation for all of the underlying transforms that it basically wraps. So you can see, if I Shift-Tab (I don't remember if I have shown you this trick before: if you go inside the parentheses of a function and hit Shift-Tab a few times, it'll pop open a list of all of the arguments), you can basically see you can say, like, oh, can I sometimes flip it left-right, can I sometimes flip it up-down, what's the maximum amount I can rotate, zoom, change the lighting, warp
the perspective, and so forth. Question: how can we add different augmentations for the training and validation sets? So the cool thing is that fastai will automatically avoid doing data augmentation on the validation set. All of these aug_transforms will only be applied to the training set, with the exception of RandomResizedCrop. RandomResizedCrop has a different behavior for each: the behavior for the training set is what we just saw, which is to randomly pick a subset and zoom into it, and the behavior for the validation set is just to grab the largest center square that it can. You can write your own transformations; they're just Python, just standard PyTorch code, and by default they will only be applied to the training set. If you want to do something fancy like RandomResizedCrop, where you actually have different things being applied to each, you should come back to the next course to find out how to do that, or read the documentation. It's not rocket science, but it's not something most people need to do. Okay, so last time we did bears.new
with RandomResizedCrop, a min_scale of 0.5, we added some transforms, and we went ahead and trained. Actually, since last week I've rerun this notebook on a different computer and I've got different images, so it's not all exactly the same, but I still got a good confusion matrix: of the black bears, 37 were classified correctly, 2 were grizzlies, and 1 was a teddy. Now plot_top_losses is interesting. You can see in this case there are some clearly odd things going on. This is not a bear at all; this looks like a drawing of a bear, which it's decided is predicted as a teddy, but it's meant to be a drawing of a black bear. I can certainly see the confusion. You can also see how some parts have been cut off; we'll talk about how to deal with that later. Now
one of the interesting things is that we didn't really do much data cleaning at all before we built this model. The only data cleaning we did was just to validate that each image could be opened; there was that verify_images call. And the reason for that is that it's actually much easier, normally, to clean your data after you create a model, and I'll show you how. We've got this thing called ImageClassifierCleaner, where you can pick a category and the training set or validation set, and what it will do is list all of the images in that set, picking the ones which it is the least confident about, which are the most likely to be wrong, where the loss is the worst, to be more precise. So this is a great way to look through your data and find problems. In this case the first one is not a teddy or a brown bear or a black bear; it's a puppy dog, right? So this is a great cleaner, because what I can do is click delete here. This one here looks a bit like an Ewok rather than a teddy; I'm not sure, what do you think Rachel, is it an Ewok? I'm going to call it an Ewok. And so you can kind of go through: okay, that's definitely not a teddy, and you can either say, oh, that's wrong, it's actually a grizzly bear, or it's wrong, it's a black bear, or I should delete it, or the default is to keep it. And you can keep going through until you think, okay, they all seem to be fine (maybe that one's not), and once you get to the point where all seems to be fine, you can say, okay, probably all the rest are fine too, because they all have lower losses, so they all fit the mode of a teddy. And so then I can run this code here, where I just go through cleaner.delete(), that's all the things which I've selected delete for, and unlink them (unlink is just another way of saying delete a file; that's the Python name), and then go through all the ones that we said to change, and we can actually move them to the correct directory.
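As a sketch, the cleaner and that follow-up code look roughly like this (matching the notebook; shutil is needed for the move):

    import shutil
    from fastai.vision.widgets import ImageClassifierCleaner

    cleaner = ImageClassifierCleaner(learn)
    cleaner   # displays the widget: pick a category and train/valid, then mark images

    # After making your selections in the widget:
    for idx in cleaner.delete(): cleaner.fns[idx].unlink()                          # delete files marked <Delete>
    for idx, cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)  # move relabelled files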
If you haven't seen this before, you might be surprised that we've kind of created our own little GUI inside a Jupyter notebook. Yeah, you can do this, and we built this with less than a screen of code. You can check out the source code in the fastai notebooks, so this is a great time to remind you that fastai is built with notebooks: if you go to the fastai repo, clone it, and then go to nbs, you'll find all of the code of fastai written as notebooks, and they've got a lot of prose and examples and tests and so forth. So the best place to learn about how this is implemented is to look at the notebooks rather than looking at the module code. Okay, by the way, sometimes
you'll see weird little comments like this. These weird little comments are part of a development environment for Jupyter notebooks that we use called nbdev, which Sylvain and I built to make it much easier for us to create books and websites and libraries in Jupyter notebooks. This particular one here, hide, means: when this is turned into a book or into documentation, don't show this cell. And the reason for that is that, as you can see, I've actually got it in the text, but I thought when you're actually running it, it would be nice to have it sitting here waiting for you to run directly. That's why it's shown in the notebook but shown differently in the book. You'll also see things like s: with a quote; in the book that would end up saying "Sylvain says" and then what he says. So there are little bits and pieces in the notebooks that look a little bit odd, and that's because they're designed that way in order to create stuff from them. Right, so then last week we saw how you can export
that to a pickle file that contains all the information for the model, and then on the server where you're going to actually do your inference, you can load that saved file and you'll get back a learner that you can call predict on. Perhaps the most interesting part of predict is the third thing that it returns, which is a tensor, in this case containing three numbers. There are three of them because we have three classes: teddy bear, grizzly bear, and black bear, all right? And so this doesn't make any sense until you know what the order of the classes is in your DataLoaders. You can ask the DataLoaders what the order is by asking for its vocab. A vocab in fastai is a really common concept: basically any time that you've got a mapping from numbers to strings or discrete levels, the mapping is always stored in the vocab. So here this shows us that the activation for black bear is 1e-6, the activation for grizzly is 1, and the activation for teddy is 1e-6, so it's very, very confident that this particular one was a grizzly; not surprisingly, this was something called grizzly.JPEG.
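Roughly, the inference side looks like this (the export filename and image path follow the notebook and may differ on your setup):

    learn_inf = load_learner(path/'export.pkl')                    # load the learner exported last lesson
    pred, pred_idx, probs = learn_inf.predict('images/grizzly.jpg')
    # pred     -> the human-readable class, e.g. 'grizzly'
    # pred_idx -> its index into the vocab, e.g. tensor(1)
    # probs    -> one probability per class
    learn_inf.dls.vocab                                            # e.g. ['black','grizzly','teddy']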
So you need to know this mapping in order to display the correct thing, but of course the DataLoaders object already knows that mapping; it's stored in the vocab, which is saved along with the learner, so that's how it knows to say grizzly automatically. So the first thing it gives you is the human-readable string that you'd want to display. This is what's kind of nice with fastai2: you save this one object, which has everything you need for inference. It's got all the information about normalization, about any kind of transformation steps, about what the vocab is, so it can display everything correctly. Right. So now we want to deploy this as an app. Now, if you've done some web programming before, then all you need to know is that this line of code and this line of code are it. This is the line of code you would call once, when your application starts up, and this is the line of code you would call every time you want to do inference. There's also a batch version of it, which you can look up if you're interested; this is just the 'one at a time' version. So there's nothing special here if you're already a web programmer or have access to a web programmer: you just have to stick these two lines of code somewhere, and the three things you get back are the human-readable string (if you're doing categorization), the index of that (which in this case is 1, grizzly), and the probability of each class.
One of the things we really wanted to do in this course, though, is not assume that everybody is a web developer. Most data scientists aren't, but gee, wouldn't it be great if all data scientists could at least prototype an application to show off the thing they're working on. And so we've tried to curate an approach (none of it is stuff we've built; it's really just curated) which shows how you can create a GUI and a complete application in a Jupyter notebook. The key pieces of technology we use to do this are IPython widgets, which are always called ipywidgets, and Voila. ipywidgets, which we import by default as widgets (and that's also what they use in their own documentation), are GUI widgets, for example a file upload button. So if I create this file upload button and then display it, I see (and we saw this in the last lesson as well, or maybe lesson one) an actual clickable button. So I can go ahead and click it, and it says now, OK, you've selected one thing. So how do I use that? Well, these widgets have all kinds of methods and properties, and the upload button has a data property, which is an array containing all of the images you uploaded. So you can pass that to PILImage.create (.create is kind of the standard factory method we use in fastai to create items), and PILImage.create is smart enough to be able to create an item from all kinds of different things, and one of the things it can create it from is a binary blob, which is what a file upload contains. So then we can display it, and there's our teddy.
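A minimal sketch of those steps, following the notebook:

    btn_upload = widgets.FileUpload()
    btn_upload                                    # display the button, then click it and choose a file

    img = PILImage.create(btn_upload.data[-1])    # the last uploaded file, as a PILImage
    img                                           # display it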
Right? So you can see how cells of a Jupyter notebook can refer to other cells that have GUI-created data in them. So let's hide that teddy away for a moment. The next thing to know about is that there's a kind of widget called Output, and an Output widget is basically something that you can fill in later. Right? So, if I delete this part here... So I've now got an Output widget. Actually, let's do it this way around. And you can't see the Output widget, even though I said please display it, because nothing has been output. So then in the next cell I can say: with that output placeholder, display a thumbnail of the image, and you'll see that the display will not appear here; it appears back here! Right? Because that's where the placeholder was. So let's run that again to clear out that placeholder. We can create another kind of placeholder, which is a Label. The Label is something you can put text in. You can give it a value like, I don't know, 'Please choose an image'.
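As a sketch, again following the notebook:

    out_pl = widgets.Output()
    out_pl                                # the placeholder: shows nothing until we put something in it

    with out_pl:
        display(img.to_thumb(128, 128))   # fills in the placeholder above with a thumbnail

    lbl_pred = widgets.Label()
    lbl_pred.value = 'Please choose an image'
    lbl_pred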
Okay, so we've now got a label containing 'Please choose an image'. Let's create another button to do a classification. Now, this is not a file upload button, it's just a general button, so this button doesn't do anything; it doesn't do anything until we attach an event handler to it. An event handler is a callback (we'll be learning all about callbacks in this course). If you've ever done any GUI programming before, or even web programming, you'll be familiar with the idea that you write a function, which is the thing you want to be called when the button is clicked, and then somehow you tell your framework that this is the on-click event. So here I go: here's my run button, I say its on-click event is to call this code, and this code is going to do all the stuff we just saw: create an image from the upload, clear the output, display the image, call predict, and then replace the label with a prediction. There it all is.
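Sketched out, the handler looks like this (matching the notebook's names for the widgets above):

    btn_run = widgets.Button(description='Classify')

    def on_click_classify(change):
        img = PILImage.create(btn_upload.data[-1])   # grab the uploaded image
        out_pl.clear_output()
        with out_pl:
            display(img.to_thumb(128, 128))          # show a thumbnail in the placeholder
        pred, pred_idx, probs = learn_inf.predict(img)
        lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'

    btn_run.on_click(on_click_classify)              # attach the event handler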
Now, that hasn't done anything yet, but I can now go back to this classify button, which now has an event handler attached to it. So watch this: click, boom, and look, that's been filled in, and that's been filled in. Right, in case you missed it, let's run this again and clear everything out. Okay, everything's gone, this says 'Please choose an image', there's nothing here, I click classify, bop, bop. Right, so it's kind of amazing how our notebook has suddenly turned into this interactive prototyping playground for building applications. And so once all this works, we can dump it all together, and the easiest way to dump things together is to create a VBox. A VBox is a vertical box, and it's just something that you put widgets in. In this case we're going to put in the following widgets: a label that says “Select your bear”, then an upload button, a run button, an output placeholder, and a label for predictions.
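Something like this, again following the notebook:

    VBox([widgets.Label('Select your bear!'),
          btn_upload, btn_run, out_pl, lbl_pred])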
But let's run these again just to clear everything out, so that we're not cheating, and let's create our VBox. So as you can see, it's just got all the pieces, right? We've got... oh, I accidentally ran the thing that displayed the bear; let's get rid of that. Okay, so there it is. So now I can click upload, I can choose my bear, and then I can click classify. And notice these are exactly the same buttons as these buttons: they're two places where we're viewing the same button, which is kind of a wild idea. So if I click classify, it's going to change this label and this label, because they're actually both references to the same label; look, there we are. So this is our app, right? And this is actually how I built that image cleaner GUI, just using these exact things, and I built that image cleaner GUI cell by cell in a notebook just like this. So you get this kind of interactive, experimental framework for building a GUI. So if you're a data scientist who's never done GUI stuff before, this is a great time to get started, because now you can make actual programs. Now, of course, an actual program running inside a notebook is kind of cool, but what we really want is for this program to run in a place anybody can run it. That's where Voila comes in. Voila needs to be installed, so you can just run these lines to install it; it's listed in the prose.
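If I remember right, the install lines in the notebook look like this (run once per environment):

    !pip install voila
    !jupyter serverextension enable --sys-prefix voila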
What Voila does is take a notebook and display nothing except for the markdown, the ipywidgets, and the outputs. So all the code cells disappear, and it doesn't give the person looking at that page the ability to run their own code; they can only interact with the widgets. So what I did was copy and paste that code from the notebook into a separate notebook which only has those lines of code. So this is just the same lines of code that we saw before, and this is just a normal notebook. Then I installed Voila, and when you do that, if you navigate to this notebook but you replace “notebooks” up here in the URL with “voila”, it displays not the notebook but, just as I said, the markdown and the widgets. So here I've got my bear classifier, and I can click upload. Let's do a grizzly bear this time. This is a slightly different version: I actually made this one so there's no classify button, because I thought it would be a bit more fancy to make it so that when you click upload it just runs everything. But as you can see, there it all is, it's all working. So this is the world's simplest prototype, but it's a proof of concept. You can add widgets with dropdowns and sliders and charts and everything that you can have in, you know, an Angular app or a React app or whatever. And in fact there's even stuff which lets you use, for example, the whole Vue.js framework (if you know that, it's a very popular JavaScript framework); you can actually use it in widgets and Voila. So now we want to get it so that this app can be run by someone out there in the world. So the Voila documentation
shows a few ways to do that, but perhaps the easiest one is to use a system called Binder. Binder is at mybinder.org, and all you do is paste in your GitHub repository name here (this is all in the book). So paste in your GitHub repo name, change where it says File to URL, and then put in the path which we were just experimenting with. So you pop that in here and then you say launch, and what that does is give you a URL. You can then pass this URL on to people, and it is actually your interactive running application. Binder is free, so anybody can now use this to take their Voila app and make it a publicly available web application. So try it. As it mentions here, the first time you do this, Binder takes about five minutes to build your site, because it actually uses something called Docker to deploy the whole fastai framework and Python and blah, blah, blah, but once you've done that, that virtual machine will keep running for a while, as long as people are using it, and it's reasonably fast. So a few
things to note here. Being a free service, you won't be surprised to hear that this is not using a GPU, it's using a CPU. That might be surprising, but when you think about it, it makes much more sense to deploy to a CPU than a GPU here. The thing that's happening (let's go back to my app) is that in my app I'm passing along a single image at a time, so when I pass along that single image, I don't have a huge amount of parallel work for a GPU to do. This is actually something that a CPU is going to do more efficiently. So we've found that, for folks coming through this course, the vast majority of the time they wanted to deploy inference on a CPU, not a GPU, because they're normally doing one item at a time. It's way cheaper and easier to deploy to a CPU, and the reason for that is that you can just use any hosting service you like, because remember, this is just a program at this point, and you can use all the usual horizontal scaling and vertical scaling; you can use Heroku, you can use AWS, you can use inexpensive instances, super cheap and super easy. Having said that, there are times you might need to deploy to a GPU. For example, maybe you're processing videos, and a single video might take all day to process on a CPU, or you might be so successful that you have a thousand requests per second, in which case you could take, say, 128 at a time, batch them together, put the whole batch on the GPU, and get the results back and pass them back around. You've got to be careful with that, right, because if your requests aren't coming in fast enough, your user has to wait for a whole batch of people to be ready to be processed. But conceptually, as long as your site is popular enough, that could work. The other thing to talk about is that you might want to deploy to a mobile phone, and for deploying to a mobile phone our recommendation is, wherever possible, do that by actually deploying to a server and then having the mobile phone talk to the server over a network. Because if you
do that, again, you can just use a normal PyTorch program on a normal server with normal network calls, which makes life super easy. When you try to run a PyTorch app on a phone, you are suddenly not in an environment where PyTorch will run natively, and so you'll have to convert your program into some other form. There are other forms, and the main form that you convert it to is something called ONNX, which is specifically designed as a super high speed, high performance approach that can run on both servers and mobile phones, and it does not require the whole Python and PyTorch runtime to be in place. But it's much more complex than not using it: it's harder to debug, it's harder to set up, it's harder to maintain. So if possible, keep things simple, and if you're lucky enough that you're so successful that you need to scale up to GPUs or stuff like that, then great; hopefully you've got the finances at that point to justify spending money on an ONNX expert, or a serving expert, or whatever. And there are various systems you can use, like ONNX Runtime and AWS SageMaker, where you can kind of say, here's my ONNX bundle, and it'll serve it for you. PyTorch also has a mobile framework, same idea. So, all right, I mean it's kind
of funny that we're talking about two different kinds of deployment here. One is deploying, like, a hobby application that you're prototyping, showing off to your friends, explaining to your colleagues how something might work, a little interactive analysis; that's one thing. But maybe you're actually prototyping something that you want to turn into a real product, or an actual part of your company's operations. When you're deploying something in real life, there are all kinds of things you've got to be careful of. One example of something to be careful of is, let's say you did exactly what we just did. Which actually is your homework: to create your own application. I want you to create your own image search application. You can use my exact set of widgets and whatever if you want to, but better still, go to the ipywidgets website and see what other widgets they have, and try to come up with something cool, try to show off as best as you can, and show us on the forum. Now let's say you decided
that you want to create an app that would help the users of your app decide if they have healthy skin or unhealthy skin. So if you did the exact thing we just did, rather than searching for grizzly bear and teddy bear and so forth on Bing, you would search for healthy skin and unhealthy skin. And here's what happens (and remember, in our version we never actually looked at Bing itself, we just used the Bing Image Search API, but behind the scenes it's just using the website): if I type healthy skin and say search, I actually discover that the definition of healthy skin is young white women touching their face lovingly. So that's what your healthy skin classifier would learn to detect. This is a great example from Deb Raji, and you should check out her paper, “Actionable Auditing,” for lots of cool insights about model bias. But here's a fascinating example of how, if you weren't looking at your data carefully, you end up with something that doesn't actually solve the problem you want to solve at all. This is tricky. Right? Because the data that
you train your algorithm on, if you're building like a new product that didn't exist before,
by definition you don't have examples of the kind of data that's going to be used in real
life. Right? So you kind of try to find some from somewhere, and if you do that through, like, a Google search, pretty likely you're not going to end up with a set of data that actually reflects the kind of mix you would see in real life. So the main thing here is to say: be careful. And in particular for your test set, you know, that final set that you check on, really try hard to gather data that reflects the real world. For example, for the healthy skin example, you might go
and actually talk to a dermatologist and try and find like ten examples of healthy and
unhealthy skin or something. And that would be your kind of gold standard test. Um. There's
all kinds of issues you have to think about in deployment. I can't cover all of them,
but I can tell you that this O'Reilly book called ‘Building Machine Learning Powered Applications’ is a great resource, and this is one of the reasons we don't go into detail about A/B testing and when we should refresh our data and how we monitor things and so forth: that book has already been written, so we don't want to rewrite it. I do want to mention a particular area that I care a lot about though, which is,
let's take this example, let's say you're rolling out this bear detection system, and
it's going to be attached to video cameras around a campsite. It's going to warn campers
of incoming bears. So if we used the model that was trained with that data that we just
looked at, you know, those are all very nicely taken pictures of pretty perfect bears. Right?
There's really no relationship to the kinds of pictures you're actually going to have to be dealing with in your campsite bear detector, which is going to have video and not images, it's going to be nighttime, there are probably going to be low-resolution security cameras, and you need to make sure that the performance of the system is fast enough to tell you about it before the bear kills you. There will be bears that are partially obscured by bushes, or in lots of shadow, or whatever, none of which are the kinds of things you would see normally in internet pictures. So we call this ‘out of domain data’. ‘Out of domain data’ refers to a situation where the data that you are trying to do inference on is in some way different to the kind of data that you trained with. There's no perfect way to solve this,
and when we look at ethics, we'll talk about some really helpful ways to minimize how much this happens. For example, it turns out that having a diverse team is a great way to avoid being surprised by the kinds of data that people end up coming up with, but really it's just something you've got to be super thoughtful about. Very similar to that is something called ‘domain shift’, which is where maybe you start out with all of your data being ‘in domain’ data, but over time the kinds of data that you're seeing change. So maybe raccoons start invading your campsite and you weren't training on raccoons before, it was just a bear detector; that's called ‘domain shift’, and that's another thing that you have to be very careful of.
Rachel, is there a question? No, I was just going to add to that by saying that all data is biased, so there's not a form of debiased data, or perfectly-representative-in-all-cases data, and a lot of the proposals around addressing this have kind of been converging to this idea, which you see in papers like Timnit Gebru's ‘Datasheets for Datasets’, of just writing down a lot of the details about your dataset: how it was gathered, in which situations it's appropriate to use, how it was maintained, and so on. It's not that you've totally eliminated bias, but that you're very aware of the attributes of your dataset, so that you won't be blindsided by them later. And there have been several proposals in that school of thought, which I really like, around this idea of just understanding how your data was gathered and what its limitations are. Thanks Rachel. So a key problem here is that you can't know
the entire behavior of your neural network. With normal programming, you typed in the if statements and the loops and whatever, so in theory you know what the hell it does (although it's still sometimes surprising). In this case you didn't tell it anything; you just gave it examples to learn from and hoped that it learned something useful. There are hundreds of millions of parameters in these neural networks, and there's no way you can understand how they all combine with each other to create complex behavior. So really there's a natural compromise here: we're trying to get sophisticated behavior, like recognizing pictures, behavior sophisticated enough that we can't describe it precisely ourselves, and so the natural downside is you can't expect the process that the thing is using to do that to be describable, for you to be able to understand it. So our recommendation
for kind of dealing with these issues is a very careful deployment strategy, which I've
summarized in this little graph, this little chart here. The idea would be, first of all
whatever it is that you're going to use the model for, start out by doing it manually.
So have a park ranger watching for bears. Have the model running next to them and each
time the park ranger sees a bear they can check the model and see like, did it seem
to have picked it up. So the model is not doing anything. There's just a person who's
like, running it and seeing would it have made sensible choices, and once you're confident
that it makes sense, that what it's doing seems reasonable, you know, in those as close
to the real-life situation as possible, then deploy it in a time and geography limited
way. So pick like one campsite, not the entirety of California, and do it for, you know, one
day and have somebody watching it super carefully. Right? So now the basic bear detection is
being done by the bear detector but there's still somebody watching it pretty closely,
and it's only happening in one campsite, for one day. And then you say: ‘Okay, we haven't destroyed our company yet. Let's do two campsites for a week, and then let's do the entirety of Marin for a month,’ and so forth. So this is actually
what we did when I used to be at this company called ‘Optimal Decisions’. ‘Optimal
Decisions’ was a company that I founded to do insurance pricing, and if you, if you
change insurance prices by, you know, a percent or two in the wrong direction, in the wrong
way, you can basically destroy the whole company. This has happened many times, you know. Insurers
are companies that set prices. That's basically the product that they provide. So when we
deployed new prices for ‘Optimal Decisions’, we always did it by saying, like: ‘Okay, we're going to do it for five minutes, or for everybody whose name ends with a D.’ You know? So we'd try to find some group which hopefully would be fairly different, but not too many of them, and we would gradually scale it up. And you've got to make sure that when you're doing this you have a lot of really good reporting systems in place, so that you can recognize: are your customers yelling at you, are your computers burning up, are your costs spiraling out of control, and so forth. So it really requires great reporting systems.
Question: does fastai have methods built in that provide for incremental learning, i.e., improving the model slowly over time with a single data point each time? Yeah, that's a great question. This is a little bit different: this is really about dealing with ‘domain shift’ and similar issues by continuing to train your model as you do inference, and the good news is you don't need anything special for that. It's basically just a transfer learning problem. You can do this in many different ways. Probably the easiest is just to say: ‘Okay, each night at midnight we're going to set off a task which grabs all of the previous day's transactions as mini-batches and trains another epoch.’ And yeah, that actually works fine. You can basically think of this as a fine-tuning approach, where your pre-trained model is yesterday's model, and your fine-tuning data is today's data.
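As a very rough sketch of that idea (this isn't code from the lesson; the file paths, the reuse of the bears DataBlock, and the single epoch are all made-up assumptions for illustration):

    # Hypothetical nightly job: yesterday's exported model becomes the pretrained model,
    # and today's new data becomes the fine-tuning data.
    learn = load_learner('models/yesterday.pkl')             # hypothetical path
    learn.dls = bears.dataloaders('data/todays_images')      # assumes the same DataBlock as before
    learn.fine_tune(1)                                       # one extra epoch on the new data
    learn.export('models/today.pkl')                         # becomes tomorrow's starting point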
So as you roll out your model, one thing to be thinking about super carefully is that it might change the behavior of the system that it's a part of, and this can create something called a ‘feedback loop’. ‘Feedback loops’ are one of the most challenging things for real-world model deployment, particularly of machine learning models, because they can take a very minor issue and explode it into a really big issue. So, for example, think about a predictive policing algorithm. It's
an algorithm that was basically trained on data that says whereabouts arrests are being made, and then, as you train that algorithm based on where arrests are being made, you put in place a system that sends police officers to places that the model says are likely to have crime, which in this case is where there were arrests. Well, then more police go to that place and find more crime, because the more police that are there, the more they'll see. They arrest more people, and then if you do this incremental learning, like we were just talking about, it's going to say: ‘Oh, there's actually even more crime here,’ and so tomorrow it sends even more police. And so in that situation the predictive policing algorithm ends up sending all of your police to one street block, because at that point all of the arrests are happening there, because that's the only place you have policemen. Right? And I should say police officers. So there's actually a paper about this issue
called ‘To Predict and Serve?’, and in it the authors write this really nice phrase: ‘Predictive policing is aptly named: it is predicting policing, not predicting crime.’ So if the initial model was perfect (whatever the hell that even means), if it somehow sent police to exactly the best places to find crime, based on the probability of crimes actually occurring in a place, I guess there's no problem. Right? But as soon as there's any amount of bias... So, for example, in the US there are a lot more arrests of black people than of white people, even for crimes where black people and white people are known to commit them at the same rate. So in the presence of this bias, or any kind of bias, you're kind of setting off this domino chain of ‘feedback loops’, where that bias will be exploded over time. So, you know, one thing I like
to think about is: ‘What would happen if this model was just really, really, really good? Who would be impacted? What would this extreme result look like? How would you know what was really happening?’ This incredibly predictive algorithm that was changing the behavior of your police officers or whatever: ‘What would that look like? What would actually happen?’ And then think about: ‘Okay, what could go wrong? What kind of rollout plan, what kind of monitoring systems, what kind of oversight could provide the circuit breaker?’ Because that's what we really need here: nothing's going to be perfect, you can't be sure that there are no ‘feedback loops’, but what you can do is try to be sure that you see when your system is behaving in a way that's not what you want. Did you have anything to add to that, Rachel?
I would add to that that you're at risk of potentially having a ‘feedback loop’ any time your model is controlling what your next round of data looks like. And I think that's true for pretty much all products, and that can be a hard jump for people coming from a science background, where you may be thinking of data as: ‘I have just observed some sort of experiment.’ Whereas whenever you're building something that interacts with the real world, you are now also controlling what your future data looks like, based on the behavior of your algorithm on the current round of data. Right? So given that you probably can't avoid ‘feedback loops’, the thing you need to really invest in is the human in the loop. A lot of people like to focus on automating things, which I find weird: if you can decrease the amount of human involvement by, like, 90 percent, you've got almost all of the economic upside of automating it completely, but you still have the room to put human circuit breakers in place. You need these appeals processes, you need the monitoring, you need humans involved to go: ‘Hey, that's weird. I don't think that's what we want.’ Okay, yes Rachel. And just one more note about
that. Those humans, though, do need to be integrated well with product and engineering. One issue that comes up is that in many companies this ends up sitting underneath trust and safety, which handles a lot of the issues around how things can go wrong or how your platform can be abused, and often trust and safety is pretty siloed away from product and engineering, which actually has the control over the decisions that really end up influencing these things. The engineers probably consider them pretty annoying a lot of the time, how they get in the way of getting software out the door. Yeah, but the more integration you can have between those, I think, the more helpful it is for the people building the product to see what is going wrong and what can go wrong. Right. If the engineers are actually on top of that, actually seeing these things happening, then it's not some kind of abstract problem anymore. So, you know,
at this point, now that we've got to the end of chapter 2, you actually know a lot more than most people about deep learning, and actually about some pretty important foundations of machine learning more generally, and of data products more generally. So now's a great time to think about writing. By the way, sometimes we have formatted text in the Jupyter notebook that doesn't quite format correctly there; it only formats correctly in the book. That's what it means when you see this kind of pre-formatted text. The idea here is to think about starting writing at this point, before you go too much further.
Rachel, there's a question? Oh, okay, let's hear the question. The question is: ‘I assume there are fastai-type ways of keeping a nightly updated transfer learning setup. Could one of the fastai version 4 notebooks have an example of the nightly transfer learning training, like the previous person asked? I would be interested in knowing how to do that most effectively with fastai.’ Sure. So I guess my view is there's nothing fastai-specific about that at all, so I actually suggest you read Emmanuel's book, that book I showed you, to understand the ideas, and if people are interested in this I can also point you to some academic research about this as well. There's not as much of it as there should be, but there is some good work in this area.
Okay. So, the reason we mention writing at this point in our journey is because, you
know, things are going to start to get more and more heavy, more and more complicated,
and a really good way to make sure that you're on top of it is to try to write down what
you've learned. So sorry, I wasn't sharing the right part of the screen before, but this is what I was describing in terms of the pre-formatted text, which doesn't look correct. So, Rachel actually has this great article that you should check out, which is ‘Why you should blog’, and I'll be the one relaying what she says, since I have it in front of me and she doesn't, weird as that is. So Rachel says that the top advice she would give her younger self is to start blogging sooner. Rachel has a math PhD, and this kind of idea of blogging was not exactly something, I think, they had a lot of in the PhD program, but actually it's a really great way of finding jobs. In fact, most of my students who have got the best jobs are students that have good blog posts. The thing I really love is that it helps you learn: by writing things down, you kind of synthesize your ideas, and yeah, there are lots of reasons to blog. So there's actually something really cool I want to show you. Yeah, I was also just going to note that I have a second post called ‘Advice for Better Blog Posts’, which is a little bit more advanced and which I'll post a link to as well. It talks about some common pitfalls that I've seen in many blog posts, and the importance of putting the time in to do it well, and some things to think about. So I'll share that post as well. Thanks Rachel. Um, so one reason that sometimes people don't
blog is that it's kind of annoying to figure out how to, particularly because I think the thing that a lot of you will want to blog about is cool stuff that you're building in Jupyter notebooks. So we've actually teamed up with a guy called Hamel Husain, and with GitHub, to create this free product (as usual with fast.ai: no ads, no anything) called ‘fastpages’, where you can actually blog with Jupyter notebooks. You can go to ‘fastpages’ and see for yourself how to do it, but the basic idea is that you literally click one button, it sets up a blog for you, and then you dump your notebooks into a folder called _notebooks, and they get turned into blog posts. It's basically like magic, and Hamel's done an amazing job of this. This means that you can create blog posts where you've got charts, and tables, and images, where they're all actually the output of a Jupyter notebook, along with all the markdown-formatted text, headings, hyperlinks, and the whole thing. So this is a great way to start writing about what you're learning about here. So something
that Rachel and I both feel strongly about when it comes to blogging is this, which is,
don't try to think about the absolute most advanced thing you know and try to write a
blog post that would impress Geoff Hinton. Right? Because most people are not Geoff Hinton.
So, (a) you probably won't do a good job, because you're trying to blog for somebody who's got more expertise than you, and (b) you've got a small audience now. Right? Actually, there are far more people who are not very familiar with deep learning than people who are. So try to think about it this way: you really understand what it was like six months ago to be you, because you were there six months ago. So try and write something which the six-months-ago version of you would have found super interesting, full of little tidbits you would have loved, that would have delighted that six-months-ago version of you. Okay. So once again, don't move on until you've had a go at the questionnaire, to make sure that you understand the key things we think you need to understand, and have a think about the further research questions as well, because they might help you to engage more closely with the material.
So let's have a break, and we'll come back in five minutes time. So welcome back everybody.
This is an interesting moment in the course, because we're jumping from a part of the course which is very heavily around the structure of what we're trying to do with machine learning, what the pieces are, and what we need to know to make everything work together. There was a bit of code, but not masses, and there was basically no math; we wanted to put that at the start for everybody who wants an understanding of these issues without necessarily wanting to dive deep into the code and the math themselves. And now we're getting into the diving-deeper part. If you're not interested in diving deep yourself, you might want to skip to the next lesson about ethics, which rounds out the slightly less technical material. So what we're going to look at here is what we think of as kind of a toy problem, but which just a few years ago was considered a pretty challenging problem. The problem is recognizing handwritten digits, and we're going to try
and do it from scratch. Right? And we're gonna try and look at a number of different ways
to do it. So, we're going to have a look at a dataset called MNIST, and so, if you've
done any machine learning before you may well have come across MNIST. It contains handwritten
digits, and it was collated into a machine learning dataset by a guy called Yann LeCun and some colleagues, and they used it to demonstrate probably the first computer system to provide really practically useful, scalable recognition of handwritten digits. LeNet-5 was the system, and it was actually used to automatically process something like 10% of the checks in the US. So, one of the things that really helps, I
think, when building a new model is to, kind of, start with something simple, and gradually
scale it up. So, we've created an even simpler version of MNIST, which we call MNIST_SAMPLE,
which only has threes and sevens. Okay, so this is a good starting point to make sure
that we can, kind of, do something easy. I picked threes and sevens for MNIST_SAMPLE,
because they're very different. So I feel like, if we can't do this, we're going to
have trouble recognizing every digit. [coughs] So step one is to call untar_data. untar_data is the fastai function which takes a URL, checks whether you've already downloaded it (if you haven't, it downloads it), checks whether you've already uncompressed it (if you haven't, it uncompresses it), and then finally returns the path of where that ended up. So you can see here URLs.MNIST_SAMPLE (you can just hit Tab to get autocomplete) is just some location; it doesn't really matter where it is. I've already downloaded and uncompressed it, because I've run this once before, so it happens straight away, and path shows me where it is. Now, in this case path is dot, and the reason path is dot is because I've used this special BASE_PATH attribute on Path to tell it where my starting point is, and that's used for printing. So when I go here, ls, which prints a list of files, these are all relative to where I actually untarred this to. It just makes it a lot easier not to have to see the whole set of parent path folders.
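Roughly, those first steps look like this in the notebook:

    path = untar_data(URLs.MNIST_SAMPLE)   # download and uncompress (if needed), return the location
    Path.BASE_PATH = path                  # print paths relative to the dataset root
    path.ls()                              # shows the train and valid folders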
see what kind of type it is. So, it's a pathlib path object. Um, pathlib is part of the Python
standard library. It's a really very, very, very nice library, but it doesn't actually
have ls. Where there are libraries that we find super helpful, but they don't have exactly
the things we want, we liberally add the things we want to them. So we add ls. Right? So if
you want to find out what ls is, you know, there's, as we've mentioned it's a few ways
you can do it you can pop a question mark there, and that will show you where it comes
from. So there's actually a library called fastcore, which is a lot of the foundational
stuff in fast AI that is not dependent on PyTorch, or pandas, or any of these big heavy
libraries. So, this is part of fastcore and if you want to see exactly what it does, you,
of course remember, you can put in a second question mark, to get the source code, and
as you can see there's not much source code to it. And, you know, maybe most importantly,
please don't forget about doc, because that gives you this ‘Show in docs’ link, which you
can click on to get to the documentation to see examples, pictures if relevant, tutorials,
tests, and so forth. So when you're looking at a new dataset, I always start with just ls
to see what's in it, and
I can see here there's a train folder, and there's a valid folder, that's pretty normal.
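Roughly, those first steps look something like this. This is a sketch along the lines of the notebook cells, assuming fastai is installed, rather than a verbatim copy:

```python
from fastai.vision.all import *

# Download and uncompress the 3s-and-7s subset (cached after the first run),
# and get back the path where it ended up.
path = untar_data(URLs.MNIST_SAMPLE)

# Make printed paths relative to this location, so listings stay short.
Path.BASE_PATH = path

# ls() is added to pathlib.Path by fastcore; it lists what's inside the folder.
print(path.ls())
```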
So let's look at ls on the train folder, and it's got a folder called 7 and a folder called
3, and so this is looking quite a lot like our bear classifier dataset. We downloaded
each set of images into a folder based on what its label was. This is doing it at another
level though. The first level of the folder hierarchy is, is it training or valid, and
the second level is, what's the label. And this is the most common way for image datasets
to be distributed. So let's have a look. Let's just create something called threes, that contains
all of the contents of the 3 directory inside train, and let's just sort them so that this
is consistent. Do the same for sevens, and let's look at the threes, and you can see they're
just numbered. All right. So let's grab one of those, open it, and take a look. Okay, so
there's the picture of a 3. And so what is that really? Well, I've called it im3, and it's
opened with PIL. PIL is the Python Imaging
Library. It's the most popular library by far for working with images on Python and
it's a PNG, not surprisingly. So Jupyter notebook knows how to display many different types
and you can actually tell if you create a new type you can tell it how to display your
type. And so PIL comes with something that will automatically display the image, like
so. What I want to do here though is to look at like how we're going to treat this as numbers,
right. And so one easy way to treat things as numbers is to turn it into an array. The
array is part of numpy, which is the most popular array programming library for Python.
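Here's roughly what we're about to do, as a sketch. It assumes the threes list of sorted file paths from a moment ago; the variable names follow the talk rather than necessarily the exact notebook:

```python
from fastai.vision.all import *
import numpy as np

path = untar_data(URLs.MNIST_SAMPLE)
threes = (path/'train'/'3').ls().sorted()

im3 = Image.open(threes[1])        # open one image with PIL

# As a numpy array of 8-bit unsigned integers: peek at a 6x6 patch of pixels.
print(np.array(im3)[4:10, 4:10])

# The same thing as a PyTorch tensor; the numbers are identical.
print(tensor(im3)[4:10, 4:10])
```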
And so if we pass our PIL image object to array, it just converts the image into a bunch
of numbers. And the truth is, it was a bunch of numbers the whole time. It was actually
stored as a bunch of numbers on disk. It's just that there's this magic thing in Jupyter
that knows how to display those numbers on the screen. Now, when I say array(), turning
it into a numpy array, we're removing this ability for Jupyter notebook to know how to
display it like a picture. So once I do this, we can then index into that array and grab
everything: all the rows from 4 up to but not including 10, and all the columns from 4
up to but not including 10. And here
are some numbers and they are 8-bit unsigned integers, so they are between 0 and 255. So
an image, just like everything on a computer, is just a bunch of numbers. And therefore,
we can compute with it. We could do the same thing, but instead of saying array(), we could
say tensor(). Now, a tensor is basically the PyTorch version of a numpy array. And so you
can see it's exactly the same code as above, but I've just replaced array() with tensor().
And the output looks almost exactly the same, except it says tensor instead of array. And
so you'll see that basically a PyTorch tensor and a numpy array behave nearly identically,
much if not most of the time. But the key thing is that a PyTorch tensor can also be
computed on a GPU, not just a CPU. So in our work, and in the
book, and in the notebooks, in our code, we tend to use tensors, PyTorch tensors, much
more often than numpy arrays because they kind of have nearly all the benefits of numpy
arrays, plus all the benefits of GPU computation. And they've got a whole lot of extra functionality
as well. A lot of people who have used Python for a long time always jump into numpy,
because that's what they're used to. If that's you, you might want to start considering
jumping to tensors instead: wherever you used to write array, just start writing tensor
and see what happens, because you might be surprised at how many things you can speed up
or do more easily. So let's grab that 3 image and turn it into a tensor, and so that's
going to be a 3-image tensor; that's why I've called it im3_t here. And let's grab a bit of it,
okay, and turn it into a pandas DataFrame. And the only reason I'm turning it into a
pandas DataFrame is that pandas has a very convenient method called background_gradient()
that turns the background into a gradient, as you can see. So here is the top bit of the
3. You can see that the 0s are the whites and the numbers near 255 are the blacks, and
there are some bits in the middle which are grey. So here we can see what's going on when
our images, which are numbers, actually get displayed on the screen: it's just doing this.
And I'm just showing a subset here; the actual image in MNIST is 28 by 28 pixels square,
so that's 784 pixels. So that's super tiny, right? My mobile phone, I don't know how many
megapixels it is, but it's millions of pixels. So it's nice to start with something simple
and small, okay. So,
here's our goal - create a model, but by model, I just mean some kind of computer program
learnt from data that can recognize 3s versus 7s. You can think of it as a 3 detector:
is it a 3? Because if it's not a 3, it's a 7. So stop here, pause the video and have a
think: how would you do it? You don't need to know anything about neural networks or
anything else. How might you, just with common sense, build a 3 detector? So I hope you
grabbed a piece of paper and a pen and jotted some notes down. I'll tell you the first
idea that came into my head
was … what if we grab every single 3 in the data set and take the average of the pixels?
So what's the average of this pixel, the average of this pixel, the average of this pixel,
the average of this pixel, right. And so there'll be a 28 by 28 picture which is the average
of all of the 3s, and that would be like the ideal 3. And then we'll do the same for 7s.
And then when we grab something from the validation set to classify, we'll say: is this
image closer to the ideal 3 (the mean of the 3s), or to the ideal 7? That's my idea, and
so I'm going to call this the pixel similarity approach.
I'm describing this as a baseline. A baseline is like a super simple model that should be
pretty easy to program from scratch with very little magic. You know, maybe it's just a
bunch of kind of simple averages, simple arithmetic, which you're super confident is going to be
better than a random model, right. And one of the biggest mistakes I see, even in
experienced practitioners, is that they fail to create a baseline. And so then they build
some fancy Bayesian model or some fancy neural network and they go, “Wow, Jeremy, look at
my amazingly great model!” And I'll say, “How do you know it's amazingly great?” and
they'll say, “Oh look, the accuracy is 80%.” And then I'll say, “Okay, let's see what
happens if we create a model where we always predict the mean. Oh look, that's 85%.” And
people get pretty disheartened when they discover this, right. And so make sure you start
with a reasonable
baseline and then gradually build on top of it. So we need to get the average of the pixels,
so we're going to learn some nice Python programming tricks to do this. So the first thing we need
to do is we need a list of all of the 7s. So remember we've got sevens, which is just a
list of file names, right. And so for each of those file names in sevens, let's Image.open()
that file just like we did before to get a PIL object, and let's convert that into a
tensor. So this thing here is called a list comprehension. So if you haven't seen this
before, this is one of the most powerful and useful tools in Python. If you've done something
with C#, it's a little bit like LINQ - it's not as powerful as LINQ, but it's a similar
idea. If you've done some functional programming in JavaScript, it's a bit like some of
the things you can do with that, too. But basically, we're just going to go through
this collection, each item will become called “o”, and then it will be passed to this
function, which opens it up and turns it into a tensor. And then it will be collated all
back into a list. And so this will be all of the 7s as tensors. Sylvain and I use list
and dictionary comprehensions every day, so you should definitely spend some time checking
them out if you haven't already.
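Here's a sketch of that step; it mirrors the approach described above rather than quoting the notebook exactly:

```python
from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()

# List comprehensions: for each file path o, open the image and turn it into a tensor.
three_tensors = [tensor(Image.open(o)) for o in threes]
seven_tensors = [tensor(Image.open(o)) for o in sevens]

print(len(three_tensors), len(seven_tensors))  # how many images we loaded
show_image(three_tensors[1]);                  # display one of them (in a notebook)
```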
So now that we've got a list of all of the 3s as tensors, let's just grab one of them and
display it. Remember, this is a tensor, not a PIL image object, so Jupyter doesn't know
how to display it on its own. We have to use a command to display it, and show_image() is
a fastai function that displays a tensor. And so here is our 3. So we need to get the
average of all of those 3s. So to get the average, the first
thing we need to do is to change this so it's not a list but a tensor itself. Currently
three_tensors[1] has a shape which is 28 by 28 - that's the rows by columns, the size of
this thing. But three_tensors itself is just a list, and I can't really easily do
mathematical computations on that. So what we could do is stack all of these 28 by 28
images on top of each other to create a kind of 3D cube of images, and that's still a
tensor: a tensor can have as many of these axes or dimensions as you like. And to stack
them up you use, funnily enough, stack(). And so this is going to turn the list into a
tensor, and as you can see the shape of it is now 6131 by 28 by 28, so it's like a cube
of height 6131, with 28 by 28 sides. The other thing we want to do is, if we're
going to take the mean we want to turn them into floating-point values, because we don't
want to kind of have integers rounding off. The other thing to know is that it's just
kind of a standard in computer vision that when you are working with floats, that you
expect them to be between 0 and 1. So we just divide by 255, because they were between 0
and 255 before. So this is a pretty standard way to kind of represent a bunch of images
in PyTorch. So these three things here are called the axes -- first axis, second axis,
third axis, and overall we would say that this is a rank 3 tensor, as it has three axes.
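Put together, that looks something like this; it's a sketch, and the 6131 count is just what my copy of the dataset happened to contain:

```python
from fastai.vision.all import *
import torch

path = untar_data(URLs.MNIST_SAMPLE)
three_tensors = [tensor(Image.open(o)) for o in (path/'train'/'3').ls().sorted()]

# Stack the list of 28x28 images into one rank-3 tensor, convert to floats,
# and scale the 0-255 pixel values down to the 0-1 range.
stacked_threes = torch.stack(three_tensors).float() / 255

print(stacked_threes.shape)        # e.g. torch.Size([6131, 28, 28])
print(len(stacked_threes.shape))   # 3 -- the rank, i.e. the number of axes
print(stacked_threes.ndim)         # 3 -- the same thing; PyTorch calls it ndim
```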
The single-image tensor we looked at earlier was a rank-2 tensor -- it just has two axes.
So you can get the rank from a tensor by just taking the length of its shape: one, two,
three. You can also get that
from, so the word -- I've been using the word axis -- you can also use the word dimension.
I think numpy tends to call it axis; pytorch tends to call it dimension. So the rank is
also the number of dimensions: ndim. So you need to make sure that you remember this word.
Rank is the number of axes or dimensions in a tensor, and the shape is a list containing
the size of each axis in a tensor. So we can now say stacked_threes.mean(). Now,
if we just say stacked_threes.mean(), that returns a single number -- that's the average
pixel across that whole cube, that whole rank three tensor. But if we say mean(0), that
is: take the mean over this axis, so that's the mean across the images, right? And so
that's now 28 by 28 again, because we kind of like reduced over this 6131 axis. We took
the mean across that axis and so we can show that image, and here is our ideal three. So
here's the ideal seven using the same approach. All right, so now let's just grab a three
-- it's just any old three -- there it is. And what I'm going to do is I'm going to say,
“Well, is this three more similar to the perfect three, or is it more similar to the
perfect seven?” And whichever one it's more similar to, I'm going to assume that that's
the answer. Now, we can't just look at each pixel and take the difference between this
pixel at position (0,0) here and at (0,0) there, then at (0,1) here and (0,1) there, and
so on, and take the average of those differences. The reason is that there are positives
and negatives, and they're going to average out to nothing, right? So I actually need
them all to be positive numbers. So there's two ways to make them all positive
numbers. I could take the absolute value, which simply means remove the minus signs,
okay? And then I could take the average of those; that's called the mean absolute difference
or L1 norm. Or I could take the square of each difference and then take the mean of
that, and then at the end I could take the square root, which kind of undoes the squaring,
and that's called the root mean squared error, or the L2 norm. So let's have a look. Let's
take our three, a_3, subtract from it the mean of the threes, take the absolute value,
take the mean, and call that the absolute-value distance from a_3 to the ideal three. And
there is the number: 0.1. And so this is the mean absolute difference, or L1 norm. So when you
see a word like L1 norm, if you haven't seen it before it may sound pretty fancy, but all
these math terms that we see, you know you can turn them into a tiny bit of code, right?
It's, you know, don't let the mathy bits fool you. They're often -- like in code it's just
very obvious what they mean, whereas with math you just, you just have to learn it,
or learn how to google it. So here’s the same version for squaring:
take the difference, square it, take the mean, and then take the square root. So now we'll
do the same thing for our three; this time we'll compare it to the mean of the sevens.
All right, so the distance from a_3 to the mean of the threes in terms of absolute
difference was 0.1, and the distance from a_3 to the mean of the sevens was 0.15. So it's closer to
the mean of the threes than it is to the mean of the sevens, so we guess therefore that
this is a three, based on the mean absolute difference. Same thing with RMSE (root mean
squared error): we compare the two RMSE values, and again the root mean squared error to
the mean 3 is smaller than to the mean 7. So this is like a machine learning model (kind
of); it's a data-driven model which attempts to recognize threes versus sevens, and so
this is a reasonable baseline - it's going to be better than random. We don't actually
have to write out the subtract, abs, mean steps ourselves: we can just use L1 loss, which
does exactly that. And we don't have to write the subtract-and-square ourselves either:
we can just use mse_loss, though that doesn't do the square root by default, so we have
to pop that in. Okay?
And as you can see, they're exactly the same numbers.
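Here's the whole comparison as one sketch, under the same assumptions as before; your exact numbers may differ slightly from the 0.1 and 0.15 quoted above:

```python
from fastai.vision.all import *
import torch
import torch.nn.functional as F

path = untar_data(URLs.MNIST_SAMPLE)

def stack_dir(d):
    # Open every image in a folder, stack into one float tensor scaled to 0-1.
    return torch.stack([tensor(Image.open(o)) for o in d.ls().sorted()]).float() / 255

stacked_threes = stack_dir(path/'train'/'3')
stacked_sevens = stack_dir(path/'train'/'7')

mean3 = stacked_threes.mean(0)   # the "ideal" three: average over the image axis
mean7 = stacked_sevens.mean(0)   # the "ideal" seven

a_3 = stacked_threes[1]          # any old three

dist_3_abs = (a_3 - mean3).abs().mean()          # mean absolute difference (L1)
dist_3_sqr = ((a_3 - mean3)**2).mean().sqrt()    # root mean squared error (L2)
dist_7_abs = (a_3 - mean7).abs().mean()
dist_7_sqr = ((a_3 - mean7)**2).mean().sqrt()
print(dist_3_abs, dist_7_abs)    # the three is closer to mean3 than to mean7

# The same quantities via PyTorch's built-in loss functions.
print(F.l1_loss(a_3, mean7), F.mse_loss(a_3, mean7).sqrt())
```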
It's very important before we kind of go too much further, to make sure we're very comfortable
working with arrays and tensors. And you know, they're so similar. So we could start with
a list of lists, right, which is kind of a matrix. We can convert it into an array, or
into a tensor. We can display it, and they look almost the same. You can index into a
single row, you can index into a single column, and so it's important to know -- this is very
important -- colon means every row, because I put it in the first spot. Right, so if it
were in the second spot it would mean every column and so therefore comma colon ( ,: ) is
exactly the same as removing it. So it just turns out you can always remove colons that
are at the end, because they're just implied, right? You never have to include them, but
I often put them in anyway, because it just makes it a bit more obvious how these things
match up, or how they differ. You can combine them together: give me the first row and
everything from the first up to but not including the third column, which gives us back
5 and 6. You can add stuff to them; you can check their type. Notice that this
is different to the Python type, right, so type is a function; this tells you it's a
tensor. If you want to know what kind of tensor, you have to use type as a method. So it's
a long tensor. You can multiply them by a float, which turns them into a float tensor. So
have a fiddle around: if you haven't done much stuff with numpy or PyTorch before, this is
a good opportunity to just go crazy -- try things out. Try things that you think might
not work and see if you actually get an error message.
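If you want something concrete to start fiddling with, here's a little sketch of the kind of thing I mean; the 2x3 matrix is just made up:

```python
import numpy as np
import torch

data = [[1, 2, 3], [4, 5, 6]]      # a list of lists, i.e. a little matrix
arr = np.array(data)               # as a numpy array
tns = torch.tensor(data)           # as a PyTorch tensor -- looks almost identical

print(tns[1])        # a single row
print(tns[:, 1])     # a single column: ':' in the first spot means "every row"
print(tns[1, 1:3])   # row 1, columns 1 up to (but not including) 3 -> tensor([5, 6])
print(tns + 1)       # add something to every element
print(type(tns))     # the Python type: it's a torch.Tensor
print(tns.type())    # type as a *method*: 'torch.LongTensor'
print(tns * 1.5)     # multiplying by a float gives you a float tensor
```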
So we now want to find out: how good is our model? Our model that involves just comparing
something to the mean. We should not check how good our model is on the training set; as
we've discussed, we should check it on a validation set, and we already have a validation
set: it's everything inside the valid directory. So let's go ahead and combine all those
steps from before. Let's go through everything in the validation set's 3 folder, open
them, turn them into tensors, stack them all up, turn them into floats, and divide by 255.
Okay, let's do the same for the sevens. So we're
just putting all the steps we did before into a couple of lines. I always try to print out shapes, like all
the time, because if a shape is not what you expected then you can, you know, get weird
things going on. So the idea is we want some function is_three that will return true if
we think something is a three. So to do that we have to decide whether our digit that we're
testing on is closer to the ideal three or the ideal seven. So let's create a little
function that returns the difference between two things, takes the absolute value and then
takes the mean. So we're going to create this function, mnist_distance, that takes the
difference between two tensors, takes its absolute value, and then takes the mean. And
look at this: there are minus numbers this time, because it takes the mean over the last
and second-last dimensions, (-1, -2), so it's taking the mean across the x and y axes of
the image. And here you can see it's returning a single number, which is the distance of
a three from mean3, and that's the same as the value we got earlier: 0.1114.
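Here's roughly what that looks like, continuing the same assumptions; the name mnist_distance follows the talk, but the code itself is my sketch:

```python
from fastai.vision.all import *
import torch

path = untar_data(URLs.MNIST_SAMPLE)

def stack_dir(d):
    return torch.stack([tensor(Image.open(o)) for o in d.ls().sorted()]).float() / 255

# The validation tensors, built with the same steps as the training ones.
valid_3_tens = stack_dir(path/'valid'/'3')
valid_7_tens = stack_dir(path/'valid'/'7')
print(valid_3_tens.shape, valid_7_tens.shape)   # always check shapes!

mean3 = stack_dir(path/'train'/'3').mean(0)     # the ideal three from before

def mnist_distance(a, b):
    # Mean absolute difference, averaged over the last two axes (the 28x28 pixels).
    return (a - b).abs().mean((-1, -2))

a_3 = valid_3_tens[0]
print(mnist_distance(a_3, mean3))   # a single number, something like 0.11
```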
We need to do this for every image in the validation set, because we're trying to find
the overall metric. Remember: the metric is the thing we look at to say how good our
model is. So here's something crazy: we can call
mnist_distance not just on a three, but on the entire validation set, against the mean
three. That's wild! Like, there's no normal programming that we would do where we could
somehow pass in either a matrix or a rank 3 tensor and somehow it works both times.
And what actually happened here is that instead of returning a single number it returned 1,010
numbers. And it did this because it used something called broadcasting. And broadcasting is like
the super special magic trick that lets you make Python into a very very high-performance
language, and in fact, if you do this broadcasting on GPU tensors and PyTorch, it actually does
this operation on the GPU even though you wrote it in Python. Here's what happens. Look
here, this a - b. So we're doing a - b on two things. We've got, first of all, valid_3_tens,
the valid-threes tensor, which is a thousand or so images, and remember that mean3 is
just our single ideal three. So what is something of this shape minus something of this
shape? Well, if the shapes did match, it would just subtract every corresponding item;
but because they don't match, broadcasting means it acts as if there were 1,010 copies of
the ideal three, so it's actually going to subtract the ideal three from every single one
of the validation images. So broadcasting -- let's look at some examples.
So broadcasting requires us to first of all to understand the idea of element-wise operations.
This is an element-wise operation. Here is a rank 1 tensor of size 3 and another rank
1 tensor of size 3, so we would say these sizes match (they're the same) and so when
I add 1, 2, 3, to 1, 1, 1 I get back 2 3 4. It just takes the corresponding items and
adds them together. That's called element-wise operations. So when I have different shapes,
as we described before, what it ends up doing is it basically copies this number a thousand
and ten times, and it acts as if we had said valid_3_tens minus 1,010 copies of mean3.
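A tiny sketch of the two cases, using made-up stand-ins for the real tensors:

```python
import torch

# Element-wise: the shapes match, so corresponding items are simply added.
print(torch.tensor([1., 2., 3.]) + torch.tensor([1., 1., 1.]))   # tensor([2., 3., 4.])

# Broadcasting: the shapes don't match, so the smaller tensor is (virtually) repeated.
big   = torch.rand(1010, 28, 28)    # stands in for valid_3_tens
small = torch.rand(28, 28)          # stands in for mean3
diff = big - small                  # small is "expanded" across the first axis
print(diff.shape)                   # torch.Size([1010, 28, 28])

# abs() then mean over the last two axes gives one distance per image.
print(diff.abs().mean((-1, -2)).shape)   # torch.Size([1010])
```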
As it says here it doesn't actually copy mean3 1,010 times; it just pretends that it did,
right? It just acts as if it did, so basically kind of loops back around to the start again
and again and it does the whole thing in C or in CUDA on the GPU. So then we see absolute
value, right? So let's go back up here after we do the minus, we go absolute value so what
happens when we call absolute value on something of size 1010 by 28 by 28? It just calls absolute
value on each underlying thing right and then finally we call mean. -1 is the last element
always in Python, -2 is the second-last. So this is taking the mean over the last two
axes, and so then it's going to return just the first axis. So we're going to end up with
1,010 means -- 1,010 distances, which is exactly what we want: we want to know how far away
is our each of our validation items from the the ideal three. So then we can create our
is_3 function, which is, “Hey, is the distance between the number in question and the perfect
three less than the distance between the number in question and the perfect seven?” If it
is, it's a three, right? So our three, that was an actual three we had: is it a three?
Yes. Okay, and then we can turn that into a float, and “yes” becomes 1.0. Thanks
to broadcasting, we can do it for that entire set, right? So this is so cool! We basically
get rid of loops. In this kind of programming, you should have very few, very very few loops.
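For instance, the whole baseline evaluation can be written with no explicit loops at all. Here's a sketch under the same assumptions as before; the printed accuracies should come out roughly as the figures I mention in a moment:

```python
from fastai.vision.all import *
import torch

path = untar_data(URLs.MNIST_SAMPLE)

def stack_dir(d):
    return torch.stack([tensor(Image.open(o)) for o in d.ls().sorted()]).float() / 255

mean3 = stack_dir(path/'train'/'3').mean(0)
mean7 = stack_dir(path/'train'/'7').mean(0)
valid_3_tens = stack_dir(path/'valid'/'3')
valid_7_tens = stack_dir(path/'valid'/'7')

def mnist_distance(a, b): return (a - b).abs().mean((-1, -2))

def is_3(x):
    # Closer to the ideal three than to the ideal seven?  Works on one image or,
    # thanks to broadcasting, on a whole stack of images at once.
    return mnist_distance(x, mean3) < mnist_distance(x, mean7)

accuracy_3s = is_3(valid_3_tens).float().mean()          # fraction of 3s called 3
accuracy_7s = (1 - is_3(valid_7_tens).float()).mean()    # fraction of 7s not called 3
print(accuracy_3s, accuracy_7s, (accuracy_3s + accuracy_7s) / 2)
```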
Loops make things much harder to read, and hundreds of thousands of times slower (on
the GPU, potentially tens of millions of times slower). So we can just call is_3 on our
whole valid_3_tens, turn that into floats, and take the mean; that's the accuracy on the
threes. And here's the accuracy on the sevens: for those, being correct means is_3 is
false, so it's one minus is_3, averaged. The accuracy on the threes is about 91 and a bit
percent, the accuracy on the sevens is about 98%, and the average of those two is about
95%. So here
we have a model that's 95 percent accurate at recognizing threes from sevens. It might
surprise you that we can do that using nothing but arithmetic, right, but so that's what
I mean by getting a good baseline. Now the thing is, it's not obvious how we
kind of improve this, right? I mean the thing is, it doesn't match Arthur Samuel’s description
of machine learning. This is not something where there's a function which has some parameters
which we're testing against some kind of measure of fitness, and then using that to like improve
the parameters iteratively. We kind of, we just did one step and that's that, okay?
So we will try and do it in this way, where we arrange for some automatic means of testing
the effectiveness of -- he called it a weight assignment, we'd call it a parameter
assignment -- in terms of performance, and a mechanism for altering the weight assignment
to maximize the performance. And we do want to do it that way, right, because we know from
Chapter 1, from Lesson 1, that if we do it that way, we have this magic box called machine
learning that -- particularly combined with neural nets -- should be able to solve any
problem, in theory, if you can at least find the right set of weights. So we need
something that can get better and better -- that can learn. So let's think about a function
which has parameters. So instead of finding an ideal image and testing how far away
something is from that ideal image, what we could do instead is come up with a set of
weights for each pixel. We're trying to find out if something is the number three, and we
know that in the places where you would expect to find ‘3’ pixels, you could give those
pixels high weights. So you can say: hey, if there's a dot in those places, we give it a
high score, and if there are dots in other places we'll give it a low score.
So we can actually come up with a function where the probability of something being,
well, in this case let's say an eight, is equal to the pixels in the image multiplied by
some set of weights, and then summed up. So anywhere the image we're looking at has pixels
where there are high weights, it's going to end up with a high probability. Here x is the
image that we're interested in, and we're just going to represent it as a vector, with
all the rows stacked up end to end into a single long line.
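As a tiny sketch of that idea (the name pr_eight and the flattening are just to make the formula concrete; the weights here are random, because we haven't trained anything yet):

```python
import torch

def pr_eight(x, w):
    # x: one image flattened into a vector of pixels; w: one weight per pixel.
    # Pixels that are "on" in places where the weights are high push the score up.
    return (x * w).sum()

x = torch.rand(28*28)    # stands in for one flattened 28x28 image
w = torch.randn(28*28)   # random weights for now -- training is what will fix these
print(pr_eight(x, w))
```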
So we're going to use an approach where we start with a vector w (a vector is a rank-1
tensor) that contains random weights -- random parameters, if you use the Arthur Samuel
version of the terminology. And so, we'll then predict whether a number
appears to be a three or a seven by using this tiny little function. And then we will
figure out how good the model is -- we'll calculate how accurate it is, or something like
that; this is the loss -- and then the key step is that we're going to calculate the
gradient. Now the gradient is something that measures, for each weight: if I made it a
little bit bigger, would the loss get better or worse? If I made it a little bit smaller,
would the loss get better or worse? And if we do that for every weight, we can decide for
every weight whether we should make it a bit bigger or a bit smaller. That's called the
gradient. Right? So once we have the gradient, we then step -- that's the word we use,
step: change all the weights, up a little bit for the ones where the gradient said we
should make them a bit higher, and down a little bit for all the ones where the gradient
said they should be a bit lower. So now it should be a tiny bit
better and then we go back to step two and calculate a new set of predictions, using
this formula, calculate the gradient again, step the weights, keep doing that. So this
is basically the flow chart and then at some point when we're sick of waiting or when the
loss gets good enough we'll stop. So these seven steps 1, 2, 3, 4, 5, 6, 7… These seven
steps are the key to training all deep learning models. This technique is called stochastic
gradient descent. Well, it's called gradient descent, we’ll see the stochastic bit very
soon. And for each of these seven steps there's lots of choices around exactly how to do it.
Right? We've just kind of hand waved a lot, like what kind of random initialization, and
how do you calculate the gradient, and exactly what step do you take based on the gradient,
and how do you decide when to stop, blah blah blah. Right? So in this course we're going
to be learning about these steps -- that's kind of part one -- and then the other big
part is: well, what's the actual function? That's the neural network. So: how do we train
the thing, and what is the thing that we train? So, we initialize parameters with random
values. We need some function that's going to be the loss function, which will return a
number that's small if the performance of the model is good. We need some way to figure
out whether each weight should be increased a bit or decreased a bit, and then we need to
decide when to stop, which we'll just say is: let's just do a certain number of epochs.
So, let's go even simpler. Right? We're not even going
to do MNIST. We're going to start with this function, x squared, okay? And in fastai we've
created a tiny little thing called plot_function, that plots a function. All right, so
there's our function f, and what we're going to do is pretend that this is our loss
function. So we're going to try and find the bottom point: we're going to try and figure
out what x value is at the bottom. So our seven-step procedure requires us to start out
by initializing, so we need to pick some value. The value we pick is, say: oh, let's just
randomly pick minus one and a half.
Great! So now we need to know: if I increase x a bit, does my loss (remember, this is my
loss, and better means smaller) get a bit better or a bit worse? We can do that easily
enough: we can just try a slightly higher x and a slightly lower x and see what happens.
Right? And you can see it's just the slope. The slope at this point
tells you that if I increase x by a bit then my loss will decrease, because that is the
slope at this point. So, if we change our, our weight, our parameter, just a little bit
in the direction of the slope. Right? So here is the direction of the slope and so here's
the new value at that point, and then do it again, and then do it again, eventually we'll
get to the bottom of this curve. Right? So this idea goes all the way back to Isaac
Newton, at the very least, and this basic idea is called Newton's method. So a key thing
we need to be able to do is to calculate this slope. And the bad news is to do that we need
calculus. At least that’s bad news for me because I've never been a fan of calculus.
We have to calculate the derivative. Here's the good news, though. Maybe you spent ages
in school learning how to calculate derivatives - you don't have to anymore, the computer
does it for you, and the computer does it fast. It uses all of those methods that you
learned at school and a whole lot more - like clever tricks for speeding them up, and it
just does it all automatically. So, for example, it knows (I don't know if you remember this
from high school) that the derivative of x squared is 2x. It’s just something it knows,
it's part of its kind of bag of tricks, right. So, so PyTorch knows that. PyTorch has an
engine built in that can take derivatives and find the gradient of functions. So to
do that we start with a tensor, let's say, and in this case we're going to modify this
tensor with this special method called requires_grad. And what this does is it tells PyTorch that
any time I do a calculation with this xt, it should remember what calculation it does
so that I can take the derivative later. You see the underscore at the end? An underscore
at the end of a method in PyTorch means that it's an in-place operation: it actually
modifies the tensor. So requires_grad_ modifies this tensor to tell PyTorch that we want
to be calculating gradients on it, which means it's going to have to keep track of all of
the computations we do so that it can calculate the derivative later. Okay,
so we've got the number 3, and let's say we then call f on it (remember, f is just
squaring it), so 3 squared is 9. But the value is not just 9; it's 9 accompanied by a
grad function, which means it knows that a power operation has been taken. So we can now
call a special method, backward(). backward() refers to backpropagation, which we'll learn
about, and it basically means: take the derivative. And once it's done that, we can look
inside xt, which we said requires grad, and find out its gradient. And remember, the
derivative of x squared is 2x; in this case x was 3, and 2 times 3 is 6. All right, so we
didn't have to figure out the derivative: we just call backward() and then get the grad
attribute to get the derivative. So that's how easy it is to do calculus in PyTorch. So what you
need to know about calculus is not how to take a derivative, but what it means. And
what it means is it's a slope at some point. Now here's something interesting - let's not
just take 3, let's take a rank-1 tensor, also known as a vector, [3., 4., 10.], and let's
add sum() to our f function, so it's going to be x squared, then .sum(). Now we can take
f of this vector and get back 125. And then we can say backward(), look at grad, and
there it is - 2x, 2x, 2x (that is, 6, 8 and 20).
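Here's that whole sequence as one small sketch; the function f and the values 3 and [3., 4., 10.] are the ones from the walkthrough:

```python
import torch

def f(x): return (x**2).sum()   # square, then sum (the sum matters for the vector case)

# Scalar case: ask PyTorch to track computations on xt so it can take derivatives later.
xt = torch.tensor(3.).requires_grad_()
yt = f(xt)        # tensor(9.), carrying a grad_fn that remembers the power operation
yt.backward()     # backpropagation: compute the derivative
print(xt.grad)    # tensor(6.) -- the derivative of x**2 is 2x, and 2*3 = 6

# Vector case: the same two lines give the gradient for every element.
xt = torch.tensor([3., 4., 10.]).requires_grad_()
yt = f(xt)        # 9 + 16 + 100 = 125
yt.backward()
print(xt.grad)    # tensor([ 6.,  8., 20.]) -- 2x for each element
```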
So we can calculate -- this is vector calculus, right -- the gradient for every element
of a vector with the same two lines of code. So that's kind of all you need to know about
calculus. And if this idea that a derivative or gradient is a slope is unfamiliar, check
out Khan Academy; they have some great introductory calculus material. And don't forget
you can skip all the bits where they teach you how to calculate the gradients yourself.
So now that we know how to calculate the gradient, that is, the
slope of the function, that tells us if we change our input a little bit, how will our
output change correspondingly. That's what a slope is, right. And so that tells us that
for every one of our parameters, if we know their gradients, then we know if we change
that parameter up a bit or down a bit, how will it change our loss. So therefore, we
then know how to change our parameters. So what we do is, let's say all of our weights
are called w: we just subtract from them the gradients multiplied by some small number.
That small number is often between about 0.001 and 0.1, and it's called the learning
rate, and this here is the essence of gradient descent. So if you pick a learning rate that's very
small, then you take the slope and you take a really small step in that direction, and
another small step, another small step, another small step, and so on, it's going to take
forever to get to the end. If you pick a learning rate that's too big, you jump way too far
each time, and again it's going to take forever. In fact in this case, we're assuming
we're starting here, and the learning rate is so big that the loss actually gets worse
and worse. Or here's one where we start here and it's not so big that it gets worse and
worse, but it just takes a long time to bounce in and out. So picking a good learning
rate is really important, both to making sure that it's even possible to solve the problem
and that it's possible to solve it in a reasonable amount of time. So we'll be learning about
how to pick learning rates in this course.
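Here's a tiny sketch of that update rule applied to the x-squared example from a moment ago, starting from the randomly picked minus one and a half. The learning rate of 0.1 is just an illustrative choice, and the .data trick it uses is explained properly a little later:

```python
import torch

def f(x): return x**2

x = torch.tensor(-1.5).requires_grad_()   # our randomly picked starting point
lr = 0.1                                  # the learning rate (an illustrative value)

for i in range(10):
    loss = f(x)
    loss.backward()           # compute the slope at the current x
    x.data -= lr * x.grad     # the essence of gradient descent: step downhill
    x.grad = None             # clear the gradient before the next iteration
    print(f'step {i}: x = {x.item():.3f}, loss = {loss.item():.3f}')
```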
So let's try this, let's try using gradient descent. I said SGD, that's not quite accurate,
it's just going to be gradient descent to solve an actual problem. So the problem we're
going to solve is, let's imagine you were watching a roller coaster go over the top
of a hump, right. So as it comes out of the previous hill it's going super fast and it's
going up the hill and it's going slower and slower and slower until it gets to the top
of the hump, and then it goes down the other side, it gets faster and faster and faster.
So if you had a stopwatch or some kind of speedometer and you were measuring it by hand
at roughly equal time points, you might end up with something that looks a bit like this,
right. And the way I did this was I just grabbed a range - just the numbers from naught
up to, but not including, 20. These are the time periods at which I'm taking my speed
measurement. And then I've got some quadratic function here: I take my time, subtract
9.5, square it, multiply by 0.75, and add 1. And then I add a random number to every
observation. So I end up with a quadratic function which is a bit bumpy, and this is kind
of like what it might look like in real life, because my speedometer testing is not
perfect. All right, so we want to create
a function that estimates at any time what is the speed of the roller-coaster. So we
start by guessing what function it might be. So we guess that it's a function - a times
time squared, plus b times time, plus c - which you might remember from school is called
a quadratic. So let's create a function, and let's create it using the Arthur Samuel
technique, the machine learning technique. This function is going to take two things: an
input, which in this case is a time, and some parameters; and the parameters are a, b,
and c. In Python you can split out a list or a collection into its components, like so.
And then here's that function. So we're
not just trying to find any function in the world, we're just trying to find some function
which is a quadratic by finding an a, and a b, and a c. So the Arthur Samuel technique
for doing this is to next up come up with a loss function; come up with a measurement
of how good we are. So if we've got some predictions that come out of our function and the targets
which are these, you know, actual values, then we could just do the mean squared error.
Okay, so here's that mean squared error we saw before: the difference, squared, then
take the mean. So now we need to go through our seven-step process: we want to come up
with a set of three parameters a, b and c, which are as good as possible. So step one
is to initialize a, b, and c to random values. So this is how you get random values, three
of them in PyTorch. And remember we're going to be adjusting them, so we have to tell PyTorch
that we want the gradients. I'm just going to save those away so I can check them later.
And then I calculate the predictions using that function, f, which was this. And then
let's create a little function which just plots how good at this point are our predictions.
So here is a function that prints in red our predictions, and in blue our targets. So that
looks pretty terrible. So let’s calculate the loss, using the mse
function we wrote. Okay, so now we want to improve this. So calculate the gradients using
the two steps we saw, call backward and then get grad. And this says that each of our parameters
has a gradient that's negative. Let's pick a learning rate of ten to the minus five: we
multiply the gradient by ten to the minus five and step the weights. And remember,
stepping the weights means minus-equals the learning rate times the gradient. There's a wonderful
trick here, which I’ve called .data. The reason I've called .data is that .data is
a special attribute in PyTorch, which if you use it, then the gradient is not calculated.
And we certainly wouldn't want the gradient to be calculated of the actual step we're
doing. We only want the gradient to be calculated of our function, f. All right, so when we
step the weights we have to use this special .data attribute. After we do that, delete
the gradients that we already had and let's see if loss improved. So the loss before was
25800, now it's 5,400. And the plot has gone from something that goes down to -300 to something
that looks much better. So let's do that a few times. I just grabbed those previous lines
of code and pasted them all into a single cell: calculate the preds, calculate the loss,
call loss.backward(), step the weights via .data, set the gradients back to none, and
from time to time print the loss out - and repeat that ten times. And look, it's getting
better and better, and we can actually watch it getting better and better. So this
is pretty cool, right. We have a technique - this is the Arthur Samuel technique - for
finding a set of parameters that continuously improves by getting feedback from the result
of measuring some loss function. So that was kind of the key step, right: this is the
gradient descent method.
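To have it all in one place, here's a compact sketch of that loop. The quadratic and the rough constants follow the walkthrough above; the noise scale is my own illustrative choice:

```python
import torch

# Some noisy "roller-coaster speed" measurements at times 0..19.
time = torch.arange(0, 20).float()
speed = torch.randn(20)*3 + 0.75*(time - 9.5)**2 + 1

def f(t, params):
    a, b, c = params                 # unpack the three parameters
    return a*(t**2) + b*t + c        # a quadratic

def mse(preds, targets): return ((preds - targets)**2).mean()

params = torch.randn(3).requires_grad_()   # step 1: initialise randomly
lr = 1e-5                                  # the learning rate from the walkthrough

for i in range(10):
    preds = f(time, params)                  # step 2: predict
    loss = mse(preds, speed)                 # step 3: measure the loss
    loss.backward()                          # step 4: calculate the gradients
    params.data -= lr * params.grad.data     # step 5: step the weights (note .data)
    params.grad = None                       # clear the gradients for next time
    print(loss.item())                       # watch the loss go down
```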
So you should make sure that you go back and feel super comfortable with what's happened.
And you know, if you're not feeling comfortable, that's fine. If it's been a while, or if
you've never done this kind of gradient descent before, this might feel super unfamiliar.
So try to find the first cell in this notebook
where you don't fully understand what it's doing, and then stop and figure it out. Look
at everything that's going on, do some experiments, do some reading until you understand that
cell where you're stuck before you move forwards. So let's now apply this to MNIST. So for MNIST
we want to use this exact technique and there's basically nothing extra we have to do. Except
one thing - we need a loss function. And the metric that we've been using is the error
rate, or the accuracy: how often are we correct? And that's the thing that we're actually
trying to make good - that's our metric. But we've got a very serious problem
- which is, remember we need to calculate the gradient to figure out how we should change
our parameters. And the gradient is the slope or the steepness, which you might remember
from school is defined as rise over run. It's (y_new - y_old) divided by (x_new - x_old).
So the gradient is actually defined when x_new is very, very close to x_old, meaning
their difference is very small. Now think about accuracy: if I change a parameter by a
tiny, tiny amount, the accuracy might not change at all, because there might not be any 3
that we now predict as a 7, or any 7 that we now predict as a 3, since we changed the
parameter by such a small amount. So it's possible - in fact, it's certain - that the
gradient is zero at many places, and that means our parameters aren't going to change at
all, because learning rate times gradient is still zero when the gradient's
zero for any learning rate. So this is why the loss function and the metric are not always
the same thing. We can't use a metric as our loss if that metric has a gradient of zero.
So we need something different. We want to find something that is pretty similar to the
accuracy, in that as the accuracy gets better, this function gets better as well, but it
should not have a gradient of zero. So let's think about that function.
Suppose we had three images. Actually, you know what? This is probably a good time to
stop, because we've got to the point here where we understand gradient descent, and we
kind of know how to do it with a simple loss function. I think before we start looking at
the MNIST loss function we shouldn't move on, because we've got so many assignments to do
for this week already: we've got 'build your web application', and we've got 'step through
this notebook to make sure you fully understand it'. So I actually think we should probably
stop right here before we make things too crazy. So before I do, Rachel, are there any
questions? Okay great, all right. Well thanks everybody. Sorry for that last-minute change
of tack there but I think this is going to make sense. So I hope you have a lot of fun
with your web applications. Try and think of something that's really fun, really interesting.
It doesn't have to be important; it could just be some cute thing. We've had students
before - one student, I think he said he had 16 different cousins, and he created
something that would classify a photo based on which of his cousins it was... it was for
his fiancée meeting his family. [laughs] You know, you can come up with anything you
like, but you know, yeah, show off your application and maybe have a look around at what ipywidgets
can do, and try and come up with something that you think is pretty cool. All right,
thanks everybody. I will see you next week!