Lesson 1 - Deep Learning for Coders (2020)

Captions
So hello everybody and welcome to Deep Learning for Coders, Lesson One. This is the fourth year that we've done this, but it's a very different and very special version for a number of reasons. The first reason it's different is because we are bringing it to you live from day number one of a nearly complete shutdown of San Francisco. We're going to be recording it over the next two months in the midst of this global pandemic, so if things seem a little crazy sometimes in this course, I apologise. So that's why this is happening. The other reason it's special is because we're trying to make this our definitive version. Since we've been doing this for a while now, we've finally got to the point where we almost feel like we know what we're talking about — to the point that Sylvain and I have actually written a book, we've written a piece of software from scratch called the fastai library, version 2, and we've written a peer-reviewed paper about this library. So this is designed to be the version of the course that is hopefully going to last a while. The syllabus is based very closely on this book, so if you want to read along properly as you go, please buy it. And I say "please buy it" because the whole thing is also available for free in the form of Jupyter notebooks, thanks to the huge generosity of O'Reilly Media, who have let us do that. You'll be able to see on the website for the course how to access all this, but here is the fastbook repo, where you can read the whole damn thing. At the moment, as you see, it's a draft, but by the time you read this, it won't be. So we have a big request here. The deal is this: you can read this thing for free as Jupyter notebooks, but that is not as convenient as reading it on a Kindle or in a paper book or whatever. So please don't turn this into a PDF. Please don't turn it into a form designed more for reading, because the whole point is that you'll buy it. Don't take advantage of O'Reilly's generosity by creating the thing that you know they're not giving you for free. That's actually explicitly the licence under which we're providing this as well, but it's mainly a request: be a decent human being. If you see somebody else not being a decent human being and stealing the book version of the book, please tell them, "Please don't do that, it's not nice." And don't be that person. So either way, you can read along with the syllabus in the book. There are a couple of different versions of these notebooks. There's the full notebook that has the entire prose, pictures, everything. Now, we actually wrote a system to turn notebooks into a printed book, and sometimes that looks kind of weird. For example, here's a weird-looking table, and if you look in the actual book, it looks like a proper table. So sometimes you'll see little weird bits; they are not mistakes, they are bits where we add information to help our notebooks turn into a proper, nice book, so just ignore them. Now, when I say we, who is we? Well, I mentioned one important part of the "we": Sylvain. Sylvain is my co-author of the book and of the fastai version 2 library, so he is my partner in crime here. The other key "we" here is Rachel Thomas, so maybe Rachel you can come and say hello. She is the co-founder of fastai.
Hello, yes, I am the co-founder of fastai, and I am also, sorry, taller than Jeremy, and I am the founding director of the Center for Applied Data Ethics at the University of San Francisco. Really excited to be a part of this course, and I will be the voice you hear asking questions from the forums. Rachel and Sylvain are also the people in this group who actually understand math. I am a mere philosophy graduate. Rachel has a PhD. Sylvain has written ten books about math, so if the math questions come along, it's possible I may pass them along. But it is very nice to have an opportunity to work with people who understand this topic so well. As Rachel mentioned, the other area where she has real world-class expertise is data ethics: she is the founding director of the Center for Applied Data Ethics at the University of San Francisco. Thank you. We are going to be talking about data ethics throughout the course, because we happen to think it's very important, and for those parts, although I'll generally be presenting them, they will be on the whole based on Rachel's work, because she actually knows what she's talking about — although thanks to her, I kind of know a bit about what I am talking about too. Right, so that's that. So, should you be here? Is there any point in you attempting to understand deep learning, or are you too stupid, or do you not have enough resources, or whatever? Because that's what a lot of people are telling us. They are saying you need teams of PhDs and massive data centers full of GPUs, otherwise it's pointless. Don't worry, that is not at all true; it couldn't be further from the truth. In fact, a lot of world-class research and world-class industry projects have come out of fastai alumni and fastai library-based projects, and elsewhere, created on a single GPU using a few dozen or a few hundred data points, by people who have no graduate-level technical expertise — or, in my case, no undergraduate-level technical expertise. I'm just a philosophy major. So there is, and we'll see it throughout the course, lots and lots of clear empirical evidence that you don't need lots of math, you don't need lots of data, and you don't need lots of expensive computers to do great stuff with deep learning. So just bear with us. You'll be fine. To do this course, you do need to code, preferably in Python. But if you've done other languages, you can learn Python. If the only language you've done is something like Matlab, where you've used it more as a scripting kind of thing, you will find it a bit heavier going. But that's okay, stick with it. You can learn Python as you go. Is there any point learning deep learning? Is it any good at stuff? If you are hoping to build a brain, that is, an AGI — Artificial General Intelligence — I cannot promise we're going to help you with that. What I can tell you, though, is that in all of these areas, deep learning is the best known approach to at least many versions of these things. So it is not speculative at this point whether this is a useful tool. It's an extremely useful tool in lots and lots of places.
And in many of these cases it is equivalent to or better than human performance, at least according to some particular narrow definition of the things that humans do in these kinds of areas. So deep learning is pretty amazing. If you want to, pause the video here, have a look through, try to pick out some things that you think look interesting, and type that keyword and "deep learning" into Google; you'll find lots of papers and examples and stuff like that. Deep learning comes from a background of neural networks. As you'll see, deep learning is just a type of neural network learning — a deep one. We'll describe exactly what that means later. And neural networks are certainly not a new thing. They go back at least to 1943, when McCulloch and Pitts created a mathematical model of an artificial neuron, and got very excited about where that could get to. Then in the 50s, Frank Rosenblatt built on top of that: he made some subtle changes to that mathematical model, and he thought that with these subtle changes "we could witness the birth of a machine that is capable of perceiving, recognizing and identifying its surroundings without any human training or control". And he oversaw the building of this extraordinary thing, the Mark 1 Perceptron, at Cornell. This picture was, I think, 1961. Thankfully, nowadays we don't have to build neural networks by running the damn wires from artificial neuron to artificial neuron, but you can kind of see the idea: lots of connections going on. And you'll hear the word connection a lot in this course, because that's what it's all about. Then we had the first AI winter, as it was known, which to a strong degree happened because Marvin Minsky, an MIT professor, and Seymour Papert wrote a book called Perceptrons about Rosenblatt's invention, in which they pointed out that a single layer of these artificial neuron devices actually couldn't learn some critical things. It was impossible for them to learn something as simple as the Boolean XOR operator. In the same book, they showed that using multiple layers of the devices actually would fix the problem. People didn't notice that part of the book; they only noticed the limitation, and basically decided that neural networks were going to go nowhere. And they largely disappeared for decades. Until, in some ways, 1986. A lot happened in the meantime, but there was a big thing in 1986, which is that MIT released a two-volume book called Parallel Distributed Processing. In it they described this thing they call parallel distributed processing, where you have a bunch of processing units that have some state of activation, an output function, a pattern of connectivity, a propagation rule, an activation rule and a learning rule, operating in an environment. And then they described how things that met these requirements could, in theory, do all kinds of amazing work. This was the result of many, many researchers working together; there was a whole group involved in this project, which resulted in this very, very important book. And the interesting thing to me is that as you go through this course, come back and have a look at this picture, and you'll see we are doing exactly these things. Everything we're learning about really is how do you do each of these eight things?
It is interesting that they include the environment, because that's something data scientists very often ignore: you build a model, you've trained it, it's learned something — what's the context it works in? We'll be talking about that quite a bit over the next couple of lessons as well. So in the 80s, during and after this was released, people started building in a second layer of neurons, avoiding Minsky's problem. And in fact it was shown — it was mathematically provable — that adding that one extra layer of neurons was enough to allow any mathematical function to be approximated to any level of accuracy with these neural networks. So that was the exact opposite of the Minsky thing. That was like: "Hey, there's nothing we can't do. Provably, there's nothing we can't do." And that was about when I started getting involved in neural networks — a little bit later, I guess, in the early 90s. And they were very widely used in industry. I was using them for very boring things like targeted marketing for retail banks. It tended to be big companies with lots of money that were using them. And it certainly was true that often the networks were too big or slow to be useful. They were certainly useful for some things, but they never felt to me like they were living up to the promise, for some reason. Now, what I didn't know, and nobody I personally met knew, was something researchers had shown — 30 years ago now — that to get practical, good performance, you need more layers of neurons. Even though mathematically, theoretically, you can get as accurate as you want with just one extra layer, to do it with good performance you need more layers. And when you add more layers to a neural network, you get deep learning. So deep doesn't mean anything mystical; it just means more layers — more layers than just adding the one extra one. Thanks to that, neural nets are now living up to their potential, as we saw in that "what's deep learning good at" list. So we could now say that Rosenblatt was right: we have a machine that's capable of perceiving, recognising and identifying its surroundings without any human training or control. That's definitely true; I don't think there's anything controversial about that statement based on the current technology. So we're going to be learning how to do that. And we're going to be learning how to do it in exactly the opposite way to probably all of the other math and technical education you've had. We are not going to start with a two-hour lesson about the sigmoid function, or a study of linear algebra, or a refresher course on calculus. And the reason for that is that people who study how to teach and learn have found that is not the right way to do it for most people. We work a lot based on the work of Professor David Perkins from Harvard, and others who work on similar things, who talk about this idea of playing the whole game. Playing the whole game is based on a sports analogy: if you're going to teach somebody baseball, you don't take them into a classroom and start teaching them about the physics of a parabola, and how to stitch a ball, and a three-part history of 100 years of baseball politics, and then ten years later let them watch a game, and then twenty years later let them play a game. Which is kind of how math education is done, right?
Instead, with baseball, step one is to say: hey, let's go and watch some baseball. What do you think? That was fun, right? See that guy there — he took a run there before the other guy threw the ball over there. Hey, you want to try having a hit? Okay, so you're going to hit the ball, and then I have to try to catch it, then he has to run, run, run over there. And so from step one, you are playing the whole game. And just to add to that, when people start, they often may not have a full team, or be playing the full nine innings, but they still have a sense of what the game is — a big-picture idea. So there are lots and lots of reasons that this helps most human beings, though not everybody. There's a small percentage of people who like to build things up from the foundations and the principles, and not surprisingly they are massively over-represented in a university setting, because the people who get to be academics are the people who thrive with what is, to me, the upside-down way of teaching things. But outside of universities, most people learn best in this top-down way, where you start with the full context. So step number two in the seven principles — and I'm only going to mention the first three — is to make the game worth playing. Which is like: if you're playing baseball, you have a competition. You score, you try to win, you bring together teams from around the community and you have people try to beat each other. And you have leaderboards, like who's got the highest number of runs or whatever. So this is all about making sure that the thing you're doing, you're doing it properly: you're making it the whole thing, you're providing the context and the interest. For the fastai approach to learning deep learning, what this means is that today we're going to train models end to end. We're going to actually train models, and they won't just be crappy models — they will be state-of-the-art, world-class models, from today. And we'll try to have you build your own state-of-the-art, world-class models from either today or next lesson, depending on how things go. Then number three in the seven principles from Harvard is: work on the hard parts. Which is this idea of deliberate practice. Work on the hard parts means that you don't just swing a bat at a ball every time and go out and muck around. You train properly: you find the bit that you are the least good at, you figure out where the problems are, you work damn hard at it. So in the deep learning context, that means that we do not dumb things down. By the end of the course, you will have done the calculus, you will have done the linear algebra, you will have done the software engineering of the code. You will be practicing these things, which are hard, so it requires tenacity and commitment. But hopefully you'll understand why it matters, because before you start practicing something, you'll know why you need that thing — because you'll be using it. To make your model better, you'll have to understand that concept first. So for those of you used to a traditional university environment, this is going to feel pretty weird, and a lot of people say that they regret, after a year of studying, that they spent too much time studying theory and not enough time training models and writing code. That's the number one piece of feedback we get from people who say, "I wish I'd done things differently." It's that.
So please try, as best as you can, since you're here, to follow along with this approach. We are going to be using a software stack — sorry, Rachel, yes? I just need to say one more thing about the approach. Since so many of us spent so many years with the traditional educational approach of bottom-up, this can feel very uncomfortable at first. I still feel uncomfortable with it sometimes, even though I'm committed to the idea. Some of it is also having to catch yourself and be okay with not knowing the details, which I think can feel very unfamiliar, or even wrong, when you're new to it — like: "Oh wait, I'm using something and I don't understand every underlying detail." But you kind of have to trust that we're going to get to those details later. So I can't empathise, because I did not spend lots of time doing that. But I will tell you this: teaching this way is very, very, very hard. And I very often find myself jumping back into a foundations-first approach, because it's just so easy to be like: "Oh, you need to know this. You need to know this. You need to do this. And then you can know this." That's so much easier to teach. So I do find this much, much more challenging to teach, but hopefully it's worth it. We spent a long, long time figuring out how to get deep learning into this format. But one of the things that helps us here is the software we have available. If you haven't used Python before — it's a ridiculously flexible, expressive and easy-to-use language. There are plenty of bits about it we don't love, but on the whole we love the overall thing. Most importantly, the vast, vast, vast majority of deep learning practitioners and researchers are using Python. On top of Python, there are two libraries that most folks are using today: PyTorch and TensorFlow. There's been a very rapid change here. TensorFlow was what we were teaching until a couple of years ago; it's what everyone was using until a couple of years ago. But TensorFlow got super bogged down, and this other software called PyTorch came along that was much easier to use and much more useful for researchers, and within the last 12 months the percentage of papers at major conferences that use PyTorch has gone from 20% to 80%, and vice versa — those that use TensorFlow have gone from 80% to 20%. So basically all the folks that are actually building the technology are now using PyTorch, and industry moves a bit more slowly, but in the next year or two you will probably see a similar thing in industry. Now, the thing about PyTorch is that it's super, super flexible. It really is designed for flexibility and developer friendliness, certainly not for beginner friendliness, and it doesn't have, what we'd call, higher-level APIs — by which I mean there aren't really things there to make it easy to build stuff quickly. So to deal with that issue, we have a library called fastai that sits on top of PyTorch. fastai is the most popular higher-level API for PyTorch. Because our courses are so popular, some people are under the mistaken impression that fastai is designed only for beginners or for teaching. It is designed for beginners and teaching, as well as for practitioners in industry and researchers.
The way we do this — making sure that it's the best API for all of those people — is that we use something called a layered API, and there's a peer-reviewed paper that Sylvain and I wrote that describes how we did that. For those of you that are software engineers, it will not be at all unusual or surprising; it's just totally standard software engineering practice, but practice that was not followed in any deep learning library we had seen — basically lots of refactoring and decoupling. By using that approach, it's allowed us to build something with which you can do super low-level research, you can build state-of-the-art production models, and you can also build super easy, beginner-friendly, but still world-class, models. So that's the basic software stack. There are other pieces of software we will be learning about along the way, but the main thing to mention here is that it actually doesn't matter. If you learn this software stack and then at work you need to use TensorFlow and Keras, you will be able to switch in less than a week. Lots and lots of students have done that; it's never been a problem. The important thing is to learn the concepts, and so we're going to focus on those concepts, and by using an API which minimizes the amount of boilerplate you have to use, you can focus on the bits that are important. The actual lines of code will correspond much more closely to the actual concepts you are implementing. You are going to need a GPU machine. A GPU is a Graphics Processing Unit, and specifically you need an Nvidia GPU; other brands of GPU just aren't well supported by any deep learning libraries. Please don't buy one. If you already have one, you probably shouldn't use it. Instead, you should use one of the platforms that we have already got set up for you. It's just a huge distraction to be spending your time doing system administration on a GPU machine and installing drivers and blah blah blah. And run it on Linux. Please. That's what everybody's doing — not just us, everybody's running it on Linux. Make life easy for yourself. It's hard enough to learn deep learning without having to do it in a way where you're also learning all kinds of arcane hardware support issues. There are a lot of free options available, so please, please use them. If you're using an option that is not free, don't forget to shut down your instance. What's going to be happening is you're going to be spinning up a server that lives somewhere else in the world, and you're going to be connecting to it from your computer and training and running and building models. Just because you close your browser window doesn't, on the whole, mean your server stops running. So don't forget to shut it down, because otherwise you're paying for it. Colab is a great system which is free; there's also a paid subscription version of it. Be careful with Colab: most of the other systems we recommend save your work for you automatically, and you can come back to it at any time. Colab doesn't. So be sure to check out the Colab platform thread on the forums to learn about that. So, I mention the forums. The forums are really, really important, because that is where all of the discussion and setup and everything happens. For example, if you want help with setup, there is a setup help thread where you can find out how best to set up Colab, and you can see discussions about it and ask questions — and please remember to search before you ask your question, right?
Because it's probably been asked before, unless you're one of the very, very earliest people doing the course. So step one is to get your server set up by just following the instructions from the forums or from the course website. The course website will have lots of step-by-step instructions for each platform; they will vary in price, in speed, in availability, and so forth. The last step of those instructions will end up showing you something like this: a course-v4 folder, so version four of our course. By the time you see this video, it's likely to have more stuff in it, but it will have an "nbs" folder — standing for notebooks. So you can click on that, and it will show you all of the notebooks for the course. What I want you to do is scroll to the bottom and find the one called app_jupyter. Click on that, and this is where you can start learning about Jupyter Notebook. What's Jupyter Notebook? Jupyter Notebook is something where you can start typing things and press Shift-Enter, and it will give you an answer. The thing you're typing is Python code, and the thing that comes out is the result of that code. So you can put in anything in Python: x equals three times four; x plus one. And as you can see, it displays a result any time there's a result to display. For those of you that have done a bit of coding before, you will recognise this as a REPL — R-E-P-L: read, evaluate, print, loop. Most languages have some kind of REPL. The Jupyter Notebook REPL is particularly interesting because it has things like headings, graphical outputs, interactive multimedia. It's a really astonishing piece of software; it's won some really big awards, and I would think it's the most widely used REPL outside of shells like bash. It's a very powerful system. We love it. We've written our whole book in it, we've written the entire fastai library with it, we do all our teaching with it. It is extremely unfamiliar to people who have done most of their work in an IDE. You should expect it to feel as awkward as perhaps the first time you moved from a GUI to a command line. It's different. So if you're not familiar with REPL-based systems, it's going to feel super weird. But stick with it, because it really is great. The model going on here is that this webpage I'm looking at is letting me type in things for a server to do, and showing me the results of computations the server is doing. The server is off somewhere else — it's not running on my computer; the only thing running on my computer is this webpage. But as I do things, for example if I say x equals x times three, this is updating the server's state. There's this state, like what's currently the value of x, and so I can find out: now x is something different. And you can see, when I ran this line here, it didn't change the earlier "x plus one" output. So that means that when you look at a Jupyter notebook, it's not showing you the current state of your server; it's just showing you what that state was at the time that you printed that thing out. It's just like if you use a shell like bash and you type "ls", and then you delete a file: that earlier "ls" output doesn't go back and change. That's how REPLs generally work, including this one.
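To make that state behaviour concrete, here is a minimal sketch of the cells typed in the video, written as separate notebook cells; the comments show roughly what each cell displays (nothing here is fastai-specific, it's just plain Python in a notebook):

```python
x = 3 * 4     # cell 1: an assignment produces no result, so nothing is displayed

x + 1         # cell 2: the last expression in a cell is displayed, here 13

x = x * 3     # cell 3: updates the server's state; cell 2's old output (13) does not change

x             # cell 4: displays the *current* state, 36
```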
Jupyter Notebook has two modes. One is edit mode, which is when I click on a cell and get a flashing cursor and can move left and right and type. There aren't very many keyboard shortcuts in edit mode; one useful one is Control (or Command) + "/", which will comment and uncomment a line, and the main one to know is Shift-Enter, which actually runs the cell. At that point there is no flashing cursor any more, and that means I'm now in command mode, not edit mode. In command mode, as I go up and down, I'm selecting different cells, and there are lots of keyboard shortcuts you can use. If you hit "H" you get a list of them, and you'll see that, on the whole, they're not Control or Command plus something — they're just a letter on its own. So if you use something like Vim, you'll be familiar with this idea. For example, I can hit "C" to copy a cell and "V" to paste it, or "X" to cut it, or "A" to add a new cell above. And then I can press the various number keys to create a heading; number two will create a level-two heading. And as you can see, I can actually type formatted text, not just code. The formatted text I type is in Markdown — like so, there you go. If you haven't used Markdown before, it's a super, super useful way to write formatted text that is used very, very widely. So learn it, because it's super handy, and you need it for Jupyter. When you look at our book notebooks, for example, you can see examples of all the kinds of formatting and code and so on. So you should go ahead and go through app_jupyter, and you can see there how you can create plots, for example, and create lists of things, and import libraries, and display pictures, and so forth. If you want to create a new notebook, you can just go New, Python 3, and that creates a new notebook, which by default is just called "Untitled", so you can then rename it to give it whatever name you like, and you'll then see it in the list with its new name. The other thing to know about Jupyter is that it's a nice, easy way to jump into a terminal, if you know how to use a terminal — you certainly don't have to for this course, at least for the first bit. If I go to a new terminal, you can see here I have a terminal. One thing to note is that the notebooks are attached to a GitHub repository. If you haven't used GitHub before, that's fine, but basically they're attached to a server where, from time to time, we will update the notebooks, and you'll see on the course website and in the forums how to make sure you have the most recent versions. When you grab our most recent version, you don't want it to conflict with or overwrite your changes. So as you start experimenting, it's not a bad idea to select a notebook, click Duplicate, and then do your work in the copy. That way, when you get an update of our latest course materials, it's not going to interfere with the experiments you've been running. So there are two important repositories to know about. One is the fastbook repository, which we saw earlier, which is the full book with all the outputs and prose and everything. The other one is the course-v4 repository, and here is the exact same notebook from the course-v4 repository. For this one we remove all of the prose and all of the pictures and all of the outputs and just leave behind the headings and the code. In this case you can see some outputs because I just ran that code — mostly there won't be any... no, actually, I guess we have left the outputs in.
I'm not sure whether we'll keep that or not, so you may or may not see the outputs. The idea with this is that this is probably the version that you want to be experimenting with, because it kind of forces you to think about what's going on as you do each step, rather than just reading it and running it without thinking. We want you to work in a fairly bare environment in which you're thinking: what did the book say, why is this happening — and if you forget anything, you go back to the book. The other thing to mention is that both the course-v4 version and the fastbook version have a questionnaire at the end. And quite a few folks have told us, amongst the reviewers and so on, that they actually read the questionnaire first. We spent many, many weeks writing the questionnaires, Sylvain and I, and the reason for that is that we tried to think about what we want you to take away from each notebook. So if you read the questionnaire first, you can find out what the things are that we think are important — what you should know before you move on. Rather than having a summary section at the end saying "by the end of this you should know blah, blah, blah", we instead have a questionnaire that does the same thing. So please make sure you do the questionnaire before you move on to the next chapter. You don't have to get everything right, and most of the time answering the questions is as simple as going back to that part of the notebook and reading the prose; but if you've missed something, do go back and read it, because these are the things we are assuming you know. If you don't know these things before you move on, it could get frustrating. Having said that, if you get stuck after trying a couple of times, do move on: do two or three more chapters and then come back. Maybe by the time you've done a couple more chapters you will have some more perspective — we try to re-explain things multiple times in different ways — so it's okay, if you tried and got stuck, to move on. Alright, so let's try running the first part of the notebook. Here we are in 01_intro, so this is chapter 1, and here is our first cell. By default there will actually be a header and a toolbar, as you can see; you can turn them on or off — I always leave them off myself. To run this cell, you can either click on the run button or, as I mentioned, hit Shift-Enter. For this one I'll just click, and as you can see this star appears, which says it's running, and now you can see this progress bar popping up. It's going to take a few seconds, and as it runs it'll print out some results. Don't expect to get exactly the same results as us — there is some randomness involved in training a model, and that's okay. Don't expect to get exactly the same time as us either. If this first cell takes more than five minutes, then unless you have a really old GPU, that is probably a bad sign; you might want to hop on the forums and figure out what's going wrong — or maybe you only have Windows, which really doesn't work very well for this at the moment. Don't worry that we don't know what all the code does yet; we are just making sure that we can train a model. So here we are, it's finished running, and as you can see it's printed out some information, and in this case it's showing me that there is an error rate of 0.005 at doing something. What is the something it's doing?
Well, what it's doing here is grabbing a dataset we call the Pets dataset, which is a dataset of pictures of cats and dogs, and it's trying to figure out which ones are cats and which ones are dogs. And as you can see, after well under a minute it's able to do that with a 0.5% error rate — so it can do it pretty much perfectly. So we've trained our first model. We have no idea how; we don't know what we were doing; but we have indeed trained our model. So that's a good start. And as you can see, we can train models pretty quickly on a single computer, many of which you can get access to for free. One more thing to mention: it doesn't matter whether you have Windows or Mac or Linux in terms of what's running in the browser, but if you have a Mac, please don't try to use its GPU — Apple doesn't even support Nvidia GPUs any more, so that's really not going to be a great option. Stick with Linux; it will make life much easier for you. Right, actually the first thing we should do is try it out. I claim we've trained a model that can tell cats from dogs — let's make sure it can. Check out this cell. This is interesting: we've created a widgets.FileUpload object and displayed it, and this is actually showing us a clickable button. As I mentioned, this is an unusual REPL — we can even create GUIs in it. So I click on this file upload and pick a cat. There we go. I can now turn that uploaded data into an image — there's a cat — and now I can call predict, and it says it's a cat, with 99.96% probability. So we have just uploaded an image that we picked out ourselves. You should try this: grab a picture of a cat, find one from the Internet or go and take a picture of one yourself, and make sure you use a photo of a cat. This is something which can recognise photos of cats, not line drawings of cats. As we'll see in this course, these kinds of models can only learn from the kinds of information you give them, and so far we've only given this one, as you'll discover, photos of cats — not anime cats, not drawn cats, not abstract representations of cats, just photos.
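Roughly what that upload-and-predict sequence looks like in the chapter 1 notebook — a sketch, assuming `learn` is the Learner trained in the first cell; note that on newer ipywidgets versions the uploaded bytes live in `uploader.value[0].content` rather than `uploader.data[0]`:

```python
from fastai.vision.all import *
import ipywidgets as widgets

uploader = widgets.FileUpload()
uploader                                  # displays a clickable upload button in the notebook

# After picking a photo with the button, turn the uploaded bytes into an image:
img = PILImage.create(uploader.data[0])   # use uploader.value[0].content on newer ipywidgets
img.to_thumb(192)                         # show the image in the notebook

# learn is the model we trained earlier in the notebook
is_cat, _, probs = learn.predict(img)
print(f"Is this a cat?: {is_cat}")
print(f"Probability it's a cat: {probs[1].item():.4f}")
```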
So we're now going to look at what's actually happened here. And you'll see that at the moment I'm not getting the outputs displayed properly here. If you see this in your notebooks, you'll have to go File, Trust Notebook; that just tells Jupyter that it's allowed to run the code necessary to display things, to make sure there aren't any security problems. And then you'll see the outputs. Sometimes you'll also see some weird code like this — code that actually creates outputs. Sometimes we hide that code, sometimes we show it; generally speaking, you can just ignore stuff like that and focus on what comes out. I'm not going to go through these cells one by one; instead I'm going to look at the same thing over here on the slides. So what we're doing here is machine learning. Deep learning is a kind of machine learning. What is machine learning? Machine learning is, just like regular programming, a way to get computers to do something. But in this case it's pretty hard to see how you would use regular programming to recognise dog photos from cat photos. How do you create the loops and the variable assignments and the conditionals that make a program that recognises dogs versus cats in photos? It's super hard — so hard that, until the deep learning era, nobody really had a model that was remotely accurate at this apparently easy task, because we can't write down the steps necessary. Normally we write down a program: it takes some inputs, goes through the steps we wrote, and produces some results. That general idea, where the program is the steps that we write, doesn't seem to work great for things like recognising pictures. So back in 1949, somebody named Arthur Samuel started trying to figure out a way to solve problems like this, and in 1962 he described a way of doing it. First he described the problem: "Programming a computer for these kinds of computations is at best a difficult task, because of the need to spell out every minute step of the process in exasperating detail. Computers are giant morons" — which all of us coders totally recognise. So he said: okay, let's not tell the computer the exact steps; let's give it examples of a problem and have it figure out how to solve it itself. By 1961 he had built a checkers program that had beaten the Connecticut state champion — not by telling it the steps to take to play checkers, but by doing this: "arrange for an automatic means of testing the effectiveness of a weight assignment in terms of actual performance and a mechanism for altering the weight assignment so as to maximise performance". This sentence is the key thing, and it's a pretty tricky sentence, so spend some time on it. The basic idea is this: instead of having inputs going into a program and outputs coming out, let's have inputs going into something we'll now call a model — it's the same basic idea — and then we're going to have a second thing called weights. The model is something that creates outputs based not only on, for example, the state of a checkers board, but also on some set of weights, or parameters, that describe how the model is going to work. So the idea is: if we could enumerate all the possible ways of playing checkers, and describe each of those ways using some set of parameters — what Samuel called weights — then we need a way of checking how effective a current weight assignment is in terms of actual performance (in other words, does that particular strategy for playing checkers end up winning or losing games?), and then a way to alter the weight assignment so as to maximise the performance. For example: try increasing or decreasing each one of those weights, one at a time, to find out if there is a slightly better way of playing checkers, and do that lots and lots of times. Then, eventually, such a procedure could be made entirely automatic, and the machine so programmed would learn from its experience. So this little paragraph is the thing. This is machine learning: a way of creating programs such that they learn, rather than being programmed. If we had such a thing, we would have something that looks like this: you have inputs and weights going into a model, creating results — i.e. you won or you lost — and then a measurement of performance. Remember, that was the first key step; the second key step is a way to update the weights based on the measured performance. And then you could loop through this process and train a machine learning model. So this is the abstract idea.
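This is not Samuel's checkers program — just a toy sketch of that abstract idea in plain Python, with a made-up "performance" measure of a weight assignment and a mechanism that nudges the weights one at a time and keeps any change that improves performance:

```python
import random

def performance(weights):
    # A stand-in for "testing the effectiveness of a weight assignment in terms of
    # actual performance" — here, just how close the weights are to a made-up ideal
    # assignment (for checkers it would be: does this assignment win games?).
    ideal = [0.2, -0.5, 0.9]
    return -sum((w - t) ** 2 for w, t in zip(weights, ideal))

weights = [random.uniform(-1, 1) for _ in range(3)]   # start with a random weight assignment

# "A mechanism for altering the weight assignment so as to maximise performance":
# try increasing or decreasing one weight at a time, keep the change only if it helps.
for _ in range(10_000):
    candidate = list(weights)
    candidate[random.randrange(3)] += random.choice([-0.01, 0.01])
    if performance(candidate) > performance(weights):
        weights = candidate

print(weights)   # after running for a while, a pretty good weight assignment
```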
So after it ran for a while, it's come up with a set of weights which are pretty good, and at that point we can forget the way it was trained: we have something that is just like the original picture, except that the word "program" is now replaced with the word "model". A trained model can be used just like any other computer program. So the idea is that we are building a computer program not by writing out the steps necessary to do the task, but by training it to learn to do the task — at the end of which it's just another program. This is what's called inference: using a trained model as a program to do a task, such as playing checkers. So machine learning is the training of programs developed by allowing a computer to learn from its experience, rather than through manually coding the steps. Okay, how would you do this for image recognition? What is that model and that set of weights such that, as we vary them, it could get better and better at recognising cats versus dogs? For checkers it's not too hard to imagine how you could enumerate things like: depending on how far away the opponent's piece is from your piece, what should you do in that situation? How should you weigh defensive versus aggressive strategies? Blah blah blah. It's not at all obvious how you do that for image recognition. So what we really want is some function in there which is so flexible that there is a set of weights that could cause it to do anything — the world's most flexible possible function. And it turns out that there is such a thing: it's a neural network. We'll be describing exactly what that mathematical function is in the coming lessons; to use it, it doesn't really matter what the mathematical function is. It's a function which is, as we say, parameterised by some set of weights — by which I mean, as I give it a different set of weights, it does a different task — and it can do any possible task: something called the universal approximation theorem tells us that, mathematically provably, this functional form can solve any solvable problem to any level of accuracy, if you just find the right set of weights. Which is kind of restating what we described earlier about how we dealt with the Marvin Minsky problem. So neural networks are so flexible that, if you could find the right set of weights, they can solve any problem, including "is this a cat or is this a dog?". That means you need to focus your effort on the process of training — that is, finding good weights, good weight assignments, to use Samuel's terminology. So how do you do that? We want a completely general way to update the weights based on some measure of performance, such as how good the model is at recognising cats versus dogs. And luckily it turns out such a thing exists, and it's called stochastic gradient descent, or SGD. Again, we'll look at exactly how it works — we'll build it ourselves from scratch — but for now we don't have to worry about it. I will tell you this, though: neither SGD nor neural nets are at all mathematically complex. They are nearly entirely addition and multiplication. The trick is that there's just a lot of them — like, billions of them — so many more than we can intuitively grasp. They can do extraordinarily powerful things, but they're not rocket science at all. They are not complex things, and we will see exactly how they work. So that's the Arthur Samuel version. Nowadays we don't use quite the same terminology, but we use exactly the same idea.
So that function that sits in the middle, we call an architecture. An architecture is the function whose weights we're adjusting to get it to do something — it's the functional form of the model. Sometimes people say "model" to mean architecture, so don't let that confuse you too much, but really the right word is architecture. And we don't call them weights; we call them parameters. "Weights" has a more specific meaning — weights are a particular kind of parameter. The things that come out of the model — the architecture with its parameters — we call predictions. The data comes in two kinds: independent variables, that's the data itself, like the pictures of the cats and dogs, and dependent variables, also known as labels, which is the thing saying "this is a cat", "this is a dog", "this is a cat". So those are your inputs, and the results are predictions. The measure of performance, to use Arthur Samuel's words, is known as the loss: the loss is calculated from the labels and the predictions, and then there's the update back to the parameters. So this is the same picture as we saw before, just with the words that we use today. If you forget what I mean when I say "these are the parameters used for this architecture to create a model", you can go back and remind yourself: what are the parameters, what are the predictions, what is the loss? The loss is some function that measures the performance of the model in such a way that we can update the parameters.
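To pin down that vocabulary, here is a minimal sketch in plain PyTorch — not the fastai code from the lesson — with each piece labelled using the terms above; the data and the architecture are deliberately trivial:

```python
import torch

# Independent variable (the data) and dependent variable (the labels):
x = torch.randn(200, 1)
y = 3 * x + 1 + 0.1 * torch.randn(200, 1)

# Architecture: the functional form whose parameters (weights) we will adjust.
model = torch.nn.Linear(1, 1)

loss_func = torch.nn.MSELoss()                      # loss: the measure of performance
opt = torch.optim.SGD(model.parameters(), lr=0.1)   # the mechanism that updates the parameters

for epoch in range(100):
    preds = model(x)             # predictions: what comes out of the architecture + parameters
    loss = loss_func(preds, y)   # compare the predictions with the labels
    loss.backward()              # work out how each parameter should change
    opt.step()                   # update the parameters
    opt.zero_grad()

print([p.detach().flatten().tolist() for p in model.parameters()])  # should end up near 3 and 1
```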
So it's important to note that deep learning and machine learning are not magic. A model can only be created where you have data showing examples of the thing that you're trying to learn about, and it can only learn to operate on the kinds of patterns that it has seen in the inputs used to train it. So if we don't have any line drawings of cats and dogs, then there's never going to be an update to the parameters — and the architecture and the parameters together are the model — that makes the model better at predicting line drawings of cats and dogs, because it never received those weight updates, because it never received those inputs. Notice also that this learning approach only ever creates predictions. It doesn't tell you what to do about them. That's going to be very important when we think about things like recommendation systems: "what product do we recommend to somebody?" Well, we don't do that — we can predict what somebody will say about a product we've shown them, but we're not creating actions, we're creating predictions. That's a super important difference to recognise. It's also not enough just to have examples of input data, like pictures of dogs and cats: we can't do anything without labels. Very often, organisations say "we don't have enough data" when most of the time they mean "we don't have enough labelled data". Because if a company is trying to do something with deep learning, it's often because they're trying to automate or improve something they're already doing, which means that by definition they have data about that thing, or a way to capture data about that thing — because they're doing it. But often the tricky part is labelling it. For example, in medicine, if you're trying to build a model for radiology, you can almost certainly get lots of medical images about just about anything you can think of, but it might be very hard to label them according to the malignancy of a tumour, or whether or not a meningioma is present, or whatever, because these kinds of labels are not necessarily captured in a structured way, at least in the US medical system. So that's an important distinction that really impacts your strategy. A model, as we saw from the PDP book, operates in an environment: you roll it out and you do something with it. And that piece of the PDP framework is super important. Say you have a model that's actually doing something — for example, you've built a predictive policing model that predicts (it doesn't recommend actions, it predicts) where an arrest might be made. This is something a lot of jurisdictions in the US are using. Now, it's predicting that based on data — based on labelled data. And in the US, for example, that data reflects the fact that black people, I think, get arrested something like seven times more often for, say, marijuana possession than white people, even though the actual underlying amount of marijuana use is about the same in the two populations. So if you start with biased data and you build a predictive policing model, its prediction will say "oh, you will find somebody you can arrest here", based on that biased data. Law enforcement officers might then decide to focus their policing activity on the areas where those predictions are happening, as a result of which they'll find more people to arrest. And then they'll put that back into the model, which will now find "oh, there are even more people we should be arresting in the black neighbourhoods", and thus it continues. This is an example of how a model interacting with its environment creates something called a positive feedback loop, where the more a model is used, the more biased the data becomes, making the model even more biased, and so forth. So one of the things to be super careful about with machine learning is recognising how the model is actually being used and what kinds of things might happen as a result. I was just going to add that this is an example of proxies, because here arrest is being used as a proxy for crime, and I think that pretty much in all cases the data that you actually have is a proxy for some value that you truly care about, and that difference between the proxy and the actual value often ends up being significant. Thanks, Rachel. That's a really important point. Okay, so let's finish off by looking at what's going on with this code. The code we ran is basically — one, two, three, four, five, six — lines of code. The first line of code is an import line. In Python you can't use an external library until you import from it. Normally, people import just the functions and classes that they need from the library, but Python does provide a convenient facility where you can import everything from a module, by putting a star there. Most of the time this is a bad idea, because by default, when you import star, Python doesn't only import the things that are interesting and important in the library you're trying to get something from; it also imports things from all the libraries it used, and all the libraries they used, and you end up exploding your namespace in horrible ways and causing all kinds of bugs.
Because fastai is designed to be used in this REPL environment, where you want to be able to do a lot of quick, rapid prototyping, we actually spent a lot of time figuring out how to avoid that problem, so that you can import star safely. Whether you do this or not is entirely up to you, but rest assured that if you import star from a fastai library, it has been explicitly designed so that you only get the bits that you actually need. One thing to mention is that in the video you'll see it called "fastai2"; that's because we're recording this using a prerelease version. By the time you are watching the online MOOC version of this, the 2 will be gone. Something else to mention: there are, as I speak, four main predefined applications in fastai — vision, text, tabular and collaborative filtering — and we'll be learning about all of them and a lot more. For each one — say, here's vision — you can import from the .all module, a kind of meta-module, I guess you could call it, and that will give you all the stuff that you need for most common vision applications. So if you're using a REPL system like Jupyter Notebook, it gives you all the stuff right there that you need, without having to go back and figure it out. One of the issues with this is that a lot of Python users are used to figuring out where something like untar_data comes from by looking at the import line, and if you import star, you can't do that any more. The good news is that in a REPL you don't have to: you can literally just type the symbol, press Shift-Enter, and it will tell you exactly where it came from, as you can see. So that's super handy. In this case, for example, to do the actual building of the dataset we called ImageDataLoaders.from_name_func, and I can call the special doc function to get the documentation for it. As you can see, it tells me exactly what to pass in, what all the defaults are, and, most importantly, not only what it does but also — "Show in docs" pops me over to the full documentation, including an example. Everything in the fastai documentation has an example, and the cool thing is that the entire documentation is written in Jupyter notebooks. That means you can actually open the Jupyter notebook for a documentation page, run the lines of code yourself, see them working, look at the outputs, and so forth. Also in the documentation you'll find a bunch of tutorials. For example, the vision tutorial covers lots of things, and one of the things it covers, as you can see, is pretty much the same kind of stuff we are looking at in Lesson 1. So there is a lot of documentation in fastai, and taking advantage of it is a pretty good idea. It is fully searchable, and, as I mentioned, perhaps most importantly, every one of these documentation pages is also a fully interactive Jupyter notebook.
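A sketch of what that looks like in a notebook; the outputs shown in the comments are approximate, and assume a fastai v2 environment where the star import brings in the doc helper described above:

```python
from fastai.vision.all import *   # fastai's .all modules are designed to be safe to star-import

# Typing a symbol on its own and hitting Shift-Enter shows where it actually came from:
untar_data
# <function fastai.data.external.untar_data(url, ...)>   (roughly what the notebook displays)

# doc() shows the signature, a one-line summary and a "Show in docs" link to the full
# documentation page, which is itself an executable Jupyter notebook:
doc(ImageDataLoaders.from_name_func)
```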
So, looking through more of this code: the first line after the import uses untar_data. That will download a dataset, decompress it, and put it on your computer. If it's already downloaded, it won't download it again; if it's already decompressed, it won't decompress it again. And as you can see, fastai has predefined access to a number of really useful datasets, such as this PETS dataset. Datasets are, as you can imagine, a super important part of deep learning, and we will be seeing lots of them. They are created by lots of heroes (and heroines) who basically spend months or years collating data that we can use to build these models. The next step is to tell fastai what this data is, and we will be learning a lot about that. In this case we're basically saying: okay, it contains images, and they're in this path — untar_data returns the path, that is, whereabouts the data has been decompressed to (or, if it was already decompressed, where it was previously decompressed to). We have to tell it things like which images are actually in that path. One of the really interesting arguments is label_func: how do you tell, for each file, whether it is a cat or a dog? If you look at the README for the original dataset, it uses a slightly quirky convention: anything where the first letter of the filename is uppercase is a cat. That's what they decided. So we just created a little function here called is_cat that returns whether the first letter is uppercase or not, and we tell fastai that that's how you tell if it's a cat. We'll come back to the other two arguments in a moment. So now we've told it what the data is; we then have to create something called a learner. A learner is a thing that learns — it does the training. You have to tell it what data to use, and then what architecture to use. I'll be talking a lot about this in the course, but basically there are a lot of predefined neural network architectures that have certain pros and cons, and for computer vision there's an architecture called ResNet that's just a super great starting point, so we're using a reasonably small one of those. These are all predefined and set up for you. Then you can tell fastai what things you want to print out as it's training; in this case we're saying "tell us the error, please, as you train". And then we call this really important method called fine_tune — which we'll be learning about in the next lesson — and that actually does the training. valid_pct does something very important: it grabs, in this case, 20% of the data (a proportion of 0.2) and does not use it for training the model; instead, it uses it for telling you the error rate of the model. So in fastai, this metric, error_rate, will always be calculated on a part of the data which the model has not been trained on. The idea here — and we'll talk a lot more about this in future lessons — is that we want to make sure we're not overfitting. Let me explain. Overfitting looks like this. Say you're trying to create a function that fits all these dots. A nice function would look like that. But you could also fit the dots much more precisely with this other function — look, it's going much closer to all the dots than the first one, so this is obviously a better function — except that as soon as you get outside where the dots are, especially if you go off the edges, it obviously doesn't make any sense. This is what you'd call an overfit function. Overfitting happens for all kinds of reasons: we use a model that's too big, or we don't use enough data. We'll be talking all about it. But really, the craft of deep learning is all about creating a model that has a proper fit, and the only way you know if a model has a proper fit is by seeing whether it works well on data that was not used to train it. And so we always set aside some of the data to create something called a validation set. The validation set is data that we don't touch at all when we're training the model; we only use it to figure out whether the model is actually working or not.
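Putting those pieces together, here is roughly the training cell we ran at the start — a sketch based on the chapter 1 notebook; details like the seed or the resize size may differ in your copy, and on recent fastai versions cnn_learner is called vision_learner:

```python
from fastai.vision.all import *

# Download and decompress the Pets dataset (or reuse it if it's already there)
path = untar_data(URLs.PETS)/'images'

def is_cat(x):
    # In this dataset, cat filenames start with an uppercase letter
    return x[0].isupper()

# Tell fastai what the data is: which files, how to label them, how much to hold out
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# A learner bundles the data, an architecture (a smallish ResNet) and the metric to print
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)   # the training step we'll learn about next lesson
```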
The validation set is data that we don't touch at all when we're training the model; we only use it to figure out whether the model is actually working or not. One thing that Sylvain mentioned in the book is that one of the interesting things about studying fastai is that you learn a lot of interesting programming practices. I've been programming since I was a kid, so for like 40 years, and Sylvain and I both work really, really hard to make Python do a lot of work for us and to use programming practices which make us very productive and allow us to come back to our code years later and still understand it. And so you'll see in our code we'll often do things that you might not have seen before. A lot of students who have gone through previous courses say they learned a lot about coding, Python and software engineering from the course. So, yeah, when you see something new, check it out and feel free to ask on the forums if you're curious about why something was done that way. One thing to mention is that, just like import star is something most Python programmers don't do because most libraries don't support doing it properly, we do a lot of things like that. We do a lot of things where we don't follow a traditional approach to Python programming. Because I've used so many languages over the years, I code in a way that isn't specifically Pythonic but incorporates ideas from lots of other languages and notations, and we've heavily customised our approach to Python programming based on what works well for data science. That means that the code you see in fastai is probably not going to fit with the kind of style guides and normal approaches at your workplace, if you use Python there. So, obviously, you should make sure that you fit in with your organization's programming practices rather than following ours. But perhaps in your own hobby work you can follow ours and see if you find them interesting and helpful, or even experiment with them in your company if you're a manager and you are interested in doing so. Okay, so to finish, I'm going to show you something pretty interesting. Have a look at this code: untar_data, ImageDataLoaders.from_name_func, a learner, fine_tune. And then: untar_data, SegmentationDataLoaders.from_label_func, a learner, fine_tune. Almost the same code, and this has built a model that does something totally different! It has taken images where, on the left, the labelled data has images with colour codes to tell you whether each pixel is a car, or a tree, or a building, or the sky, or a line marking, or the road. And on the right is our model, and our model has successfully figured out, for each pixel: is that a car, a line marking, a road? Now, it's done this in under 20 seconds, right, so it's a very small, quick model. It's made some mistakes -- like it's missing this line marking, and some of these cars it thinks are a house, right? But if you train this for a few minutes, it's nearly perfect. So you can see the basic idea: we can very rapidly, with almost exactly the same code, create something that doesn't classify cats and dogs but does what's called segmentation, which figures out what every pixel in an image is. Look, here's the same thing: import star, TextDataLoaders.from_folder, a learner, fine_tune. Same basic code.
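For reference, here are sketches of the two snippets just read out. They follow chapter 1 of the book, which uses a tiny subset of the CamVid dataset for segmentation and the IMDb reviews for text; the specific arguments (the label_func mapping each image to its colour-coded mask file, drop_mult=0.5, and so on) are taken from the book rather than from anything quoted in this transcript:

    from fastai.vision.all import *

    # segmentation: classify every pixel of each image
    path = untar_data(URLs.CAMVID_TINY)
    dls = SegmentationDataLoaders.from_label_func(
        path, bs=8, fnames=get_image_files(path/"images"),
        # each image's mask lives in 'labels', with a _P suffix before the extension
        label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
        codes=np.loadtxt(path/'codes.txt', dtype=str))
    learn = unet_learner(dls, resnet34)
    learn.fine_tune(8)

    # text: classify a movie review as positive or negative
    from fastai.text.all import *
    dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
    learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
    learn.fine_tune(4, 1e-2)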
This is now something where we can give it a sentence and it can figure out whether that sentence is expressing a positive or negative sentiment, and it is actually giving 93% accuracy on that task in about 15 minutes on the IMDb dataset, which contains thousands of full-length movie reviews (in fact, 1,000- to 3,000-word reviews). This number here, which we got with the same three lines of code, would have been the best in the world for this task on a very, very popular academic dataset in, I think, around 2015. So we are creating world-class models, in our browser, using the same basic code. Here are the same basic steps again: import star, untar_data, TabularDataLoaders.from_csv, a learner, fit. This is now building a model that predicts salary based on a CSV table containing these columns. So this is tabular data. And here are the same basic steps once more: import star, untar_data, CollabDataLoaders.from_csv, a learner, fine_tune. This is now building something which predicts, for each combination of a user and a movie, what rating we think that user will give that movie, based on what other movies they've watched and liked in the past. This is called collaborative filtering and is used in recommendation systems (sketches of these last two snippets appear below). So here you've seen some examples of each of the four applications in fastai. And as you'll see throughout this course, the same basic code, and also the same basic mathematical and software engineering concepts, allow us to do vastly different things using the same basic approach. And the reason why goes back to Arthur Samuel: his basic description of what you can do if only you have a way to parameterize a model and an update procedure which can update the weights to make the model better at its loss function. In this case we use neural networks, which are totally flexible functions. So that's it for this first lesson. It's a little bit shorter than our other lessons are going to be, and the reason for that is that we are, as I mentioned, at the start of a global pandemic here, or at least in the West (in other countries they are much further into it). So we spent some time talking about that at the start of the course, and you can find that video elsewhere. In future lessons there will be more on deep learning. So, what I suggest you do over the next week, before you work on the next lesson, is to make sure that you can spin up a GPU server, that you can shut it down when you're finished, and that you can run all of the code here. As you go through, see if this is using Python in a way you recognise, use the documentation, use that doc function and see what it does, do some searches in the fastai docs, and see if you can grab the fastai documentation notebooks themselves and run them. Just try to get comfortable and make sure you know your way around. Because the most important thing to do with this style of learning, this top-down learning, is to be able to run experiments, and that means you need to be able to run code. So my recommendation is: don't move on until you can run the code, read the chapter of the book, and then go through the questionnaire. We've still got some more work to do about validation sets and test sets and transfer learning, so you won't be able to do all of it yet, but try to do all the parts you can, based on what we've seen of the course so far. Rachel, anything you want to add before we go?
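And, as mentioned above, here are sketches of the tabular and collaborative-filtering snippets. These again follow the corresponding chapter 1 examples in the book; dataset names such as URLs.ADULT_SAMPLE and URLs.ML_SAMPLE, the column lists and the y_range come from there, not from anything shown in this transcript:

    # tabular: predict salary from a CSV of census data
    from fastai.tabular.all import *
    path = untar_data(URLs.ADULT_SAMPLE)
    dls = TabularDataLoaders.from_csv(
        path/'adult.csv', path=path, y_names="salary",
        cat_names=['workclass', 'education', 'marital-status',
                   'occupation', 'relationship', 'race'],
        cont_names=['age', 'fnlwgt', 'education-num'],
        procs=[Categorify, FillMissing, Normalize])
    learn = tabular_learner(dls, metrics=accuracy)
    learn.fit_one_cycle(3)   # trained from scratch, so fit rather than fine_tune

    # collaborative filtering: predict what rating a user would give a movie
    from fastai.collab import *
    path = untar_data(URLs.ML_SAMPLE)
    dls = CollabDataLoaders.from_csv(path/'ratings.csv')
    learn = collab_learner(dls, y_range=(0.5, 5.5))
    learn.fine_tune(10)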
Okay, so thanks very much for joining us for lesson one, everybody. We're really looking forward to seeing you next time, when we will learn about transfer learning and then move on to creating an actual production version of an application that we can put out on the Internet, so you can start building apps that you can show your friends and they can start playing with. Bye, everybody!
Info
Channel: Jeremy Howard
Views: 228,851
Rating: 4.9617438 out of 5
Keywords: deep learning, fastai
Id: _QUEXsHfsA0
Length: 82min 31sec (4951 seconds)
Published: Fri Aug 21 2020