Welcome to Practical Deep Learning for Coders,
lesson one. This is version five of this course, and it's the first new one we've done
in two years. So, we've got a lot of cool things to cover! It's amazing how much has
changed. Here is an xkcd from the end of 2015. Who here has seen xkcd comics before?
…Pretty much everybody. Not surprising. So the basic joke here is… I'll let you
read it, and then I'll come back to it. So, it can be hard to tell what's easy and what's
nearly impossible, and in 2015 or at the end of 2015 the idea of checking whether something
is a photo of a bird was considered nearly impossible. So impossible, it was the basic
idea of a joke. Because everybody knows that that's nearly impossible. We're now going to build
exactly that system for free in about two minutes! So, let's build an "is it a bird" system. We're going to use Python, and I'm
going to run through this really quickly. You're not expected to run through it with me
because we're going to come back to it. But let's go ahead and run that cell. Okay, so what we're doing is searching DuckDuckGo for images of bird photos, and we're just going to grab one, and here is the URL of the bird that we grabbed. Okay, we'll download it… and there it is. So we've grabbed a bird; we've now got something that can download pictures of birds.
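For reference, that cell looks roughly like this. (`search_images` is a small helper defined in the lesson notebook on top of the duckduckgo_search package, so treat it as an assumption here; download_url comes from fastdownload.)

```python
from fastdownload import download_url
from fastai.vision.all import *  # also patches PIL's Image with .to_thumb()

# search_images is the notebook's helper around duckduckgo_search (an assumption here)
urls = search_images('bird photos', max_images=1)
download_url(urls[0], 'bird.jpg', show_progress=False)
Image.open('bird.jpg').to_thumb(256, 256)  # display a small version of the download
```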
Now we're going to need to build a system that can recognize things that are birds versus things that aren't birds, from photos. Now, of course, computers need numbers to work
with, but luckily images are made of numbers. I actually found this really nice website called PixSpy where I can grab a bird, and if I hover over it
(let's pick its beak) you'll see here that this part of the beak has a red brightness of 251, 48 of green, and 21 of blue. So that's RGB. And you can see, as I wave around, those colors (those numbers) are changing. So this picture, the thing that we recognize as a picture, is actually 256 × 171 × 3 numbers between 0 and 255, representing the amount of red, green, and blue in each pixel. That's going to be the input to our program, which is going to try to figure out whether this is a picture of a bird or not.
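You can see that for yourself with a minimal sketch like this (the file name assumes the bird we downloaded above; the exact shape will depend on the image):

```python
from PIL import Image
import numpy as np

im = np.array(Image.open('bird.jpg'))
print(im.shape)      # e.g. (171, 256, 3): height x width x 3 colour channels
print(im[100, 50])   # one pixel's red, green, blue values, each between 0 and 255
```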
Okay, so let's go ahead and run this cell, which is going to go through… (I needed bird and non-bird, but you can't really search Google Images or DuckDuckGo images for "not a bird"; it just doesn't work that way. So I decided to use forest: I thought pictures of forest versus pictures of birds sounds like a good starting point.) So I go through each of forest and bird, search for "forest photo" and "bird photo", download the images, and then resize them to be no bigger than 400 pixels on a side, just because we don't need particularly big ones, and it takes a surprisingly large amount of time for a computer just to open a big image. Okay, so we've now got 200 of each.
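Roughly what that cell does, using the same hypothetical `search_images` helper:

```python
from fastai.vision.all import *

searches = 'forest', 'bird'
path = Path('bird_or_not')

for o in searches:
    dest = path/o
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))  # grab ~200 images each
    resize_images(dest, max_size=400, dest=dest)             # no bigger than 400px a side
```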
I find when I download images I often get a few broken ones, and if you try to train a model with broken images, it will not work. So here's something which just verifies each image and unlinks (that is, deletes) the ones that don't work.
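In code, that's something like this (verify_images and get_image_files are fastai functions):

```python
failed = verify_images(get_image_files(path))  # find files that won't open as images
failed.map(Path.unlink)                        # delete each broken file
len(failed)
```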
Okay, so now we can create what's called a data block. I'll go through the details of this later, but a data block gives fast.ai (the library) all the information it needs to create a computer vision model. And so in this case we're basically telling it: get all the image files that we just downloaded. And then we say: show me a few, up to six. And let's see… yeah, we've got some birds: forest, bird, bird, forest. One of the nice things about doing computer vision models is that it's really easy to check your data, because you can just look at it, which is not the case for a lot of kinds of models.
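That data block cell looks roughly like this:

```python
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, labels are categories
    get_items=get_image_files,                        # how to find the items to train from
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # hold out 20% as a validation set
    get_y=parent_label,                               # the label is the parent folder's name
    item_tfms=[Resize(192, method='squish')],         # squish every image to 192x192
).dataloaders(path)

dls.show_batch(max_n=6)  # eyeball a few labelled examples
```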
Okay, so we've now downloaded 200 pictures of birds and 200 pictures of forests, so we'll now press run. And this model
is actually running on my laptop, so this is not using a vast data center.
It's running on my presentation laptop. And it's doing it at the same time as my laptop is
streaming video, which is possibly a bad idea. And so what it's going to do is run through every photo of those 400: for the ones that are forest, it's going to learn a bit more about what forest looks like, and for the ones that are bird, it'll
learn a bit more about what bird looks like. So overall it took under 30 seconds, and believe it or not, that's enough to finish doing the thing which was in that xkcd comic. Let's check by passing in that bird that we downloaded at the start. This is a bird. Probability it's a bird: 1.0000 (rounded to four decimal places).
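For reference, the training and prediction cells look roughly like this:

```python
# Train: start from an ImageNet-pretrained resnet18 and fine-tune it on our 400 images
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

# Predict: is the photo we grabbed at the start a bird?
is_bird, _, probs = learn.predict(PILImage.create('bird.jpg'))
print(f"This is a: {is_bird}.")
print(f"Probability it's a bird: {probs[0]:.4f}")
```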
So something pretty extraordinary has happened since late 2015: something has gone from so impossible it's a joke to so easy that I can run it on my laptop
computer in (I don't know how long it was) about two minutes. And so hopefully that gives you
a sense that creating really interesting, real working programs with deep learning doesn't take a lot of code, doesn't take any math, and doesn't take more than my laptop computer. It's pretty accessible, in fact. So that's really what we're going to be
learning about over the next seven weeks. So where have we got to now
with deep learning? Well it moves so fast, but even in the last few weeks
we've taken it up another notch as a community. You might have seen that something called DALL·E 2 has been released, which uses deep learning to generate new pictures. And I thought this was an amazing thing that this guy Nick did, where he took his friends' Twitter bios and typed them into the DALL·E 2 input, and it generated these pictures. So for this guy, he typed in "commitment, sympathetic, psychedelic, philosophical", and it generated these pictures. So I'll just show you
a few of these. I'll let you read them… I love that. That one's pretty amazing, I reckon. And I love this: Happy Sisyphus has actually got a happy rock to move around. Yeah, I don't know. When
I look at these I still get pretty blown away that this is a computer algorithm using nothing
but this text input to generate these arbitrary pictures. In this case of fairly, you know,
complex and creative things. The guy who made those points out that he spends about two minutes or so creating each of these: he tries a few different prompts, and he tries a few different pictures. So he's given an example here of ten different things he gets back when he puts in "expressive painting of a man shining rays of justice and transparency, on a bluebird Twitter logo." So it's not just DALL·E 2, to be
clear. There are a lot of different systems doing something like this now. There's something called Midjourney, where this Twitter account posted a "female scientist with a laptop writing code, in a symbolic, meaningful, and vibrant style." This one here is "an HD photo of a rare psychedelic pink elephant." And this one's pretty cool (I never know how to actually pronounce the account name): "a blind bat with big sunglasses holding a walking stick in his hand." And these aren't from actual artists: this guy, for example, said he knows nothing about art; he's got no artistic talent; it's just something he threw together. This guy, on the other hand, is an artist who actually writes his own software
based on deep learning and spends, you know, months on building stuff, and as you can see,
you can really take it to the next level. It's been really great actually to see how
a lot of fast.ai alumni with backgrounds as artists have gone on to bring deep learning and
art together, and it's a very exciting direction. And it's not just images, to be clear. Another interesting thing that's popped up in the last couple of weeks is Google's Pathways Language Model (PaLM), which can take any arbitrary English text, a question, and create an answer which not only answers the question but also explains its thinking (whatever it means for a language model to be thinking). One of the ones I found pretty amazing was that
it can explain a joke. I'll let you read this… So, this is actually a joke that probably needs explaining to anybody who's not familiar with TPUs. This model just took the text as input and created this text as output. And so you can see, again, deep learning models are doing things which I think very few, if any of us, would have believed would ever be possible for computers to do, even in our lifetime. This means there are a lot of practical and ethical considerations. We will touch on them during this course
but can't possibly hope to do them justice. So I would certainly encourage you to check
out ethics.fast.ai to see our whole data ethics course, taught by my co-founder Dr Rachel Thomas,
which goes into these issues in a lot more detail. All right, so as well as being an AI researcher
at the University of Queensland and fast.ai, I am also a homeschooling primary school teacher
and for that reason I study education a lot. One of the people who I love in education is a
guy named Dylan Wiliam, and he has this great approach in his classrooms of figuring
out how his students are getting along, which is to put a coloured cup on their desk
- green to mean that they're doing fine, yellow cup to mean I'm not quite sure, and a red
cup to mean I have no idea what's going on. Now, since most of you are watching this remotely, I can't look at your cups, and I don't think anybody brought coloured cups with them today,
so instead we have an online version of this. So what I want you to do is go to
cups.fast.ai/fast - that's cups.fast.ai/fast - and don't do this if you're, like, a fast.ai
expert who's done the course five times - because if you're following along that doesn't really mean
much obviously. This is really for people who are, you know, not already fast.ai experts.
And so click one of these colored buttons. And what I will do, is I will go to the teacher
version and see what buttons you're pressing. All right! So, so far people are feeling we're
not going too fast on the whole. We've got one… nope, not one; a brief red. Okay! So, hey Nick, this URL, the same thing with "teacher" on the end: can you keep that open as well and let me know if it suddenly gets covered in red? If you are somebody who's red, I'm not going
to come to you now because there's not enough of you to stop the class. So it's up to you to
ask on the forum or on the YouTube live chat, and there are a lot of folks, luckily,
who will be able to help you, I hope. All right! I wanted to do a big shout out
to Radek. Radek created cups.fast.ai for me. I said to him last week I need a way
of seeing coloured cups on the internet and he wrote it in one evening. And
I also wanted to shout out that Radek just announced today that he got a job at
Nvidia AI and I wanted to say, you know, that fast.ai alumni around the world very
very frequently, like every day or two, email me to say that they've got their dream job.
Yeah… If you're looking for inspiration on how to get into the field, nothing would be better than checking out Radek's work. He's actually written a book about his journey. It's got a lot of tips, in particular about how to take advantage of fast.ai and make the most of these lessons. So I would certainly recommend checking that out as well. And
if you're here live he's one of our TAs as well so you can say hello to him afterwards.
He looks exactly like this picture here. So I mentioned I spent a lot of time studying
education both for my home schooling duties and also for my courses, and you'll see
that there's something a bit different, very different, about this course… which is that
we started by training a model. We didn't start by doing an in-depth review of linear algebra
and calculus. That's because two of my favorite writers and researchers on education, Paul Lockhart and David Perkins, and many others, talk about how much better people learn when they learn with a context in place. So the way we learn math at school, where we do counting and then adding
and then fractions and then decimals and then blah blah blah and you know, 15 years later
we start doing the really interesting stuff at grad school. That is not the way most people
learn effectively. The way most people learn effectively is the way we teach sports, for example, where we show you the whole game. We show you how much fun it is. You go and start playing sport, simple versions of it; you're not very good at first, and then you
gradually put more and more pieces together. So that's how we do deep learning. You will go
into as much depth as the most sophisticated, technically detailed classes you'll find - later,
right! But first you'll learn to be very very good at actually building and deploying models. And you
will learn why and how things work as you need to, to get to the next level. Those of you who have spent a lot of time in technical education (like if you've done a PhD or something) will find this deeply uncomfortable, because you'll be wanting to understand why everything works from the start.
Just do your best to go along with it. Those of you who haven't will find this very natural.
Oh! And this is Dylan Wiliam, who I mentioned before - the guy who came up with the really cool cups thing. There'll be a lot of tricks that have come out of the educational research literature
scattered through this course. On the whole I won't call them out, they'll just be there, but
maybe from time to time we'll talk about them. All right! So before we start talking about how
we actually built that model and how it works, I guess I should convince you that I'm worth
listening to. I'll try to do that reasonably quickly, because I don't like tooting my own
horn, but I know it's important. So the first thing to mention about me is that my friend Sylvain and I wrote this extremely popular book, "Deep Learning for Coders", and that book is what this course is quite heavily based on. We're not going to be using any material from the book
directly, and you might be surprised by that, but the reason actually is that the educational
research literature shows that people learn things best when they hear the same thing in multiple
different ways. So I want you to read the book and you'll also see the same information presented
in a different way, in these videos. So one of the bits of homework after each lesson will be to read a chapter of the book. A lot of people like the book. Peter Norvig, Director of Research at Google, loves the book; in fact, his quote is here: "one of the best sources for a programmer to become proficient in deep learning." Eric Topol loves the book. Hal Varian, Emeritus Professor at Berkeley and Chief Economist at Google, likes the book. Jerome Pesenti, who is the head of AI at Facebook,
likes the book. A lot of people like the book, so hopefully you'll find that you like this material
as well. I've spent about 30 years of my life working in and around machine learning including
building a number of companies that relied on it. I became the highest-ranked competitor in the world in machine learning competitions on Kaggle. My company Enlitic, which I founded, was the first company to specialize in deep learning for medicine, and MIT Tech Review voted it one of the 50 smartest companies in 2016, just above Facebook and SpaceX. I started fast.ai with Rachel Thomas, and that
was quite a few years ago now, but it's had a big impact on the world already. Work we've done with our students has been globally recognized, such as our win in the DAWNBench competition, which showed how we could train big neural networks faster and cheaper than anybody else in the world. And so that was a really big step in 2018,
which actually made a big difference. Google started using our special approaches in
their models. Nvidia started optimizing their stuff using our approaches. So
it made quite a big difference there. I'm the inventor of the ULMFiT algorithm
which according to the Transformers book was one of the two key foundations behind
the modern NLP revolution. This is the paper here. And actually, an interesting point about that: it was invented for a fast.ai course. So the first time it appeared was not in a journal; it was in lesson four of the course. I think it was the 2016 course, if I remember correctly. And, you know, most importantly of course, I've
been teaching this course since version one. And this is, I think, the very first version of it (which even back then was getting HBR's attention). A lot of people have been watching the course, and it's been really widely used. YouTube doesn't show likes anymore, so I have to show you our likes myself. You know, it's been amazing to see how many
alumni have gone from this to, you know, to really doing amazing things, you know.
And so, for example, Andrej Karpathy told me that at Tesla, I think he said, pretty much everybody who joins Tesla in AI is meant to do this course. I believe at OpenAI, they told me that all the residents joining there first do this course. So this course is really widely used in industry and research, and people have a lot of success with it. Okay, so that's a bit of brief information about
why you should hopefully keep going with this. All right so let's get back to what's
happened here. Why are we able to create a bird recognizer in a minute or two?
And why couldn't we do it before? So I'm going to go back to 2012 and in 2012
this was how image recognition was done. This is the computational pathologist - it was
a project done at Stanford. A very successful, very famous project that was looking at the
five-year survival of breast cancer patients by looking at their histopathology image slides.
Now, so this is, like, what I would call a classic machine learning approach. And I spoke to
the senior author of this, Daphne Koller, and I asked her why they didn't use deep
learning and she said “well it just, you know, it wasn't really on the radar at that
point.” So this is like a pre-deep-learning approach. And so the way they did this was they
got a big team of mathematicians and computer scientists and pathologists and so forth to get
together and build these ideas for features, like relationships between epithelial nuclear neighbors. They actually created thousands and thousands of features, and each one required a lot of expertise from a cross-disciplinary group of experts at Stanford. So this project took years,
and a lot of people, and a lot of code, and a lot of math. And then once they had all these features
they then fed them into a machine learning model - in this case, logistic regression, to
predict survival. As I say it's very successful, right, but it's not something that I could create
for you in a minute at the start of a course. The difference with neural networks is neural
networks don't require us to build these features. They build them for us! And so what actually happened was, in 2013, Matt Zeiler and Rob Fergus took a trained neural network and looked inside it to see what it had learned. So we don't give it features; we ask it to learn features. So when Zeiler and Fergus looked inside a
neural network, they looked at the actual weights in the model and they drew a picture of
them. And these were nine of the sets of weights they found. And this set of weights, for example,
finds diagonal edges. This set of weights finds yellow to blue gradients. And this set of weights
finds red to green gradients, and so forth, right. And then down here are examples of some bits of
photos which closely matched, for example, this feature detector. And deep learning is "deep" because we can then take these features and combine them to create more advanced features. So these are some layer-two features. There's
a feature, for example, that finds corners. And a feature that finds curves.
And a feature that finds circles. And here are some examples of bits of pictures
that the circle finder found. And so remember, with a neural net, which is the basic function used in deep learning, we don't have to hand-code any of these or come up with any of these ideas. You just start with a random neural network, feed it examples, and have it learn to recognize things, and it turns out that these are the things that it creates for
itself. So you can then combine these features. And when you combine these features it creates a
feature detector, for example, that finds kind of repeating geometric shapes. And it creates a
feature detector, for example, that finds kind of frilly little things, which it looks like is
finding the edges of flowers. And this feature detector here seems to be finding words. And so
the deeper you get the more sophisticated the features it can find are. And so you can imagine
that trying to code these things by hand would be insanely difficult, and you wouldn't even know what to code by hand, right? So what we're going to learn is how neural
networks do this automatically, right, but this is the key difference of why we can now do things
that previously we just didn't even conceive of as possible, because now we don't have to hand code
the features we look for; they can all be learned. It's important to recognize that we're going to be spending some time learning about building image-based algorithms, and image-based algorithms are not just for images. In fact, this is going to be a general theme: we're going to show you some foundational techniques, but with creativity these foundational techniques can be used very widely. For example, an image recognizer can also be used to classify sounds. This was an example from one of our students, who posted on the forum that for their project they would try classifying sounds: they basically took sounds, created pictures from their waveforms, and then used an image recognizer on that, and they got a state-of-the-art result, by the way. Another of our students on the forum
said that they did something very similar to take time series and turn them into
pictures and then use image classifiers. Another of our students created pictures from the mouse movements of users of a computer system: the clicks became dots, the movements became lines, and the speed of the movement became colors; then they used that to create an image classifier. So you can see, with some creativity, there are a lot of things you can do with images. There's something else I wanted to point out, which
is that as you saw when we trained a real working bird recognizer image model we didn't need lots
of math; there wasn't any. We didn't need lots of data. We had 200 pictures. We didn't need lots of
expensive computers; we just used my laptop. This is generally the case for the vast majority of deep learning that you'll need in real life. There will be some math that pops up during this course, but we will teach it to you as needed, or we'll refer you to external resources as needed, but it'll just be the little bits that you actually need. You know, the myth that
deep learning needs lots of data, I think, is mainly passed along by big companies that want to
sell you computers to store lots of data and to process it. We find that most real world projects
don't need extraordinary amounts of data at all, and as you'll see, there are actually a lot of fantastic places where you can do state-of-the-art work for free nowadays, which is great news. One of the key reasons for this is something called transfer learning, which we'll be learning about a lot during this course, and whose payoff very few people are aware of. In this course we'll be using PyTorch. For
those of you who are not particularly close to the deep learning world, you might have
heard of TensorFlow and not of PyTorch. You might be surprised to hear that TensorFlow has been dying in popularity in recent years, and PyTorch is actually growing rapidly: in research repositories, amongst the top papers, TensorFlow is a tiny minority now compared to PyTorch. This is from some great research that's come out from Ryan O'Connor. He also discovered that the majority of researchers who were using TensorFlow in 2018 have now shifted to PyTorch. And I mention this because what people use in research is a very strong leading indicator of what's going to happen in industry, because this is where all the new algorithms are going to come out, and this is what all the papers are going to be written about. It's going to be increasingly difficult to use TensorFlow. We've been using PyTorch since before its initial release, because
we knew just from technical fundamentals, it was far better. So this course has
been using PyTorch for a long time. I will say, however, that PyTorch requires a lot of hairy code for relatively simple things. This is the code required to implement a particular optimizer called AdamW in plain PyTorch; I actually copied this code from the PyTorch repository, so as you can see, there's a lot of it. This gray bit here is the code required to do the same thing with fast.ai. fast.ai is a library we built on top of PyTorch. This huge difference is not because PyTorch is bad; it's because PyTorch is designed to be a strong foundation to build things on top of, like fast.ai. When you use fast.ai, the library, you get access to all the power of PyTorch as well, but you shouldn't be writing all this code if you only
need to write this much code, right? The problem of writing lots of code is that that's lots of
things to make mistakes with, lots of things to, you know, not have best practices in, lots of
things to maintain. In general we've found, particularly with deep learning, that less code is better. Particularly with fast.ai: the code you don't write is code that we've basically found the best practices for, for you. So when you use the code that we've provided, you'll generally find you get better results. So fast.ai has been a really popular library, and it's very widely used in industry, in academia, and in teaching, and as we go through this course we'll be seeing more and more pure PyTorch as we get deeper and deeper underneath, to see exactly how things work. The fast.ai library won the 2020 best paper award for the paper about it in the journal Information, so again you can see it's a very well-regarded library. Okay, we're still green; that's good. So you may have noticed something interesting,
which is that I'm actually running code in these slides. That's because these slides
are not in PowerPoint. These slides are in a Jupyter notebook. Jupyter notebook is the
environment in which you will be doing most of your computing.
It's a web-based application which is extremely popular and widely used in industry, in academia, and in teaching, and it is a very, very powerful way to experiment, explore, and build. Nowadays I would say most people, at least most students, run Jupyter notebooks not on their own computers, particularly for data science, but on a cloud server, of which there are quite a few, and as I
mentioned earlier if you go to course.fast.ai you can see how to use various
different cloud servers. One I'm going to show an example of is Kaggle.
So Kaggle doesn't just have competitions but it also has a cloud notebook server and
I've got quite a few examples there. So let me give you a quick example of
how we use Jupyter notebooks to build stuff, to experiment, to explore. So, on Kaggle, if you start with somebody else's notebook (why don't you start with this one, "Jupyter Notebook 101"): if it's your own notebook you'll see a button called "Edit"; if it's somebody else's, that button will say "Copy & Edit". If you use somebody's notebook that
you like, make sure you click the upvote button to encourage them
and to help other people find it, before you go ahead and copy and edit. And once we're in edit mode we can now use this notebook. To use it, we can type in any arbitrary Python expression and click run, and the very first time we do that it says "session is starting": it's basically launching a virtual computer for us to run our code. This is all free. In a sense, it's like the world's most powerful calculator. It's a calculator where you have all of the capabilities of, I think, the world's most popular programming language (certainly it and JavaScript would be the top two) directly at your disposal. So, Python does know how to do one plus one, and you can see here it spits out the answer. I hate clicking; I always use keyboard shortcuts, so instead of clicking this
little arrow, you just press Shift+Enter to do the same thing. But as you can see, there are not just calculations here; there's also prose. Jupyter notebooks are great for explaining to the version of yourself in six months' time what on earth you were doing, or to your co-workers, or the people in the open source community, or the people you're blogging for, etc. You just type prose, and as you can see, when we create a new cell, you can create a code cell, which is a cell that lets you type calculations, or a markdown cell, which is a cell that lets you create prose. And the prose uses formatting in a little mini-
language called “markdown”. There's so many tutorials around I won't explain it to you but
it lets you do things like links and so forth. So I'll let you follow through the tutorial in
your own time because it really explains to you what to do. One thing to point out is that
sometimes you'll see me use cells with an exclamation mark at the start. That's
not Python; that's a bash shell command. So that's what the exclamation mark means. As you can see, you can put images into notebooks,
and so the image I popped in here was the one showing that Jupyter won the 2017 software
system award which is pretty much the biggest award there is for this kind of software. Okay,
so that's the basic idea of how we use notebooks. So let's have a look at how we
do our bird-or-not-bird model. One thing I always like to do when I'm using something like Colab or Kaggle (cloud platforms that I'm not controlling) is make sure that I'm using the most recent version of any software. So my first cell here is pip install with -U (that means upgrade) and -q (for quiet) and then fastai. That makes sure we have the latest version of fastai, and if you always have that at the start of your notebooks, you're never going to have those awkward forum threads where you say "why isn't this working?" and somebody says to you "oh, you're using an old version of some software!"
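The cell itself is just:

```python
# First notebook cell: make sure we're on the latest fastai (-U upgrades, -q is quiet)
!pip install -Uq fastai
```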
So, you'll see here this notebook is the exact thing that I was showing you at the start of this lesson. If you haven't done much Python, you might be surprised by how little code there is here: Python is a concise but not too concise language. You'll see that there's less boilerplate than in some other languages you might be familiar with, and I'm also taking advantage of a lot of libraries; fast.ai provides a lot of convenient things for you. Oh, I forgot to import… So, to use an external library we use
import, to import a symbol from a library. fast.ai has a lot of libraries we provide; they generally start with "fast-something". So, for example, to make it easy to download a URL, fastdownload has download_url(). To make it easy to create a thumbnail, we have Image.to_thumb(), and so forth. So, I always like to view – as I'm building a
model – my data at every step. So that's why I first of all grab one bird, and then I
grab one forest photo, and I look at them to make sure they look reasonable. And once I think, “okay they look
okay”, then I go ahead and download. And, so, you can see fast.ai has
a download_images() where you just provide a list of urls so that's how easy it is,
and it does that in parallel. So it does that, you know, surprisingly quickly. One other fast.ai
thing I'm using here is resize_images(). You'll find that for computer vision algorithms you don't need particularly big images, so I'm resizing these to a maximum side length of 400. It's actually much faster this way: because GPUs are so quick, for big images most of the time can be taken up just opening them; the neural net itself often takes less time. So that's another good reason to make them smaller. Okay, so the main thing I wanted to tell you about
was this data block command. The data block is the key thing that you're going to want to get familiar with as a deep learning practitioner at the start of your journey, because the main thing you're going to be trying to figure out is: how do I get this data into my model? Now, that might
surprise you. You might be thinking we should be spending all of our time talking about neural network architectures, matrix multiplication, gradients, and stuff like that. The truth is, very little of that comes up in practice, and the reason is that, at this point, the deep learning
community has found a reasonably small number of types of model that work for nearly
all the main applications you'll need; and fast.ai will create the right type of
model for you the vast majority of the time. So all of that stuff about tweaking neural
network architectures and stuff… I mean we'll get to it eventually in this course
but you might be surprised to discover that it almost never comes up. Kind of like if you ever
did like a computer science course or something and they spent all this time on the details of
compilers and operating systems, and then you get to the real world and you never use it again.
So this course is called practical deep learning, and so we're going to focus on the
stuff that is practically important. Okay, so our images have finished downloading, and
two of them were broken so we just deleted them. Another thing you'll note by the way if you're a keen software engineer is, I tend
to use a lot of functional style in my programs I find for kind of the kind of work I do
that a functional style works very well if you're, you know a lot of people in python are less
familiar with, that it's more, maybe comes, more from other things. So yeah that's why you'll
see me using stuff like map and stuff quite a lot. Alright so a data block is the
key thing you need to know about if you're going to know how to
use different kinds of data sets, and so these are all of the things,
basically, that you'll be providing. And so what we did when we designed the data block was we actually looked and said: okay, over hundreds of projects, what are all the things that change from project to project to get the data into the right shape? And we realized we could basically split it down into these five things. So the first thing that we tell fast.ai is: what kind of input do we have? There are lots of blocks in fast.ai for different kinds of inputs, and we said: ah, the input is an image. What kind of output is there; what kind of label? The output's a category, so that means it's one of a number of possibilities. So that's enough for fast.ai to know
what kind of model to build for you. So, what are the items in this model? What am I actually going to be looking at to train from? This is a function. In fact, you might have noticed, if you were looking carefully, that we used this function here. It's a function which returns a list of all of
the image files in a path based on extension, so every time it's going to try and find out
what things to train from, it's going to use that function. In this case we'll get a list of
image files. Something we'll talk about shortly is that it's critical that you put aside some data for testing the accuracy of your model; that's called a "validation set". It's so critical that
fast.ai won't let you train a model without one, so you actually have to tell it
how to create a validation set, how to set aside some data, and in this case
we say "randomly set aside 20% of the data". Okay, the next question you have to answer for fast.ai is: how do we know the correct label of a photo? How do we know if it's a bird photo or a forest photo? This is another function, and this function simply returns the parent folder of a path; in this case we saved our images into either "forest" or "bird", so that's where the labels are going to come from. And then finally, most computer vision architectures need all of your inputs, as you train, to be the same size, so "item transforms" are all of the bits of code that are going to run on every item (on every image, in this case), and we're saying: okay, we want you to resize each of them to 192 by 192 pixels. There are two ways you can resize: you can either crop out a piece in the middle, or you can squish it, and we're saying "squish it". So that's the data block; that's all you need.
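As a tiny illustration of that labelling function (the file path here is hypothetical):

```python
from fastai.vision.all import parent_label

parent_label('bird_or_not/bird/0001.jpg')  # -> 'bird': just the parent folder's name
```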
From there we create an important class called data loaders. Data loaders are the things that PyTorch actually iterates through to grab a bunch of your data at a time. The way it can do it so fast is by using a GPU, which is something that can do thousands of things at the same time, and that means it needs thousands of things to do at the same time. So a data loader will feed the training algorithm with a "bunch" of your images at once; in fact, we don't call it a bunch, we call it a "batch" or a "mini-batch".
a very specific word in deep learning. It's saying show me an example of a batch of data
that you would be passing into the model. And so you can see show_batch tells you two things: the input, which is the picture, and the label. And remember, the label came from calling that function. So, when you come to building your own models,
you'll be wanting to know what kinds of splitters there are, what kinds of labelling functions there are, and so forth, and docs.fast.ai is where you go to get that information. Often the best place to go is the tutorials. So,
for example here's a whole data block tutorial, and there's lots and lots of examples, so
hopefully you can start out by finding something that's similar to what you want to do and see
how we did it, but then of course there's also the underlying api information
so here's data blocks! Okay. How are we doing? Still doing good! Alright. So at the end of all this we've got an object called "dls"; that stands for "data loaders", and it contains iterators that PyTorch can run through to grab batches of randomly split-out training images to train the model with, and validation images to test the model with. So now we need a model. The critical concept here in fast.ai is called
a "learner". A learner is something which combines a model (that is, the actual neural network function we'll be training) and the data we use to train it with; and that's why you have to pass in two things: the data, which is the data loaders object, and a model. And so the model is going to be the actual neural network function that you want to pass in, and as I said, there's a relatively small number that basically work for the vast majority of things you do. If you pass in just a bare symbol
like this it's going to be one of fast.ai's built-in models but
what's particularly interesting is that we integrate a wonderful library by Ross
Wightman called "timm" (PyTorch Image Models), which is the largest collection of computer vision models in the world, and at this point fast.ai is the first and only framework to integrate it. So you can use any one of the PyTorch image models, and one of our students, Aman Arora, was kind enough to create this fantastic documentation where you can find out all about the different models. And if we click on here, you can get lots and lots
of information about all the different models that Ross has provided. Having said that, the model family called "resnet" is probably going to be fine for nearly all the things you want to do, but it is fun to try different models out, so you can type in any string here to use any one of those other models.
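For example, a sketch assuming a recent fastai with the timm integration, where the architecture can be named as a string ('convnext_tiny' is just one of timm's many models):

```python
learn = vision_learner(dls, 'convnext_tiny', metrics=error_rate)  # timm model by name
learn.fine_tune(3)
```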
Okay, so if we run that, let's see what happens. So this is interesting: remember, on Kaggle it's creating a new virtual computer for us, so it doesn't really have anything ready to go, and when I ran this the first thing it did was say "downloading resnet18.pth". What's that? Well, the reason we can do this so
fast is because somebody else has already trained this model to recognize over 1 million images of over 1,000 different categories: something called the "ImageNet" dataset. And they then made those weights, those parameters, available on the internet for anybody to download. By default, when you ask fast.ai for a model, it will download those weights for you, so that you don't start with a random network that can't do anything. You actually start with a network that can do an awful lot. And so, then, something that fast.ai has
that's unique is this fine_tune method. What it does is take those pre-trained weights we downloaded for you and adjust them in a really carefully controlled way, to just teach the model the differences between your dataset and what it was originally trained for. That's called "fine tuning"; hence the name. So that's why you'll see this downloading happen first. And as you can see at the end of it (this is the error right here), after a few seconds it's 100% accurate.
has been, has started with, a pre-trained model that's been fine-tuned for the purpose of
recognizing bird pictures from forest pictures. So you can now call dot predict on it, and dot
predict you pass in an image, and so this is how you would then deploy your model. So in
the code you have whatever it needs to do. So in this particular case this
person had some reason that he needs the app to check whether they're in the national
park, and whether it's a photo of a bird, so at the vet where they need to know if
it's a photo of a bird it would just call this one line of code learn.predict();
and so that's going to return whether it's a bird or not as a string,
whether it's a bird or not as an integer, and the probability that it's a non-bird or a bird;
and so that's why we can print these things out. Okay so that's how we can
create a computer vision model. What about other kinds of models? There's a lot more in the world than just computer vision; a lot more than just image recognition. In fact, even within computer vision there's a lot more than just image recognition. For example, there's segmentation! So, segmentation: maybe the best way to explain
segmentation is to show you the result of this model. Segmentation is where we take photos –
in this case of road scenes – and we color in every pixel according to what it is a pixel of. So in this case we've got brown for cars, blue for fences (I guess?), red for buildings, and so on. And so on the left here are some photos that
somebody has already gone through and classified every pixel of every one of these images according to what that pixel is a pixel of, and then on the right is what our model is guessing. As you can see, it's getting a lot of the pixels correct, and some of them it's getting wrong. It's actually amazing how many it's getting
correct because this particular model I trained in about 20 seconds using a tiny, tiny,
tiny, amount of data. So you know, again, like, you would think this would be a particularly
challenging problem to solve, but it took about 20 seconds of training to solve it, not amazingly
well, but pretty well. If I trained it for another two minutes it'd probably be close to perfect. So
this is called "segmentation". Now, you'll see that there's very, very little code required, and the steps are actually going to look quite familiar. In fact,
in this case we're using an even simpler approach. Earlier on, we used data blocks. Data blocks are a kind of intermediate-level, very flexible approach that you can take to handling almost any kind of data, but for the kinds of data that occur a lot you can use these special data loaders classes, which let you use even less code. So, in this case, to create
data loaders for segmentation, you can just say: “okay I'm going to pass you
in a function for labeling”; and you can see here it's got pretty similar things that we pass
in, to what we passed in for data blocks before. So our file names come from get_image_files() again, and then our label function is something that grabs a particular path; and the codes, that is, what each code means, are going to come from this text file. But you can see the basic information we're providing is very, very similar regardless of whether we're doing segmentation or object recognition; and then the next steps are pretty much the same. For segmentation we create a UNet learner, which we'll learn
about later, and then again we call fine_tune(). So that is it! And that's how we create a segmentation model!
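Those cells look roughly like this, using the tiny CamVid road-scene sample that fast.ai provides:

```python
from fastai.vision.all import *

path = untar_data(URLs.CAMVID_TINY)   # download and decompress a tiny CamVid sample
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/'images'),
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',  # mask file per image
    codes=np.loadtxt(path/'codes.txt', dtype=str))               # what each code means

learn = unet_learner(dls, resnet34)   # a UNet learner, covered later in the course
learn.fine_tune(8)
learn.show_results(max_n=3)
```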
What about stepping away from computer vision? Perhaps the kind of model most widely used in industry is tabular analysis: taking things like spreadsheets and database tables, and trying to predict columns of those. Tabular analysis really looks very similar to what we've seen already. We grab some data, and you'll see I call untar_data(); this is the thing in fast.ai that downloads some data and decompresses it for you, and there's a whole lot of URLs provided by fast.ai for all the common datasets you might want to use: all the ones that are in the book, and lots of datasets that are widely used in learning and research. So it makes life nice and easy for you. So,
again, we're going to create data loaders, but this time it's tabular data loaders, and we provide pretty similar information to before. There are a couple of new things we have to tell it: which of the columns are categorical, so they can only take one of a few values, and which ones are continuous, so they can take basically any real number. And then again, we can use the exact same show_batch() that we've seen before to
see the data. fast.ai uses a lot of something called "type dispatch", which is a system that's particularly popular in a language called Julia, to basically automatically do the right thing for your data, regardless of what kind of data it is. So, if you call show_batch() on something, you should get back something useful regardless of what kind of information you provided. For a table, it shows you the information in that table. This particular dataset is a dataset of whether people have less than fifty thousand dollars or more than fifty thousand dollars in salary, for different districts, based on demographic information in each district. So, to build a model for that
data loader, we do as always: "something"-underscore-"learner"; in this case it's tabular_learner. Now, this time we don't say fine_tune(); we say fit, specifically fit_one_cycle(). That's because for tabular models there's not generally going to be a pre-trained model that already does something like what you want, because every table of data is very different, whereas pictures often have a similar theme: they're all pictures; they all share the same general idea of what pictures are. So that's why it generally doesn't make too much sense to fine-tune a tabular model; instead, you just fit. So there's one difference there.
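Here's roughly what those cells look like, for the adult census sample dataset:

```python
from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)   # census data: predict who earns >= $50k
dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, y_names='salary',
    cat_names=['workclass', 'education', 'marital-status', 'occupation',
               'relationship', 'race'],                     # categorical columns
    cont_names=['age', 'fnlwgt', 'education-num'],          # continuous columns
    procs=[Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(2)   # fit, rather than fine_tune: no pre-trained tabular model
```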
I'll show another example: collaborative filtering. Collaborative filtering is the basis of most recommendation systems today. It's a system where we basically take a dataset that says which users liked which products, or which users used which products, and then we use that to guess what other products those users might like, based on finding similar users and
what those similar users like. The interesting thing about collaborative filtering is that when we say "similar users", we're not referring to demographically similar users, but users who are similar in the sense of liking the same kinds of products. So, for example, if you use any of
the music systems like Spotify, or Apple music, or whatever, it'll ask you first, like, “what's
a few pieces of music you like,” and you tell it, and then it says “okay well maybe let's start
playing this music for you," and that's how it works: it uses collaborative filtering. So we can create collaborative filtering data loaders in exactly the same way we're used to: by downloading and decompressing some data and creating our collab data loaders. In this case we can just use from_csv() and pass in a CSV, and this is what collaborative filtering data looks like. It's going to have, generally speaking, a user ID, some kind of product ID (in this case, a movie), and a rating. So, in this case this user gave this movie a rating of 3.5 out of 5. And again, you can use show_batch: you should get back some useful visualization of your data
regardless of what kind of data it is, and so again, we create a “learner.” This time the “collaborative filtering”
learner, and you pass in your data. In this case we give it one extra piece
of information, which is – because this is not predicting a category, but it's
predicting a real number – we tell it: “what's the possible range.”
The actual range is one to five, but for reasons you'll learn about later, it's
a good idea to actually go from a little bit lower than the possible minimum, to a little
bit higher. That's why I say 0.5 to 5.5, and then fine-tune. Now, again, you know we don't
really need to fine-tune here because there's not really such a thing as a pre-trained collaborative
filtering model. We could just say “fit” or “fit_one_cycle”, but actually fine_tune() works
fine as well. So, after we train it for a while, this here is the "mean squared error": basically, on average, how far off are we for the validation set? And you can see as we train (and it's literally so fast: less than a second per epoch) that error goes down and down. And for any kind of
fast.ai model you can always call show_results() and get something sensible. So in this case it's going to show a few examples of users and movies: here's the actual rating that user gave that movie, and here's the rating that the model predicted.
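Those cells look roughly like this, with the small MovieLens ratings sample:

```python
from fastai.collab import *

path = untar_data(URLs.ML_SAMPLE)                     # a small MovieLens ratings sample
dls = CollabDataLoaders.from_csv(path/'ratings.csv')  # user, movie, rating columns
learn = collab_learner(dls, y_range=(0.5, 5.5))       # predict slightly beyond 1-5
learn.fine_tune(10)
learn.show_results()
```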
Okay, so apparently a lot of people on the forum are asking how I'm turning this notebook into a presentation. I'll be delighted to show you, because I'm
very pleased that these people made this thing for free for us to use. It's called RISE, and it's a notebook extension; in your notebook it gives you an extra little thing on the side where you say which things are slides and which things are fragments. So, this is a slide, that's a fragment, and if I do that you'll see it starts with a slide and then the fragment gets added in. Yeah, that's about all there is to it!
Actually it's pretty great, and it's very well documented. You know I'll just mention,
like, what do I make with Jupyter notebooks? This entire book was written entirely in Jupyter notebooks. Here are the notebooks: if you go to the fastai/fastbook repo, you can read the whole book, and
because it's all in notebooks, every time we say: “here's how you create this plot,” or “here's
how you train this model,” you can actually create the plot, or you can actually train
the model, because it's all notebooks. The entire fast.ai library is
actually written in notebooks. So you might be surprised to discover that if you go to fastai/fastai, the source code for the entire library is notebooks, and so the nice thing about this
is that, you know, the source code for the fast.ai library has actual pictures of
the actual things that we're building, for example. What else have we done with notebooks? Oh, blogging! I love blogging with notebooks, because
when I want to explain something I just write the code, and you can just
see the outputs, and it all just works. Another thing you might be surprised by, is
all of our tests and continuous integration are also all in notebooks. So every
time we change one of our notebooks… every time we change one of our notebooks,
hundreds of tests get run automatically, in parallel, and if there's any issues
we will find out about it. So, yeah, notebooks are great! and rise is… is a
really nice way to do slides in notebooks. Alright. So, what can deep learning do at present? We're still scratching the tip of the iceberg, even though it's a pretty well-hyped, heavily marketed technology. When we started in 2014 or so, not many people were talking about deep learning, and there was really no accessible way to get started with it. There were no pre-trained models you could download; some of the first open source software that would run on GPUs was only just starting to appear. But despite the fact that today there are a lot of people talking about deep learning, we're just scratching the surface.
Pretty much every time somebody says to me "I work in domain X, and I thought I might try deep learning out to see if it can help", and I see them a few months later and ask "how did it go?", they nearly always say "well, we just broke the state-of-the-art results in our field." So, when I say these are things that it's currently state of the art for, these are just the ones that people have tried so far; most things haven't been tried. In NLP, deep learning is the state-of-the-art method in all these kinds of things and a
lot more: computer vision, medicine, biology, recommendation systems, playing games, robotics…
I've tried elsewhere to make bigger lists, and I just end up with pages and pages! So, generally speaking: if it's something that a human can do reasonably quickly, like look at a Go board and decide if it looks like a good board or not, even if it needs to be an expert human, then that's probably something that deep learning will be pretty good at. If it's something that takes a lot of logical
thought processes over an extended period of time, you know, particularly if it's not based
on much data, maybe not: you know, like "who's going to win the next election", or something like that. That's broadly how I try to decide whether your thing is a good fit for deep learning or not. You know, it's been a long
time to get to this point. Yes, deep learning is incredibly powerful now,
but it's taken decades of work. This was the first neural network. Remember, neural networks are the
basis of deep learning. So this was back in 1957. The basic ideas have not changed much at all, but we do have things like GPUs now, and solid-state drives, and stuff like that, and of course much more data is simply available now. But it has been decades of really hard
work by a lot of people to get to this point. So let's take a step back and talk about what's going on in these models. I'm going to describe the basic idea of machine learning largely as it was described by Arthur Samuel in the late 50s, when it was invented, and I'm going to do it with these charts, which, by the way, you might find fun: these charts are themselves created with Jupyter notebooks. They're Graphviz descriptions that get turned into these diagrams, so there's a little sneak peek behind the scenes for you. So, let's start with a chart of
what a normal program looks like. In the pre-deep-learning and machine learning days, you'd have inputs and you'd have results, and then you'd code a program in the middle, which is a bunch of conditionals and loops and setting variables and so forth. A machine learning model
doesn't look that different… but… The program has been replaced with something
called “a model,” and we don't just have inputs now, we now also have weights, which are also
called “parameters,” and the key thing is this: the model is not any more a bunch of
In the case of a neural network, it's a mathematical function that takes the inputs, multiplies them by one set of weights, and adds them up; then it does that again for a second set of weights and adds them up, and again for a third set of weights, and so forth. It then takes all the negative numbers and replaces them with zeros, and takes those as inputs to the next layer, which does the same thing: multiplies them by weights a bunch of times and adds them up. Do that a few times, and that's called "a neural network."
Now, the model isn't going to do anything useful unless these weights are very carefully chosen. The way it works is that we actually start out with the weights as random, so initially this thing doesn't do anything useful at all!
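To make that concrete, here's a minimal sketch of that kind of function in plain NumPy; this is just an illustration of the idea, not code from the course or the fastai library:

```python
import numpy as np

rng = np.random.default_rng(0)

def neural_net(x, layers):
    """Multiply by weights and add up; replace negatives with zeros; repeat."""
    for i, w in enumerate(layers):
        x = x @ w                    # multiply inputs by this set of weights and sum
        if i < len(layers) - 1:
            x = np.maximum(x, 0)     # replace the negative numbers with zeros
    return x

# Two layers of random weights: the output is meaningless until
# the weights are carefully chosen (i.e. trained).
layers = [rng.standard_normal((4, 8)), rng.standard_normal((8, 1))]
print(neural_net(rng.standard_normal((1, 4)), layers))
```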
So what we do, the way Arthur Samuel (the inventor of machine learning) described it back in the late 50s, is this: he said, okay, let's take the inputs and the weights and put them through our model. (He wasn't talking particularly about neural networks; it's whatever model you like.) Get the results, and then decide how "good" they are. So if, for example, we're trying to decide "is this a picture of a bird?", and the model, which initially is random, says "this isn't a bird," and actually it IS a bird, we would say: "oh, you're wrong!" So we then calculate "the loss": the loss is a number that says how good the results were. That's all pretty straightforward. We could, for example, use accuracy: look at 100 photos and see what percentage of them it got right. No worries.
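As a toy illustration of that kind of scoring (the values here are made up, not real model output):

```python
# Accuracy: the fraction of predictions that match the labels.
preds  = ["bird", "forest", "bird",   "bird"]
labels = ["bird", "forest", "forest", "bird"]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(accuracy)  # 0.75
```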
Now, the critical step is this arrow: we need a way of updating the weights, that is, of coming up with a new set of weights that are a bit better than the previous set. And by "a bit better" we mean the loss should get a little bit better. So we've got this number that says how good our model is, and initially it's terrible: it's random. We just need some mechanism for making it a little bit better. If we can do that one thing, then we just need to iterate it a few times: each time, we put in some more inputs, put in our weights, get our loss, and use it to make the weights a little bit better. If we make them a little bit better enough times, eventually it's going to get good, assuming our model is flexible enough to represent the thing we want to do. And remember what I told you earlier about what a neural network is: basically multiplying things together, adding them up, and replacing the negatives with zeros, done a few times over. That is provably an infinitely flexible function.
So it actually turns out that this incredibly simple sequence of steps, if you repeat it enough times, can solve any computable function. And something like "generate an artwork based off somebody's Twitter bio" is an example of a computable function, right? Or "translate English to Chinese". They're not the kinds of functions you do in year eight math, but they are computable functions. So if we can just create this update step and use a neural network as our model, then we're good to go: in theory we can solve anything, given enough time and enough data. And that's exactly what we do.
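Here's a minimal sketch of that whole loop in code. To keep it self-contained, I'm using a deliberately dumb update mechanism (keep a random tweak to the weights if it improves the loss); in practice the update is done far more cleverly, with gradient descent. The model, data, and names here are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def model(x, w):
    # the simplest possible "model": a line, result = w[0]*x + w[1]
    return w[0] * x + w[1]

def loss_fn(w, x, y):
    # the loss: one number saying how good the results are (lower is better)
    return float(((model(x, w) - y) ** 2).mean())

# made-up data that follows y = 3x + 1
x = np.linspace(0, 1, 20)
y = 3 * x + 1

w = rng.standard_normal(2)              # start with random weights: useless at first
for step in range(10_000):              # iterate the loop enough times...
    candidate = w + 0.1 * rng.standard_normal(2)
    if loss_fn(candidate, x, y) < loss_fn(w, x, y):
        w = candidate                   # ...keeping each change that makes the loss a bit better
print(w)                                # ends up close to [3, 1]
```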
And once we've finished that training procedure, we don't need the loss anymore, and even the weights themselves we can integrate into the model: we've finished changing them, so we can just say they're now fixed. Once we've done that, we have something which takes inputs, puts them through a model, and gives us results. It looks exactly like our original idea of a program, and that's why we can do what I described earlier: once we've got that learn.predict for our bird recognizer, we can insert it into any piece of computer code. Once we've got a trained model, it's just another piece of code we can call with some inputs and get some outputs.
Deploying machine learning models in practice can come with a lot of little tricky details, but the basic idea is that in your code you're just going to have a line that says learn.predict, and then you fit it in with all the rest of your code in the usual way. And this is why: a trained model is just another thing that maps inputs to results.
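For example, using a trained and exported fastai model from ordinary Python code looks roughly like this; the file names here are placeholders:

```python
from fastai.vision.all import load_learner, PILImage

# load a model that was previously trained and exported (placeholder file name)
learn = load_learner('model.pkl')

# predicting is just another function call we can drop into any program
img = PILImage.create('bird.jpg')      # placeholder image file
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, float(probs[pred_idx]))
```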
All right. As we come to wrap up this first lesson: for those of you who are already familiar with notebooks and Python, this has got to be pretty easy for you; you're just going to be using some stuff you're already familiar with, plus some slightly new libraries. For those of you who are not familiar with Python, you're biting into a big thing here. There's obviously a lot you're going to have to learn, and to be clear, I'm not going to be teaching Python in this course, but we do have links to great Python resources in the forum, so check out that thread.
Regardless of where you're at, the most important thing is to experiment. Experimenting could be as simple as just running those Kaggle notebooks I've shown you, just to see them run. You could try changing things a little bit. I'd really love you to try doing the bird-or-forest exercise, but come up with something else: maybe try three or four categories rather than two. Have a think about something you think would be fun to try. Depending on where you're at, push yourself a little bit, but not too much; make sure you get something finished before the next lesson. Most importantly, read chapter one of the book. It's got much the same stuff we've seen today, but presented in a slightly different way. Then come back to the forums and present what you've done in the "Share your work here" thread. After the first time we did this, in year one of the course, we got over a thousand replies, and it's amazing how many of those replies ended up turning into new startups, scientific papers, and job offers. It's been really cool to watch people's journeys,
and some of them are just plain fun. This person classified different types of Trinidad and Tobago people; people do stuff based on where they live and what their interests are. I don't know if this person is particularly interested in zucchini and cucumber, but they made a zucchini and cucumber classifier. I thought this was a really interesting one: classifying satellite imagery by what city it's probably a picture of. Amazingly accurate, actually: 85 percent with 110 classes. A Panama City bus classifier. A batik cloth classifier. This one is very practically important: recognizing the state of buildings. We've had quite a few students actually move into disaster resilience based on satellite imagery using exactly this kind of work. We've already seen this example: Ethan Sutin's sound classifier, which I mentioned was state of the art. He actually checked the dataset's website and found that he'd beaten the state of the art for it. Alena Harley did tumor-normal sequencing; she was at Human Longevity International, and she actually did three different really interesting pieces of cancer work during that first course, if I remember correctly.
And I showed you this picture before. What I didn't mention is that this student, Gleb, was a software developer at Splunk, a big NASDAQ-listed company, and the student project he did turned into a new patented product at Splunk and a big blog post; the whole thing turned out to be really cool. It was basically something to identify fraudsters using image recognition, with these pictures we discussed. One of our students built this startup called Envision. Anyway, there have been lots and lots of examples.
All of this is to say: have a go at starting something. Create something you think would be fun or interesting and share it in the forum. If you're a total beginner with Python, then start with something simple; but I think you'll find people very encouraging, and if you've done this a few times before, try to push yourself a little bit further. And don't forget to look at the quiz questions at the end of the book and see if you can answer them all correctly. All right, thanks so much for coming, everybody. Bye.