An overview of fastai2

Captions
Basically, every time we found something that didn't quite work the way we wanted at any part of the stack, we wrote our own. So it's kind of like building something with no particular deadline and trying to do everything the very, very best we can.

The layered API of fastai v2 starts at the applications layer, which is where most beginners will start, and it looks a lot like fastai v1, the released version of the software that people have seen before. In v2, though, everything is rewritten from scratch; it's totally new, with no code borrowed, but the top-level API looks quite similar. The idea is that in one, two, three, four lines of code you can create a state-of-the-art computer vision classifier, including transfer learning. With nearly the same four lines of code (five in this case, because we're also displaying the data) you can create a state-of-the-art segmentation model. And when I say state of the art, I mean it: to the best of my knowledge this segmentation model is still better than any published result on this particular CamVid dataset, so these five lines of code are a very good five lines of code. As you can see, it includes a line of code which, if you say show_batch, will display your data in an appropriate format, in this case showing you, for segmentation, a picture with the color-coded pixels overlaid on top of it.

The same basic four lines of code will do text classification. Here's the basis of ULMFiT, a system for transfer learning in natural language processing that we developed and wrote up along with Sebastian Ruder. As you can see, this is working on IMDb: in a single epoch taking four minutes, the accuracy here is basically what was state of the art a couple of years ago. Tabular or time series analysis, same deal: basically a few lines of code, nearly exactly the same lines of code, and you'll get a great result from your tabular data. And ditto for collaborative filtering.

So the high-level API for fastai v2 is designed so that, regardless of what application you're working on, you can get a great result using sensible defaults and carefully selected hyperparameters, largely done for you automatically for the most common kinds of problems people look at. That bit doesn't look very different from v1, but understanding how we get there is interesting, and involves going deeper and deeper into the stack. This approach does work very well, partly because it's based on quite a few years of research into the best ways to solve various problems along the way, and when people actually try fastai they're often surprised. One person posted on our forum that they'd been working in TF2 for a while and for some reason couldn't figure out why all of their models were suddenly working much better; the answer is basically that they were getting all these nicely curated best practices. Somebody else on Twitter saw that and said: yep, we found the same thing, we were trying TensorFlow and spent months tweaking, then we switched to fastai and a couple of days later we were getting better results. So these carefully curated defaults and algorithms and high-level APIs that do things right for you the first time can give even experienced practitioners better results, faster.
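To make that concrete, here is a rough sketch of what those few lines look like for image classification, assuming fastai v2's package layout and the Oxford-IIIT Pets dataset used in the fastai examples; exact argument names may vary between releases.

```python
from fastai.vision.all import *

# grab a sample dataset and build DataLoaders from the filenames
path = untar_data(URLs.PETS) / "images"
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path),
    pat=r"(.+)_\d+\.jpg$",        # the label is everything before the trailing number
    item_tfms=Resize(224))

dls.show_batch()                   # every application provides show_batch

# transfer learning from an ImageNet-pretrained ResNet
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```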
But it's actually the other pieces that are, I think, more interesting for a Swift conversation, because the deeper we go into how we make that work, the more stuff you'll see which I think will be a great fit with Swift.

The mid-layer API is largely new too; actually, I guess the foundation layer is new, so the mid layer I'd say is more rewritten from v1. It contains some of the things that make those high-level APIs easy, and one of the most interesting bits is the training loop itself (thanks to Sylvain for the set of slides we have for the training loop). This is what a training loop looks like in PyTorch: we calculate some predictions, we get a loss, we do a backward pass to get the gradients, we do an optimizer step, and then, optionally, from time to time, we zero the gradients, depending on whether we're accumulating them. So that's the loop around the model: get the loss, get the gradients, step the optimizer, and do that a bunch of times.

But if you want to do something interesting, you'll need to add something to that loop: keeping track of your training statistics in TensorBoard or fastprogress or whatever, scheduling various hyperparameters in various ways, adding different kinds of regularization, doing mixed precision training, training GANs. This is a problem, because either you write a new training loop every time you want a different tweak, and then making all those tweaks work together becomes incredibly complicated, or you try to write one training loop which does everything you can think of. This is the training loop for fastai 0.7, which only did a tiny subset of the things I just mentioned, and it was already getting ridiculous. Or you can add callbacks at each step. Now, the idea of callbacks has been around in deep learning APIs for a long time, but what's very different about fastai is that every callback is actually a two-way callback: it can read absolutely everything (gradients, parameters, data, and so forth) and it can also write them, so it can actually change anything at any time. We say the callbacks are infinitely flexible, and we feel pretty confident in that claim because the training loop in fastai has not needed to be modified for any of the tweaks I showed you before; even the entirety of training GANs can be done in a callback.

So basically we switch out the basic training loop and replace it with one with the same five steps, but with callbacks between every step. That means, for example, that if you want a scheduler you can define an on-batch-begin callback that sets the optimizer's learning rate to some function; if you want early stopping you can write an on-epoch-end callback that checks the metrics and stops training; you can do parallel training by setting up DataParallel at the start of training and taking it off again at the end; and for gradient clipping you have access to the parameters themselves, so you can clip the gradient norms at the end of the backward step, and so forth. All of these things have been written with fastai callbacks, including, for example, mixed precision: all of NVIDIA's recommendations for mixed precision training will be added automatically if you just add to_fp16 at the end of your Learner call. And, really importantly, all of those mixed precision pieces can be combined with multi-GPU and one-cycle training and gradient accumulation and so forth.
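As a rough sketch of the two-way callback idea (illustrative only, not fastai's actual Callback class, whose hook names and state differ), the loop exposes its own state to each callback, and a callback such as a learning-rate scheduler can both read and modify it:

```python
class Callback:
    "Every hook receives the trainer itself, so it can read and write any training state."
    def on_batch_begin(self, trainer): pass
    def on_batch_end(self, trainer): pass
    def on_epoch_end(self, trainer): pass

class LRSchedule(Callback):
    "Hypothetical scheduler: set the learning rate from a function of training progress."
    def __init__(self, lr_fn): self.lr_fn = lr_fn
    def on_batch_begin(self, trainer):
        for group in trainer.opt.param_groups:
            group["lr"] = self.lr_fn(trainer.pct_complete)

class Trainer:
    "The same five steps as the plain PyTorch loop, with callbacks between the steps."
    def __init__(self, model, loss_fn, opt, cbs):
        self.model, self.loss_fn, self.opt, self.cbs = model, loss_fn, opt, cbs
        self.pct_complete = 0.0

    def fit(self, dls, epochs):
        n_iters = epochs * len(dls)
        for epoch in range(epochs):
            for i, (xb, yb) in enumerate(dls):
                self.pct_complete = (epoch * len(dls) + i) / n_iters
                for cb in self.cbs: cb.on_batch_begin(self)
                loss = self.loss_fn(self.model(xb), yb)   # predictions and loss
                loss.backward()                            # gradients
                self.opt.step()                            # optimizer step
                self.opt.zero_grad()                       # reset gradients
                for cb in self.cbs: cb.on_batch_end(self)
            for cb in self.cbs: cb.on_epoch_end(self)
```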
Trying to create a state-of-the-art model, which involves combining state-of-the-art regularization and mixed precision and distributed training and so forth, is normally a really, really hard job. With this approach it's actually just a single extra line of code to add each feature, and the features are all explicitly designed, and tested, to work with each other. So, for instance, here is mixup data augmentation, an incredibly powerful data augmentation method that has powered lots of state-of-the-art results, and as you can see it's well under a screen of code. By comparison, here is the version of mixup from the paper: not only is it far longer, it only works with one particular dataset, one particular optimizer, and one particular kind of metric, and it's full of all kinds of assumptions. So that's an example of these mid-tier APIs.

Another one is the optimizer. It looks like there have been lots and lots of different optimizers appearing in the last year or two, but it turns out they're all minor tweaks on each other. Most libraries don't write them this way. For example, AdamW, also known as decoupled-weight-decay Adam, was added to PyTorch quite recently, in the last month or two, and it required writing a whole new class and a whole new step to implement; it arrived something like two or three years after the paper was released. Fastai's implementation, on the other hand, involves a single extra function containing two lines of code, plus this little bit of grey here, so it's roughly two and a half or three lines of code to implement the same thing. What we did was refactor the idea of an optimizer, look at what's actually different in each of these recent state-of-the-art optimizers, and make it so each of those things can be added or removed by changing just two kinds of component: stats and steppers. A stat is something you measure during training, such as the gradients, or the gradients squared, possibly with dampening or momentum; a stepper is something that uses those stats to change the weights in some way. You can combine those pieces, and by combining them we've been able to implement all these different optimizers. For instance, the LAMB optimizer, which came out of Google and was super cool at reducing pre-training time from three days to 76 minutes, we were able to implement in a tiny piece of code. One of the nice things is that when you compare it to the math it looks almost line-for-line identical, except ours is a little bit nicer because we refactored some of the math. That makes it really easy to do research as well, because you can bring the equations across into your code pretty much directly.
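To illustrate the stats-and-steppers refactoring, here is a simplified sketch (not fastai's actual Optimizer code; the names and signatures below are made up for the example): a stat tracks something during training, a stepper uses that state to update the weights, and an optimizer is just a particular combination of the two.

```python
import torch

def average_grad(state, p, mom=0.9, **kw):
    "Stat: keep an exponential moving average of the gradient of parameter p."
    avg = state.get("grad_avg", torch.zeros_like(p.grad))
    state["grad_avg"] = avg.mul(mom).add(p.grad)
    return state

def weight_decay(state, p, lr=1e-2, wd=1e-2, **kw):
    "Stepper: decoupled weight decay, applied directly to the weights (the AdamW trick)."
    p.data.mul_(1 - lr * wd)
    return state

def momentum_step(state, p, lr=1e-2, **kw):
    "Stepper: take a step in the direction of the averaged gradient."
    p.data.add_(state["grad_avg"], alpha=-lr)
    return state

class ComposedOptimizer:
    "New optimizers are new combinations of stats and steppers, not new classes."
    def __init__(self, params, stats, steppers, **hypers):
        self.params = [p for p in params]
        self.state = [{} for _ in self.params]
        self.stats, self.steppers, self.hypers = stats, steppers, hypers

    def step(self):
        for p, st in zip(self.params, self.state):
            if p.grad is None: continue
            for stat in self.stats:       st = stat(st, p, **self.hypers)
            for stepper in self.steppers: st = stepper(st, p, **self.hypers)

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None: p.grad.zero_()

# e.g. SGD with momentum and decoupled weight decay:
# opt = ComposedOptimizer(model.parameters(), stats=[average_grad],
#                         steppers=[weight_decay, momentum_step],
#                         lr=1e-2, mom=0.9, wd=1e-2)
```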
Then the last of the mid-tier APIs is the data block API, which is something we had in version 1 as well, but when we were porting it to Swift we had an opportunity to rethink it, and Alexis Gallagher in particular helped us to rethink it in a more idiomatically Swifty way. It came out really nicely, so we took the result of that and ported it back into Python, and we ended up with something that was quite a bit nicer; there's been a nice interplay between fastai in Python and SwiftAI in Swift in terms of helping each other's APIs. Basically, the data block API is something where you define each of the key things the program needs to know to flexibly get your data into a form you can put into a model: what types of data you have, how to get that data, and how to split it into a training set and a validation set, and it puts that all together into a DataBunch, which is just a simple little class, literally about four lines of code, that holds the training set and the validation set in one place.

So with a data block you just say: my types are a black-and-white Pillow image for my x and a category for my y; to get the list of files, use this function; to split those files into training and validation, use this function, which looks at the grandparent directory name; and to get the labels, use this function, which uses the parent directory name. That's enough to give you MNIST, for instance. Once you've done this you end up with a DataBunch, and, as I mentioned before, everything has a show_batch, so it's very easy to look at your data regardless of whether it's tabular, collaborative filtering, vision, text, or even audio; if it were audio it would show you a spectrogram and let you play the sound.

You can do custom labeling with data blocks by using, for example, a regular expression labeler; you can get your labels from an external file or DataFrame; and they can be multi-label, so this example knows it's a multi-label classification task and has automatically put a semicolon between each label. Again, it's still basically just three lines of code to define the data block. Here's a data block for segmentation, and you can see the only thing I had to change is that my dependent variable is now a Pillow mask instead of a category; again show_batch works automatically and we can train a model from it straight away. You can do keypoints: here I've just changed my dependent variable to a tensor of points. Object detection: now I've changed my dependent variable to bounding boxes, and you can see I've got my bounding boxes here. Text, and so forth.

Actually, going back, can I ask a couple of questions, if that's okay? In the code you've got, the x's and y's, it sounds like those different data types roughly conform to a protocol?

Yep, we're going to get to that in a moment; that's an excellent way to think of it. And actually, this is the way it looked about three weeks ago; it now looks even more like a protocol. So, yes.
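For reference, a data block along the lines described for MNIST might look something like this in fastai v2 (a sketch using the block, splitter, and labeling helpers as I understand them; check the current docs for exact signatures):

```python
from fastai.vision.all import *

mnist = DataBlock(
    # the types: a black-and-white image for x, a category for y
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    # how to get the items: every image file under the path
    get_items=get_image_files,
    # how to split: by the grandparent directory name (train/valid)
    splitter=GrandparentSplitter(),
    # how to label: by the parent directory name
    get_y=parent_label)

dls = mnist.dataloaders(untar_data(URLs.MNIST_TINY))
dls.show_batch()
```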
So this is where it all comes from: the foundation APIs, and this is the bit that I think is most relevant to Swift, because a lot of it would be a lot easier to write in Swift. The first thing we added to PyTorch was object-oriented tensors. For too long we've all been satisfied with a data type called Tensor which has no semantics to it, yet those tensors actually represent something: a sentence, a picture of a cat, a recording of somebody saying something. So why can't I take one of those tensors and say .flip, or .rotate, or .resample, or .translate_to_german? The answer is that you can't, because it's just a tensor without a type. So we have added types to tensors: you can now have a TensorImage, a TensorPoint, a TensorBBox, and you can define a flip_lr for each. This is some of the source code from the computer vision library we've written, so that now you can say flip_lr and it flips the puppy; if it were keypoints it would flip the keypoints, if it were a bounding box it would flip the bounding boxes, and so forth. So this is an example of how tensors which carry semantics around are nice.

It's also nice that I can just say .show: .show is defined for all fastai v2 tensor types and will display that tensor appropriately. It could even be a tuple containing a tensor, some bounding boxes, and some bounding box classes; whatever it is, it will be able to display it, and it will be able to convert it into batches for modeling, and so forth. With that, we can now create, for example, a random transformation called FlipItem, say that the encoding of that transformation is defined for a Pillow image or any tensor type, and in each case the implementation is simply to call x.flip_lr. Or we can do the dihedral symmetry transforms in the same way: before the call, grab a random number between zero and seven to decide which of the eight transposes to do, and then encodes calls x.dihedral with the number we just got. We can then call that transform a bunch of times and each time we'll get back a different random augmentation. So a lot of these things become nice and easy.

Hey Jeremy, Maxim asked: why isn't a tensor the backing data structure for an image type? That is, TensorImage is a tensor which is an image type; why not have a different type named Image that has a tensor inside of it?

Do you mean, why inherit rather than compose? Apparently, yes. So, inheritance: you can do both, and you can create identical APIs; inheritance just has the benefit that all the normal stuff you can do with a tensor, you can do with a tensor that happens to be an image. Just because a tensor is an image doesn't mean you no longer want to be able to do fancy indexing on it, or an LU decomposition of it, or stack it with other tensors across some axis. Basically a TensorImage ought to have all the behavior of a Tensor plus additional behavior, and that's why we used inheritance. We have a version that uses composition as well, which uses Python's nice __getattr__ functionality to pass on all of the behavior of Tensor, but it comes out more nicely in Python when you do inheritance. And actually the PyTorch team has decided to officially implement semantic tensor subtypes now, so hopefully in the next version of PyTorch you won't have to use the extremely ugly hacks we had to use to make this work; you'll be able to use the real ones, and hopefully you'll see some of these ideas brought over into torchvision.

Can I ask, how does the type propagate? If you do arithmetic on an image tensor, do you get an image tensor back?

I had a conversation about this a few months ago: I said I was banging my head against this issue of types not carrying their behavior around, and Chris casually mentioned, oh yes, that thing is called higher-kinded types. I went home and looked it up; it was one of those phrases I'd thought only functional-programming dweebs talked about and that I'd never have to care about, but it actually matters a lot. It's basically the idea that if you have a TensorImage and you add one to it, you want to get back a TensorImage, because it should be an image that's a bit brighter rather than something that loses its type.
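To show the flavor of the idea (a minimal sketch, not fastai's actual implementation, which needed extra hacks to keep the subtype through every operation), a semantic tensor type can be a small subclass of torch.Tensor that carries domain methods like flip_lr:

```python
import torch

class TensorImage(torch.Tensor):
    "A tensor that knows it is an image (illustrative subclass, not fastai's real class)."
    def flip_lr(self):
        # flip along the last (width) dimension; the result keeps the TensorImage type
        return self.flip(-1)

img = torch.rand(3, 64, 64).as_subclass(TensorImage)
flipped = img.flip_lr()
print(type(flipped).__name__)   # TensorImage on recent PyTorch, which preserves subclasses
```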
So we implemented our own, admittedly hacky, partial higher-kinded-type implementation in fastai v2, so that for any of these operations on a tensor of a subtype you will nearly always get back the correctly subtyped tensor.

I saw that PyTorch recently started talking about named indexing extensions for their tensors as well, and they seem to have a similar kind of challenge there: when you start doing arithmetic and other things on a tensor that has named dimensions, you want to propagate those names along.

Yes; we haven't started using that yet because it hasn't quite landed as stable, but we talked to the PyTorch team at the DevCon and we're certainly planning to bring these ideas together; they're all related concerns.

I just mean that I assume that feature has the same challenge. I assume so, yes; it will be interesting to see what they do.

So it's kind of nice: not only can you say show_batch, you can even say show_results, and in this case it knows what the independent variable's type is, it knows what the dependent variable's type is, and it even knows things like, hey, for a classification task those two things should be the same, and if they're not, by default I will highlight them in red. These lower-level foundations are the things that let us easily add this higher-level functionality. This is the kind of ugly stuff we wouldn't have to do in Swift: we had to write our own type dispatch system so that we can annotate things with types and have those type annotations actually be semantic, which gives us the joyfully modern idea of function overloading in Python, and that has made life a lot easier. And we already have that.

Do you have many users that are using this yet?

It's still pre-release, not even alpha, but there is an enthusiastic early-adopter community using it. For example, the user-contributed audio library has already been ported to it; I've also built a medical imaging library on top of it and written a series of five notebooks showing how to do CT scan analysis with it. So it works.

I was curious what your users think of it, because there's this very strongly held conception that Python folks hate types, and you're providing a little bit of typing; I'm curious how they react to that.

The extremely biased subset of early-adopter fastai enthusiasts who are using it love it, and they tend to be people who have gone pretty deep in the past. For example, my friend Andrew Shaw wrote something called MusicAutobot, which is one of the coolest things in the world in case you haven't seen it yet: you can generate music using a neural network, put in some melodies and some chords and it will auto-complete additional melodies and chords, or put in a melody and it will automatically add chords, or add chords and have it create a melody. He had to write his own MIDI library for that, he rewrote it in v2, and he said it's just so, so much easier thanks to those mid-tier APIs. So yeah, at this stage: enthusiasts.
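The type-dispatch machinery mentioned above is fastai-specific, but Python's standard library has a simpler analogue that shows the idea of choosing an implementation based on a semantic type annotation; this sketch uses functools.singledispatch with hypothetical TensorImage and TensorMask subclasses:

```python
from functools import singledispatch
import torch

class TensorImage(torch.Tensor): pass
class TensorMask(torch.Tensor): pass

@singledispatch
def show(x):
    "Fallback for anything we don't have a specialized display for."
    print("tensor", tuple(x.shape))

@show.register
def _(x: TensorImage):
    print("showing an image", tuple(x.shape))

@show.register
def _(x: TensorMask):
    print("showing a segmentation mask", tuple(x.shape))

show(torch.rand(3, 4))                                  # falls back to the plain version
show(torch.rand(3, 64, 64).as_subclass(TensorImage))    # dispatches on the semantic subtype
```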
I was just going to jump in quickly: I've been helping with some of the audio stuff, and it's been really awesome. It makes things a lot more flexible than version 1; that's probably my favorite thing about it, everything can be interchanged, nothing is "well, it's got to be this way because that's how it is". Cool, thanks.

Another piece of the foundation is the partially reversible composed function pipeline dispatched over collections, which really rolls off the tongue; we call them Transform and Pipeline. Basically, the idea is that the way you want function dispatch and function composition to work in deep learning is a little different from other places. There are a couple of things. The first is that you often want to dispatch over tuples: if you have a function called flip_lr and a tuple representing a mini-batch, where your independent variable is a picture and your dependent variable is a set of bounding boxes, then if you say flip_lr on that tuple you would expect both the x and the y to be flipped, and to be flipped with the type-appropriate method. So our transforms automatically send each element of a tuple to the function separately and dispatch according to their types. We've mentioned type retention, the basic type behavior we need. Another interesting thing is that you don't only need encoding, in other words applying the function; you also need to be able to decode, which is to undo the function. For example, a categorization transform might take the word "dog" and convert it to the number one, which is what you need for modeling, but when your predictions come back you need to know what one represents, so you need to reverse that transform and turn one back into "dog". Often those transforms also need data-driven setup: in that example of dog becoming one, something needs to create the vocab, automatically recognizing all the possible classes so it can assign a different index to each one, and then apply that same vocab to the validation set. And quite often these transforms have some kind of state, such as the vocab.

So we built a stack of pieces that build on top of each other. At the lowest level is a class called Transform, which is a callable that also has a decode, does the type-retention higher-kinded-type thing, and dispatches over tuples by default. A Pipeline is something that does function composition over Transforms, and it knows about, for example, setting up transforms; setting up transforms in a pipeline is a bit tricky because you have to make sure that, at each level of the pipeline, only the previous steps have been applied before you set up the next step, and it handles little things like that. Then we have something that applies a Pipeline to a collection to give you an indexable, lazily transformed collection, and you can run those in parallel to get back, for instance, an independent and a dependent variable. And finally we've built a data loader which applies these things in parallel and creates collated batches. In the end, all of this makes a lot of things much easier: for example, the language model data loader in fastai v1 was pages of code, it's pages of code in TensorFlow, and in fastai v2 it's less than a screen of code, by leveraging these powerful abstractions and foundations.
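As a sketch of the encode, decode, and setup idea, here is a toy categorization transform written against fastcore's Transform class as I understand it (setups builds the vocab, encodes maps a label to an index, decodes reverses it; if the import path differs in your version, a plain class with the same three methods illustrates the same idea):

```python
from fastcore.transform import Transform

class ToyCategorize(Transform):
    "Map class names to integers on the way in, and back to names on the way out."
    def setups(self, items):
        # data-driven setup: build the vocab from the training items
        self.vocab = sorted(set(items))
        self.o2i = {v: i for i, v in enumerate(self.vocab)}
    def encodes(self, o: str): return self.o2i[o]
    def decodes(self, i: int): return self.vocab[i]

tfm = ToyCategorize()
tfm.setup(["dog", "cat", "dog", "fish"])
print(tfm("dog"))        # 1  (cat=0, dog=1, fish=2 after sorting)
print(tfm.decode(1))     # 'dog'
```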
So then, finally, and again this is something I think Swift will be great for: we worked really hard to make everything extremely well optimized. For example, for pre-processing in natural language processing we created a parallel generator in Python, to which you can pass a class that defines some setup and a call, and it can automatically parallelize it; so, for example, tokenization is done in parallel in a pretty memory-efficient way.

Perhaps the thing I'm most excited about, both in Python and Swift, is the optimized pipeline running on the GPU. Pretty much all of the transforms we've written can, and by default do, run on the GPU: the flip left-right I showed you earlier will actually run on the GPU, as will warp, zoom, and even things like crop. One of the foundations of this is the affine coordinate transform, which uses affine_grid and grid_sample, very powerful PyTorch functions which would be great things to write with Swift for TensorFlow's new meta-programming, because they don't exist in TensorFlow, or at least not in any very complete way. With these basic ideas we can create an affine coordinate transform that lets us do a very wide range of data augmentations in parallel on the GPU. For those of you who know about the DALI library that NVIDIA created, this provides a lot of the same benefits as DALI, and it's pretty similar in terms of performance, but the nice thing is that all the stuff you write, you write in Python, not in CUDA. With DALI, if they don't have the exact transformation you want, and there's a pretty high chance they won't, then you're stuck; with fastai v2 you can write your own in a few lines of Python and test it out in a Jupyter notebook, which makes life super easy.

So for this kind of stuff, because Swift is a much faster, more hackable language than Python, or at least hackable in the sense of performance (not necessarily as hackable in terms of its type system), I feel like we can build even more powerful foundations and pipelines. A real Swift for TensorFlow computer vision library, leveraging the meta-programming and leveraging Swift Numerics and things like that, I think would be super cool.
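As a minimal sketch of the affine-grid approach mentioned above (illustrative only; fastai's actual GPU transforms handle padding modes, composition of several affine matrices, and more), a batch of images can be rotated on whatever device it lives on using PyTorch's affine_grid and grid_sample:

```python
import math
import torch
import torch.nn.functional as F

def rotate_batch(x, degrees):
    "Rotate a batch of images (N, C, H, W) by one angle, on whatever device x lives on."
    theta = math.radians(degrees)
    # 2x3 affine matrix for a rotation, repeated across the batch
    mat = torch.tensor([[math.cos(theta), -math.sin(theta), 0.0],
                        [math.sin(theta),  math.cos(theta), 0.0]],
                       device=x.device, dtype=x.dtype)
    mat = mat.unsqueeze(0).repeat(x.size(0), 1, 1)
    grid = F.affine_grid(mat, list(x.shape), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)

imgs = torch.rand(8, 3, 224, 224)        # add .cuda() to run the whole thing on the GPU
rotated = rotate_batch(imgs, 15)
```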
Info
Channel: Jeremy Howard
Views: 10,362
Rating: 4.9433427 out of 5
Keywords: deep learning, fastai
Id: bHVqO5YyNbU
Length: 33min 23sec (2003 seconds)
Published: Fri Aug 21 2020