The story of Fast.ai & why Python is not the future of ML with Jeremy Howard

Captions
you're listening to gradient descent a show where we learn about making machine learning models work in the real world i'm your host lucas biewald jeremy howard created the fastai course which is maybe the most popular course to learn machine learning and there are a lot out there he's also the author of the book deep learning for coders with fastai and pytorch and in that process he made the fastai library which lots of people use independently to write deep learning code before that he was the ceo and co-founder of enlitic an exciting startup that applies deep learning to healthcare applications and before that he was the president of kaggle one of the most exciting earliest machine learning companies i'm super excited to talk to him so jeremy it's nice to talk to you and in preparing the questions i kind of realized that um every time i've talked to you there's been kind of a few gems that i've remembered that i would never think to ask about like one time you told me about how you learned chinese and another time you gave me um dad parenting advice like very specific advice that's been actually super um helpful so it was kind of funny hey tell me what what dad parenting advice worked out well what you told me was um when you change diapers use a blow dryer to change a um a really frustrating experience into like a really joyful experience and it's like such good advice i don't know how you i guess i can imagine how you thought of it but it's yeah yeah no they love the whooshing sound they love the warmth i'm kind of obsessed about dad things so i'm always happy to talk about dad things on this podcast can we start with that now now that my daughter's eight months old do you have any any suggestions for this oh my goodness eight months old you know it's like the same with any kind of learning it's all about consistency so i think that the main thing we did right with claire was just you know this delightful child now is we were just super consistent
like if we said like you can't have x unless you do y we would never do x you know give her x if you didn't do y and if we're like if you want to take your scooter down to the bottom of the road you have to carry it back up again we read this great book that was saying like if you're not consistent it becomes like this thing like it's like a gambler it's like sometimes you get the thing you want so you just have to keep trying so that's my number one piece of advice it's the same with like teaching machine learning we always tell people that tenacity is the most important thing for a student it's like to stick stick with it do it every day i guess just in the spirit of questions i'm genuinely um curious about you know you've built this um you know kind of amazing framework and and sort of teaching thing that i think is maybe the most popular and most appreciated framework i was wondering if you could you could start by telling me the story of what inspired you to do that and what was the the kind of journey to making you know fast ai the curriculum and fast ai the yeah ml framework so um it was something that my wife rachel and i started together um and um so rachel has a math phd super technical background early data scientist and engineer at uber i you know i just scraped by with a philosophy undergrad and have no technical background but you know from both of our different directions we both had this frustration that like neural networks in 2012 super important clearly gonna change the world but super inaccessible and you know so we would go to meetups and try to figure out like how do we like i knew the basic idea i'd coded neural networks 20 years ago but like how do you make them really good there wasn't any kind of open source software at the time for running on gpus you know dan ciresan and jurgen schmidhuber's thing was available but you had to pay for it there was no source code and we just thought oh we've got to change this because the
history of technology leaps has been that it generally increases inequality because the people with resources can access the new technology and then that leads to kind of societal upheaval and a lot of unhappiness so we thought well we should just do what we can so we thought how how are we going to fix this and so basically the goal was and still is be able to use deep learning without requiring any code so that you know because the vast majority of the world can't code um we kind of thought well to get there we should first of all see like well what exists right now learn how to use it as best as we can ourselves teach people how to best use it as we can and then make it better which requires doing research and then turning that into software and then changing the course to teach the hopefully slightly easier version and repeat that again and again for a few years um and so that's we're kind of in that process that's so interesting do you worry that um the stuff you're teaching you're sort of trying to make it obsolete right because you're trying to build higher level abstractions like i think one of the things that people really appreciate about your course is the sort of really clear in-depth explanations of how these things work do you think that that's eventually going to be not necessary or how do you think about that yeah um to some extent i mean so if you look at the the the new book and the new course um the the chapter one starts with like really really foundational stuff around like what is a machine learning algorithm what do we mean by learning an algorithm what's the difference between traditional programming and machine learning to solve the same problem and those kinds of basic basic foundations i think will always be useful even at the point you're not using any code i feel like even right now if somebody's using like platform.ai or some kind of code free framework you still need to understand these basics of like okay an algorithm can only learn
based on the data you provide you know it's generally not going to be able to extrapolate to patterns it's not seen yet stuff like that but yeah i mean um we have so far released two new courses every year you know a part one and a part two every year because every year it's totally out of date and we always say to our students at the start of part one look you know none of the details you're learning are going to be of any use in a year or two's time there's a good you know when we're doing theano and then tensorflow and keras you know and then plain pytorch we always say look don't worry too much about the software we're using because none of it's still any good you know it's all changing rapidly you know faster than javascript frameworks but the concepts are important and yeah you can pick up a new library in i don't know a week i guess do you um it seems like you've uh you've thought pretty deeply about um learning both you know human learning and and machine learning had you had um had you or rachel had practice teaching before was this kind of your first teaching experience um you know i've actually had a lot of practice teaching of this kind but in this really informal way partly it's because i don't have a technical educational background myself so i found it very easy to empathize with people who don't know what's going on because i don't know what's going on and so way back when i was doing management consulting you know 25 years ago i was always using data driven approaches rather than expertise and interview driven approaches to solve problems because i didn't have any expertise and i couldn't really interview people because nobody took me seriously because i was too young so and so then i would like have to explain to my client and to the engagement manager like well i solved this problem using this thing called linear programming or multiple regression or a database or whatever and yeah what i found was i very i wouldn't say very quickly but
within a couple of years in consulting i started finding myself like running training programs for what we would today call data science but 20 something years before we were using that word yeah basically teaching our client and uh you know so when i was at a.t. kearney i ran a course for the whole company basically that every uh associate mba had to do in what we would today call data science you know a bit of sql a bit of regression a bit of spreadsheets bit of monte carlo so yeah i've actually done quite a lot of that now you mention it and uh certainly rachel also um uh but uh for her it was um pure math you know so she she ran some courses at duke university and stuff for post grads so yeah i guess we both had some some practice and we're pretty passionate about it so we also um study the literature of how to teach a lot which most teachers weirdly enough don't so so that's good do you have do you feel like um there are things that you feel like uniquely proud of in in your teaching or like things that you're doing particularly well compared to um you know other classes that people might take yeah i mean that i wouldn't say unique because there's always other people doing good stuff you know i think we're notable for two things in particular one is um code first and the other is top down so you know i make a very conscious decision in kind of everything i do to focus on myself as the audience i'm not a good mathematician you know i'm like i'm i'm i'm capable nowadays but it's not something that's really in my in my background and doesn't come naturally to me for me the best explanation of a technical thing is like an example in some code that i can run debug look at the intermediate inputs and outputs so so i make a conscious decision in my teaching to to teach to people who are like me and although most people at kind of graduate level in technical degrees are not like me they've all done a lot of math most people that are interested in this material are like
me they're people who don't have graduate degrees and they're really underrepresented in the teaching group because like nearly all teachers are academics and so they can't empathize with people who don't love greek letters you know and integrals and stuff so yeah so so it's so i always explain things by showing code examples so and then the other is top down which is again the vast majority of humans not necessarily the vast majority of people who have spent a long time in technical degrees and made it all the way to being professors but most regular people learn much better when they have context why are you learning this what's an example of it being applied you know what are some of the pros and cons of using this approach before you start talking about you know the details of how it's put together so and we this is really hard to do but we try to make it so that every time we introduce a topic it's because we kind of need to show it in order to explain something else or in order to improve something else and this is so hard because obviously everything i'm teaching is stuff that i know really well and so it's really easy for me to just say like okay you start here and you build on this and you build on this and you build on this and here you are and that's that's just the natural way to try to teach something but it's not the natural way to to learn it so i i don't think people realize how difficult top-down teaching is but um people seem to really appreciate it yeah they do seem to really appreciate it do you think um i mean i'd love to talk to rachel about this directly but do you think rachel has the same approach as you because it sounds like she has a pretty different background yeah she does have a different background um but she certainly has the same approach because we've talked about it and she we both kind of kind of jump on each other to say like hey you know because we kind of do a lot of development together or we did before she got onto the data
ethics stuff more um and sometimes you know i'll say to her like hey that seems pretty bottom-up don't you think and she'll be like oh yeah it is damn it start again you know so we both know it's important and we both try really hard to do it but we don't always succeed and can you tell me about the um the library that you built like how that came about do you think it was necessary to do it to teach the way you wanted to well remember the purpose of this is not teaching so we want there to be no teaching or at most minimal teaching the goal is that there should be no code and it should be something you can pick up in half an hour and get going so the fact that we have to teach what ends up being about 140 hours of work is a failure you know we're still failing um and so the only way to to fix that is to create software which makes everything dramatically easier um so really the software is that's actually the thing that's actually our our goal um but we we can't get there until you know we first of all teach people to use what already exists and to do the research to figure out like well why is it still hard why is it still too slow why does it still take too much compute why does it still take too much data like what are all the things that limit accessibility do the research to try and improve each of those things a little bit okay how can we kind of embed that into software yeah the software is kind of the end result of this i mean it's still a loop but eventually hopefully it'll all be in the software and i guess we've gotten to a point now where we feel like we understood some of the key missing things in deep learning libraries at least we're still a long way away from being no code but we at least saw things like oh you know basic object-oriented design is basically largely impossible because tensors don't have any kind of semantic types so let's add that and see where it takes us and you know kind of stuff like that we
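the "semantic types" idea jeremy mentions here can be sketched in a few lines of plain python: wrap the raw numbers in a subclass that remembers what they mean, so downstream code knows how to display or transform them. the class names below are hypothetical stand-ins, not fastai's actual types, and a plain list stands in for a real tensor:

```python
# Sketch of semantic tensor types: the same raw data behaves
# differently downstream depending on what it means.
# TensorBase is a stand-in for a real tensor; names are hypothetical.

class TensorBase(list):
    """Stand-in for a raw tensor (here just a list of numbers)."""

class TensorImage(TensorBase):
    def show(self):
        return f"render {len(self)} values as an image"

class TensorMask(TensorBase):
    def show(self):
        return f"render {len(self)} values as a segmentation mask"

# identical numbers, different semantics:
img = TensorImage([0.1, 0.9, 0.4])
mask = TensorMask([0, 1, 0])
```

the point of carrying the subclass through every pipeline stage is exactly what he describes: at the end you can still call show and get the right behaviour without telling the computer what the tensor represents.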
really tried to get back to the foundations were there any other ones that was a that was a good a good one any any others that come to mind yeah i mean um you know i mean dispatch is a key one so the fact that um kind of julia style dispatch is not built into python um so function dispatch on typed arguments we kind of felt like we had to fix that because really for data science the kind of data you have impacts what has to happen and so if you say rotate then depending on whether it's a a 3d ct scan or an image or a point cloud or a set of key points for a human pose rotate semantically means the same thing but requires different implementations um so yeah we built this kind of julia inspired type dispatch system also like realizing that to go with again it's really all about types i guess when you have semantic types they need to go all the way in and out by which i mean you put an image in it's a pillow you know image object it needs to come all the way out the other side as you know an image tensor go into your model the model then needs to produce an image you know uh an image tensor or a category you know type or whatever and then that needs to come out all the way the other side to be able to be displayed on your screen correctly so we had to make sure that the entire transformation pipeline was reversible so we had to set up a new system of um reversible composable transforms um so this stuff is all like we as much as possible we try to hide it behind the scenes but without these things our eventual goal of no code would be impossible because um you know you would have to tell the computer like oh this tensor that's come out actually represents you know three bounding boxes along with associated um categories you know and describe how to display it and stuff so it's all pretty foundational to both making the process of coding easy and then down the track over the next couple of years you know removing the need for the code entirely and what did you
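the "rotate means the same thing but needs different implementations" point can be illustrated with a tiny dispatch registry. this is a toy sketch of the idea, not fastai's actual dispatch system, and it only matches exact types (no inheritance); all the names are made up for illustration:

```python
# A toy "julia-style" type dispatch: one function name, the
# implementation chosen by the argument's type. Hypothetical names,
# not the real fastai dispatch system; exact-type matching only.

class TypeDispatch:
    def __init__(self):
        self._registry = {}

    def register(self, typ):
        def deco(fn):
            self._registry[typ] = fn
            return fn
        return deco

    def __call__(self, x, *args, **kwargs):
        fn = self._registry.get(type(x))
        if fn is None:
            raise TypeError(f"no implementation for {type(x).__name__}")
        return fn(x, *args, **kwargs)

class Image: pass
class PointCloud: pass

rotate = TypeDispatch()

@rotate.register(Image)
def _(x, deg):
    return f"rotate pixels by {deg} degrees"

@rotate.register(PointCloud)
def _(x, deg):
    return f"rotate 3d points by {deg} degrees"
```

calling `rotate(Image(), 30)` and `rotate(PointCloud(), 30)` then runs two different implementations behind one semantic name, which is the behaviour described above.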
like what was the big goal behind releasing a v2 of the library that was kind of a bold choice right to to just make a complete rewrite yeah i'm um you know i'm a big fan of the second system you know the kind of the opposite of of joel spolsky you know i i i love rewriting i'm more i mean i'm no arthur whitney but you know arthur whitney who created k and kdb um uh every version he rewrites the entire thing from scratch um and he's done many versions now um that's that's i really like that as a general approach which is like if i haven't learned so much that my previous version seems like ridiculously naive and and pathetic then i'm i'm not moving forwards you know so i do find every year i look back at any code i've got and think like oh that could be so much better and then you rewrite it from scratch i did the same thing with the book you know i rewrote every chapter from scratch a second time so it's partly that and it's partly also just that it took a few years to get to a point where i felt like i i actually had some solid understanding of what was needed you know the kind of things i just described um and some of a lot of it came from like a lot of conversations with um chris lattner the the inventor of swift and llvm um so when we taught together um it was great sitting with him and we're talking about like porting fastai to swift and like the type system in swift and then working with um alexis gallagher who's like maybe the world's foremost expert on the on swift's value type system and he helped us build a new data block api for swift and so kind of through that process as well it made me realize like yeah you know this is um this is actually a real lasting idea and actually i should mention it it goes back to the the very idea of the data block api which actually goes back to fastai version one which is um this idea that and again it's kind of based on really thinking carefully about the foundations which is like rather than have a a library which every
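the data block idea he's describing, a dataset defined by a few independent composable steps rather than one monolithic class per input/output combination, might look roughly like this. the names and exact steps are illustrative, not the real fastai DataBlock api:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative data-block sketch: describe a dataset as independent
# answers to "what are the items", "how do we split them", and
# "how do we label them". Not fastai's actual DataBlock class.

@dataclass
class DataBlock:
    get_items: Callable  # find the raw input items
    splitter: Callable   # split indices into train / validation
    get_y: Callable      # derive a label from one item

    def datasets(self, source):
        items = self.get_items(source)
        train_idx, valid_idx = self.splitter(items)
        make = lambda idxs: [(items[i], self.get_y(items[i])) for i in idxs]
        return make(train_idx), make(valid_idx)

# e.g. label files by extension, holding out every third item:
block = DataBlock(
    get_items=lambda src: sorted(src),
    splitter=lambda items: (
        [i for i in range(len(items)) if i % 3 != 0],
        [i for i in range(len(items)) if i % 3 == 0],
    ),
    get_y=lambda item: item.rsplit(".", 1)[-1],
)
train, valid = block.datasets(["a.png", "b.jpg", "c.png"])
```

because each step is just a function, swapping how you label or split never forces a new class for every combination of input and output types, which is the design point being made here.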
possible combination of inputs and outputs ends up being this totally different class you know with a different api and different ideas let's have some types that represent that could be either an input or an output and then let's figure out the actual steps you need it's like okay you've you know how do you figure out what the input items are how do you figure out what the output items are how do you figure out how to split out the validation set how do you figure out how to get the labels um so again these things are just like yeah we you know came to them by stepping back and saying what is actually foundationally what's going on here and let's do it properly you know so fastai 2 is really our first time where we just stepped back and you know literally i said um you know so sylvain and i worked on it and i said to sylvain like we're not gonna push out any piece of this until it's the absolute best we can make it you know right now um which i know sylvain kind of felt i was a bit crazy sometimes like the the transforms api i think i went through like 27 rewrites um but you know i kept thinking like no this is not good enough no this is not good enough you know um until eventually it's like okay this is this is actually good now so is the hardest part the um the external apis then because that does seem like it'd be really tricky to to make that i mean that seems like an endless task to make these apis like clear enough and organized well they're never um i never think of them as external apis to me they're always internal apis they're what i mean because you want to make a bigger system yeah what am i building the rest of the software with exactly and you know we went all the way back to like thinking like well how do we even write software you know i'm a huge fan i've always been a huge fan of the idea of literate programming but never found anything that made it work and you know we've been big proponents of jupyter
notebook forever um and it was always upsetting to me that i had this like jupyter world that i loved being in and this like ide world which i didn't have the same ability to explore in a documented reproducible way and incorporate that exploration and explanation into the code as i wrote so yeah we went all the way back and said like oh i wonder if there's a way to actually use jupyter notebooks to create an integrated system of documentation and code and tests and exploration um and it turns out the answer was yes so yeah it's really like just going going right back at every point that i kind of felt like i'm less than entirely happy with the way i'm doing something right now it's like to say okay can we fix that can we make it better and python really helped there right because python is so hackable you know the the whole the fact that you can actually go into the meta object system and change how type dispatch works and change how inheritance works so like our type dispatch system has its own inheritance implementation built into it it's yeah it's amazing you can do that wow why um because um the type dispatch system needs to understand inheritance when it comes to how do i decide if you call a function on a and b that you know on types a and b and there's something registered for that function which has some superclass of a and some higher superclass of b and something else with a slightly different combination how do you decide which one matches you know um so in the first version of it i i ignored inheritance entirely and it would only dispatch if you had the types exactly matched or one of the types was none but then later on i added yeah i added inheritance so now you can you've got um this nice combination of multiple dispatch and inheritance which is really convenient isn't um can you give me some examples of how the inheritance works with your types because i would think it could get kind of tricky like what's even inheriting from what and the types
that just quickly come to mind um for me like if you have an image with multiple bounding boxes would that inherit from like just a raw image yeah so generally those kind of things will compose you know so um we uh i don't think we ever use multiple inheritance um i try to stay away from it because i've always found it a bit hairy so instead things tend to be a lot more functional so you know a black and white image inherits from image and i think a dicom image which is a medical image also inherits from image and then there are transforms with the type signatures which will take an image and then there will be others which will take a dicom image and so if you call something with a dicom image for which there isn't a registered function that takes a dicom image but there is one that takes an image it'll call the image one um and so and then we kind of use that elsewhere in ways where you know we use a lot of duck typing so there'll be like a you know call dot method and dot method can be implemented differently in the various image subclasses um and the other thing you can do with our type dispatch system is you can use a tuple of types which means that that function argument can be any of those types so you can kind of create union types on the fly which is pretty convenient too are there parts of the v2 that you're still not happy with or were you really able to realize that vision of there are still some parts yeah there um partly that happened kind of because of covid um and um you know i unfortunately found myself the kind of face of the global masks movement um which didn't leave much room for more interesting things like deep learning um so some of the things that we kind of added in towards the end like um some of the stuff around inference is still a little possibly a little clunky um but you know it's only it's only some little pieces like i mean on the whole inference is pretty good but for example i didn't
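the two behaviours just described, a dicom image falling back to the plain-image implementation when nothing more specific is registered, and registering one function for a tuple of types as an ad-hoc union, can be sketched like this. again these are hypothetical names and a simplified registry, not fastai's real system:

```python
# Dispatch that walks the class hierarchy, plus tuple-of-types
# registration acting as an ad-hoc union. Illustrative only.

class Image: pass
class DicomImage(Image): pass   # medical image, inherits from Image
class PointCloud: pass
class KeyPoints: pass

_registry = {}

def register(*types):
    """Register one implementation for several types at once."""
    def deco(fn):
        for t in types:
            _registry[t] = fn
        return fn
    return deco

def dispatch(x, *args):
    # walk the MRO so a DicomImage falls back to the Image version
    # when no DicomImage-specific function is registered
    for t in type(x).__mro__:
        if t in _registry:
            return _registry[t](x, *args)
    raise TypeError(f"no implementation for {type(x).__name__}")

@register(Image)
def _(x, deg):
    return f"{type(x).__name__} rotated {deg} degrees"

@register(PointCloud, KeyPoints)   # a tuple of types acts like a union
def _(x, deg):
    return f"points rotated {deg} degrees"
```

so `dispatch(DicomImage(), 30)` finds nothing registered for dicom images and falls back along the hierarchy to the image implementation, while point clouds and key points share one function via the union-style registration.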
really look at it at all at um you know how things would work with onnx for example so kind of mobile or highly scalable serving also the the training loop needs to be a little bit more flexible to handle things like um the hugging face transformers api makes different assumptions that don't quite fit our assumptions um tpu training because of the way it like runs on this separate machine that you don't have access to you kind of have to find ways to do things that um can accept really high latency and so like for tpu we kind of um it's particularly important because uh we've built a whole new computer vision library that runs on the gpu or runs in pytorch you know which generally is targeting the gpu and uh pytorch has a pretty good gpu launch latency along with a good nvidia driver so we can do a lot of like stuff on the gpu around transformations and stuff um that all breaks down with tpu um because like every time you do another thing on the tpu you have to go through that whole nasty latency so yeah there's a few little things like that that need to be improved is it important to you that your library is used um widely outside of a learning context like is it is one of your goals to make it kind of widespread in production systems yeah yeah yeah i mean because the the learning context hopefully goes away eventually hopefully there will be no fast ai course and it'll just be software so if people are only using our software in a learning context it won't be used at all um yeah we want it used everywhere or something like it i mean i don't care whether it's fast ai or if somebody else comes along and creates something better we just want to make sure that deep learning is is accessible that's super important and the funny thing is um because deep learning is so new and it kind of appeared so quickly a lot of the decision makers even commercially are people that are highly academic um and the whole kind of academic ecosystem is really important much more
so than any other field i've ever been in um so one of the things we need to do is make sure that researchers are using fast ai so we try you know and we're researchers too so we try to make it very researcher friendly and that's one of the key focuses really at the moment does that sorry i mean i would think just naively like making something research friendly would involve kind of the opposite of of making it like a single clean api like it or like uh you know abstracting away all the details like i would think researchers would want to really tinker with the the low-level assumptions yeah well that's why um that's why you need a layered api because the first thing to realize is it's getting to the point now or maybe it's at the point now where most researchers doing research with deep learning are not deep learning researchers you know they're um proteomics researchers or genomics researchers or animal husbandry researchers or whatever you know or astrophysicists you have not heard that a couple of years ago i was the keynote speaker at the major international animal husbandry congress i got a nice trip to auckland with the family it was very pleasant in fact um hadley wickham's father organized it and he invited me yeah well i'm sorry i cut you off you're making an interesting point that i interrupted for no reason i didn't know that you were so ignorant about animal husbandry lucas i'm disgusted dude i love i love all the unusual use cases of deep learning it's definitely something i collect but that's i have not heard that one um yeah so um sorry where were we we were talking about um oh yeah researchers so you're doing research into a thing right so like i don't know maybe it's like you're trying to find a better way to do um gradient accumulation for fp16 training or maybe you're trying a new activation function or maybe you're trying to find out whether um you know this different way of handling four channel input works well for you know
hyperspectral satellite imagery or whatever and so you you know the idea is to let you focus on that thing and not all the other things but then you want all the other things to be done as well as possible because if you do a shitty job of all the other things then you might say like oh my activation function's actually really good but then somebody else might notice it like oh no it was just throwing like a it was just doing a kind of a crappy version of data augmentation effectively because if we add dropout then your thing doesn't help anymore um so with a layered api you can use the high level easiest bits with like all the defaults that work nicely together and then you just pick the bit that you want and delve in as deep as you like so there's kind of really four layers uh key layers in our api so maybe you'll go in and create a new data block or maybe you'll go and create a new transform or maybe you'll go and create a new callback so like the thing about fastai is it's actually um far more hackable than um say keras right to take what i'm very familiar with so like with keras you kind of have this um pretty well-defined transformation pipeline or tf.data if you're using that pretty well-defined set of atomic units you can use and if you want to customize them you're kind of out of luck you know often it requires going and creating a new tf op in c++ or something so it really helps using pytorch they kind of provide these really nice low latency primitives and then we build everything out of those low latency primitives and we kind of gradually layer the apis on top of each other and we make sure that they're very well documented all the way down so you don't kind of get to a point where it's like oh you're now you're now in the internal api good luck it's like no it's all external api and it's all documented and it all has tests and it all has examples and it all has explanations so you can put put your research in at the point that
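the callback layer mentioned above, the place most research code plugs in, can be sketched as a generic loop with named hook points. these are hypothetical classes for illustration, not fastai's real Learner and Callback, and the "loss" is a trivial stand-in for a real training step:

```python
# Minimal callback-style training loop: the loop stays generic and
# custom behaviour hooks in at named points. Illustrative names only.

class Callback:
    def before_batch(self, learner): pass
    def after_batch(self, learner): pass

class Learner:
    def __init__(self, batches, callbacks=()):
        self.batches = batches
        self.callbacks = list(callbacks)
        self.log = []

    def fit(self):
        for batch in self.batches:
            for cb in self.callbacks:
                cb.before_batch(self)
            self.loss = sum(batch) / len(batch)  # stand-in for a real step
            for cb in self.callbacks:
                cb.after_batch(self)

class LossLogger(Callback):
    """Research code slots in without touching the loop itself."""
    def after_batch(self, learner):
        learner.log.append(learner.loss)

learner = Learner([[1, 2, 3], [4, 5, 6]], callbacks=[LossLogger()])
learner.fit()
```

the design point is the one made in the interview: the researcher changes only the layer they care about (here, a callback) while everything else keeps its sensible defaults.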
you need it i see but i guess when you talk about academics then or researchers sorry not academics you're you're imagining like actual machine learning researchers researching on machine learning itself versus like an animal husbandry researcher who needs an application of machine learning i guess you're speaking to both yeah yeah both and so i mean it's much easier for me to understand the needs of ml researchers because that's what i do and that's who i generally hang out with um but there's a lot of overlap like i found back in the days when we had conferences that you could go to um you know as i walked around neurips a lot of people would come up to me and say like oh i just gave this talk i just gave this poster presentation and three years ago i was a fast ai student before that i was a meteorologist or an astrophysicist or neuroscientist or whatever and you know i used your course to understand the subject and then i used your software and then i brought in these ideas from astrophysics or neuroscience or whatever and now here i am presenting them at neurips and so there's kind of like this yeah really interesting overlap now between the worlds of ml research and domain expertise in that increasingly domain experts are becoming you know pretty well regarded and well-respected ml researchers as well because you kind of have to be you know like if you want to do a real kick-ass job of medical imaging for instance there's still a lot of foundational questions you have to answer about like how do you actually deal with large 3d volumes you know it's still these things are not solved and so you do have to become a really good deep learning researcher as well you know i think one of the things that that i always worry about for myself is kind of um you know getting out of date like i remember being in my early 20s and looking at some of the you know the tenured professors that were my age now and thinking boy you know they've just not stayed current in the
state of machine learning. And then I started a company and I realized that I actually wasn't staying up to date myself, and was often stuck in older techniques that I was more comfortable with, or languages I was more comfortable with. I feel like one of the things that you do just phenomenally well, at least from the outside, is staying really current and on top of stuff, and I wonder if you have any thoughts on how you do that. Well, I've got to say I really admired what you did in moving away from your world of crowdsourcing into deep learning, and I think you took a year or so just to figure it out, right? Not many people do that, and I think a lot of people assume they can't, because if you get to, I don't know, your mid-30s or whatever, and you haven't learned a significant new domain for the last decade, you could easily believe that you're not capable of doing so. So I think you kind of have to do what you did, which is just to decide to do it. For me, I took a rather extreme decision when I was 18, which was to make sure I spent half of every day learning or practicing something new for the rest of my life, and I've stuck to that, certainly on average. Nowadays it's more like 80. It's weird, my brain still tells me I won't be able to understand this new thing, because I start reading something and I don't understand it straight away, and my brain's like, "okay, this is too hard for you," so you kind of have to push through that. But I had this realization as a teenager that learning new skills is this high-leverage activity, and so I hypothesized that if you keep doing it for your whole life (and I noticed nobody I knew did), wouldn't you get these kind of exponential returns? And so I thought I
should try to do that. So that's kind of my approach. And you reasoned your way into that choice, that's amazing. Do you have to fight your immediate instincts to do that, or is it kind of a pleasure? My instincts are fine now. What I did have to do, for nearly all my working life (not anymore, now that I work with my wife, and with Sylvain, who's super understanding), was fight, or at least deal with, the people around me. Because, particularly when you're the boss, it's like, "okay, we urgently need to do X," and somebody can clearly see: "why are you using Julia, for the first time, to do X? We don't even know Julia. You could have had it done already if you'd just used Perl or Python or something you already knew." And I was like, "well, I just wanted to learn Julia." So it drives the people I'm working with crazy, because everybody's busy, and it's hard, in the moment, to appreciate that this moment isn't actually more important than every other moment for the rest of your life, and so if you don't spend time now getting better at your skills, then for the rest of your life you're going to be a little bit slower and a little bit less capable and a little bit less knowledgeable. So that's the hard bit. It also sounds to me, just from the examples that you've given, that you have a real bias toward learning by doing. Is that right? Do you also read papers and synthesize them in a different way? Yeah, but if I read a paper, I only read it until I get to the point where I decide it's something I want to implement or not, or that there's some idea I want to take away from it to implement. So I like doing things. I'm a very intuitive person, so by doing things and experimenting a lot, I get a sense of how things kind
of fit together. I really like the way Richard Feynman talked about his research. His understanding of papers was that he always thinks about a physical analogy every time he reads a paper, and he doesn't go any further on a paper until he has a physical analogy in mind, and then he always found that he could spot the errors in papers straight away by recognizing where the physical analogy would break down. I'm a bit like that: I'm always looking for the context and the understanding of what it's for, and then I try to implement it. I see. So should we expect the next version of fastai to be in a new language? Have you thought about moving away from Python? Oh, obviously I have, because I looked at Swift, you know. And sadly, Chris Lattner left Google, so I don't know. They've got some good folks still there, maybe they'll make something great of it, but I tend to follow people, people who have been successful many times, and Chris was one of those people. So what's next? I don't know. Certainly Python is not the future of machine learning; it can't be. It's so nicely hackable, but it's so frustrating to work with a language where you can't do anything fast enough unless you call out to some external C code, and you can't run anything in parallel unless you put it on a whole other process. I find, working with Python, there's just so much overhead in my brain to try to get it to work fast enough. It's obviously fine for a lot of things, but not really in the deep learning world, or not really in the machine learning world. So I really hope that Julia is really successful, because there's a language with a nicely designed type system and a nicely designed dispatch system, and most importantly, it's Julia all the way down: you can get in and write your GPU kernel in Julia, and all
the basic stuff is implemented in Julia, all the way down until you hit the LLVM. Sorry, this is an embarrassing question: Julia's kind of like MATLAB, is that what I should be thinking? It was designed to be something that MATLAB people could use, but no, it's more like, I don't know, Common Lisp meets MATLAB meets Python. It sounds a little bit like R, maybe? R has some nice ideas, but the R object systems, I mean, (a) there are too many of them, (b) they're all such a hack, and (c) because it's so dynamic, it's very slow. So again, you have to implement everything in something that's not R, and R just becomes a glue language on top of it. I spent so many years writing R, and it's certainly better than what came before, but I never enjoyed it. Julia is a compiled language, it's got a rich type system, and it's entirely based on function dispatch using the type system. It's got a very strong metaprogramming approach, which is why you can write your CUDA kernel in Julia, for example. It's got autograd, again written in Julia. So it's got a lot of nice features, but unfortunately it hasn't really got the corporate buy-in yet, so it's highly reliant on this core group of super smart people that started it and now run Julia Computing, which doesn't seem to have a business model as far as I can tell, other than to keep getting funding from VCs, which works for a while, but at some point it stops. So what is the fast.ai business model? Is there a business model? The fast.ai business model is that I take money out of my bank account to pay for things I need, and that's about it. Awesome. Well, you know, we always end with two questions, I want to make sure we have time for those, to have a little bit of consistency here. The first one is: when you look at the different topics in machine learning, broadly
defined, is there a topic that you think people should pay a lot more attention to than they generally are? Yes, and I think it's the world of deep learning outside of the area that you're familiar with. For example, when I got started in NLP, I was shocked to discover that nobody I spoke to in the world of NLP had any familiarity with the last three or four years of development in computer vision, the idea of transfer learning, for example, and how incredibly flexible it was. So that's what led to ULMFiT, which in turn led to GPT, which in turn led to GPT-2. Before ULMFiT happened, I asked every NLP researcher I spoke to, "what do you think about this idea of super-massive transfer learning from language models?", and everybody I spoke to in NLP said, "that's a stupid idea," and everybody I spoke to in computer vision said, "yes, of course, I'm sure everybody does that already." So I think in general people are way too specialized in deep learning, and there are a lot of good ideas in other parts of it. Interesting. Cool. And then our final question, which we always ask, and I wonder, you'll have an interesting perspective on this. Typically we're talking to people who are trying to use a machine learning model for some purpose, like animal husbandry, but you've seen this wide range of applications, and when you look across the things that you've seen go from ideation to a deployed thing that's working and useful, where do you see the biggest bottleneck? I mean, the projects I've been involved in throughout my life around machine learning have always been successfully deployed, so I get frustrated with all these people who tell me that machine learning is just this abstract thing that no one's actually using. I think a big part of the problem is that there are people who understand business and logistics and process management, there's
people who understand AI and algorithms and data, and there's not much connectivity between the two. I spent 10 years working as a management consultant, so all my life was logistics and business processes and HR and all that stuff. It's kind of hard to picture you as a management consultant, I think you must have been a surprising consultant. I tried to fake it as best as I could. For sure, I've noticed a lot of people in the machine learning world really under-appreciate the complexity of dealing with constraints, and finding opportunities, and disaggregating value chains; or they'll do the opposite, they'll just assume it's so hard that it's impossible, without realizing there are large groups of people around the world who spend their lives studying these questions and finding solutions to them. So in general I'd love to see better cross-disciplinary teams, with more people on the MBA side developing AI skills, and more people on the AI side developing an understanding of business and teams and all that. I guess you have this broad view from your background, and you've watched these ML projects get deployed and become useful, so maybe the question is more like: were there points that surprised you with their level of difficulty? Did you have mishaps where you thought the model was working, and then when it was deployed into production it didn't work as well as you were hoping, or thought it would? No, not at all. And I know that sounds weird, but it's just, you know, even a small amount of background in doing the actual work that the thing you're building is meant to be integrating with helps. I spent eight years working on an insurance pricing business entirely based on operations research and machine learning, but before that, the
last four or five years of my management consulting career were nearly entirely in insurance. So, you know, there's not much that's very surprising that happens: I know the people, I know the processes. And that's why I think, if somebody's going to do a paralegal AI business, I'd much rather see a paralegal do it than an AI person, or if they're going to do an HR recruiting AI business, I'd much rather see someone with an HR recruiting background do it. It's super difficult; there's just no way to understand an industry really well without doing that industry for a few years. And, because I know some of these people and I get this question all the time, I'll channel a question that I'm sure is in people's heads watching this: if you are that paralegal who's starting an AI-enabled paralegal business, how would you do the AI part? Well, obviously I would take the fast.ai courses. I mean, seriously, I would make sure I was good at coding, I'd spend a year working on coding, and yeah, the fast.ai courses are absolutely designed for you. And I would be careful of bringing on a so-called AI expert until you've had a go at doing it all yourself, because I've found most people in that situation, for obvious reasons, feel pretty intimidated by the AI world, a bit humbled by it, a little bit overwhelmed by it, and they'll bring on a self-described expert, but they have no ability to judge the expertise of that person, so they end up bringing on somebody who's just good at projecting confidence, which is probably negatively correlated with actual effectiveness. So do it yourself for a year, build the best stuff you can. I do find a lot of fast.ai alumni with backgrounds as domain experts are shocked when they then get involved in the world of AI experts and find they're much better at
training models that actually predict things correctly than the modeling experts are. I'm sure you've had that experience, as somebody who, like me, doesn't have a technical background in this area. Yeah. Well, thank you so much, this was super fun and educational for me. Thank you very much for having me. My pleasure.
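Jeremy's point in the interview about Python's parallelism limits, that CPU-bound work can't run in parallel across threads in CPython because of the GIL, so you have to push it onto "a whole other process", can be sketched with the standard library. This is just an illustrative sketch; the function names below are made up for the example, not from fastai or any library discussed in the episode:

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python CPU-bound work like this loop can't run in parallel
    # across threads in CPython because of the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_parallel(inputs, workers=4):
    # The usual escape hatch: one whole OS process per worker, which is
    # exactly the overhead Jeremy is complaining about.
    with Pool(processes=workers) as pool:
        return pool.map(cpu_bound, inputs)

if __name__ == "__main__":
    print(run_in_parallel([10, 100, 1000]))  # -> [285, 328350, 332833500]
```

Each call to `run_in_parallel` pays for process startup and for pickling arguments and results across process boundaries, which is why this pattern is fine for coarse-grained batch work but awkward for the fine-grained parallelism deep learning needs.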
Info
Channel: Weights & Biases
Views: 39,360
Rating: 4.9630871 out of 5
Keywords: machine learning, deep learning, robotics, hyperparameter optimization, hyperparameter tuning, track experiments, keras, scikit, pytorch, weights, biases, gpu, artificialintelligence, ai, datascience, python, bigdata, data, iot, tech, programming, coding, datascientist, mlops, ml ops, lukas biewald, fastai, jeremy howard
Id: t2V2kf2gNnI
Length: 51min 9sec (3069 seconds)
Published: Tue Aug 25 2020