Becoming The Youngest Kaggle Grandmaster | ML For Japanese Literature | "Anokas": Mikel Bober-Irizar

[Music] Hey, this is Sanyam Bhutani and you're listening to Chai Time Data Science, a podcast for data science enthusiasts where I interview practitioners, researchers, and Kagglers about their journey and experience, and talk all things data science. [Music]

Hello and welcome to another episode of the Chai Time Data Science show. In this episode I interview Kaggle Grandmaster Mikel, also known as anokas on Kaggle, and if you're from any Slack community where Mikel is active, you might have seen him with the same username. In this interview I try to decipher, for the second time, whether Mikel is a robot, an AGI, or not, which I again fail at. Do check out the previous interview, which will be linked in the description of this podcast, where I was fortunate enough to interview him for my blog series. In this interview we talk all about Mikel's journey into Kaggle and machine learning, and his current life as a computer science student at the University of Cambridge. Yes, he's 18 years old at the time of publishing this interview. Mikel was one of the youngest people to become a Kaggle Grandmaster in the competitions sphere, I believe at the age of 17. He was also, I believe, the first person to become a Kaggle triple Master, back when there were three tiers; for the audience, a datasets tier was recently introduced on Kaggle, and earlier there were just three. We talk all about how Mikel started Kaggling at the age of 14, how he continued to learn and improve his positions on the leaderboard, and his journey into Kaggle, into machine learning, and even into machine learning research, where he is working on very interesting projects at the intersection of machine learning and Japanese literature, and where he ended up organizing a Kaggle competition, which we also talk about in this interview. I'm sure that Mikel is definitely one of the youngest and smartest people I've had on the series, for the second time, so I really enjoyed this
conversation. A reminder to the non-native English speaking audience: the subtitles on YouTube will be manually checked and re-uploaded, so please enable them for a better experience. The blog for this interview, along with all of the Chai Time Data Science interviews, will be released later, so you can also find the links to where these will be posted in case you want to read the previous or the future interviews. Without further ado, here's my interview with Kaggle's still-youngest Grandmaster, Mikel, also known as anokas on Kaggle. Please enjoy the show. [Music]

Hi everyone, I am on the call with a robot, or an AGI, or I'm not sure if it's a proxy. Mikel, thank you so much for joining me on the Chai Time Data Science show.

Hi, thanks for having me. It's great to finally be here.

We were just talking about this, but you know very well that I have been a fan of yours, so it's again a privilege to have you, or I don't know if it's a proxy of yours, on the show.

I say it's me, but I guess that's what someone else would say, so you'll just have to trust me on this one.

I guess you passed the Turing test in that case.

Yeah, you decide at the end.

Okay. For the listeners, can you tell us your current age and the age at which you became a Grandmaster?

So I'm 18 now. I started Kaggling when I was like 14, I think, and I made Grandmaster last year, so 17.

And I want to confirm: is it still illegal to team up with you? Would that be child labor?

Thankfully not, no. I admit I've never looked into child labor laws, but I'm pretty sure that everyone I've teamed with is fine.

In a previous interview you mentioned you picked up Kaggle because you were initially drawn to the competitive aspect of it. Were you always competitive? Was there any place outside of Kaggle, at the age of 14, that you were involved in?

I guess not that much. I didn't really see myself as a very competitive person. I think what made Kaggle so addictive is that, as people often do, I had lots of various
personal projects and stuff; I would build stuff for fun, right? And with machine learning it was like, cool, I get to build stuff for fun, but then it's actually contributing, it's actually on a leaderboard somewhere. It's not like I build something and then it goes in a folder somewhere, it goes on the shelf and I never touch it again. So the feedback loop of having the leaderboard and having goals to reach, I think, makes it much easier to stick to something. I was always someone, and I think still am someone, who wants to do lots of things, but then I will only do something for a few months and then I'll get bored of it. I always thought that would be the case with Kaggle. I sometimes had breaks where I wouldn't do it for three months, and I was like, oh, is this it, am I moving on to something else now? But so far I think the competitive aspect and the community and everything has kept me in this, so I've stuck to it.

At what point did you realize that machine learning is your long-term passion, something that you'd like to pursue for the longer term?

I don't know, to be honest. I still can't say that for sure. I still think that one day I'm going to be bored of it and do something else. But then it's like, well, now I have this Kaggle profile and I have all of this, and it's got me into so many different opportunities, so I'm thinking, can I leave it behind if I want to? Can I leave so much behind? So there's a bit of pressure on myself to keep going as a result of that. But I think I still enjoy it, and I hope that I can stay on this path.

Fair enough. I'm really curious about your methodology. In a previous interview, again, you mentioned that you didn't take any courses or pursue any books; you didn't read any
books from cover to cover; you followed the top-down approach. Can you speak more about it? What was it like when you were starting?

Oh yeah. The question I most often get asked by people is, oh, how do you get started on Kaggle and so on, and I hate this question because I haven't got a good answer to it, and I wish I did, because probably 50% of the questions I get asked are like, what course should I do, and so on. I should preface this by saying that on Kaggle today the competitions are really different to how they were three or four years ago. I think it's a lot harder now, definitely a lot harder to get started, because back then you had some tabular dataset, binary classification, whatever; you stick it in XGBoost, and almost the same code would work across several competitions. But now you get these competitions that are so complicated: you have to get neural networks to work, and you've got these massive datasets, and so on. So I think it was a lot easier to break in with the approach that I took back then, which is a bit of a shame. But basically I just sort of stumbled across Kaggle. I didn't really know what it was, but I was like, oh cool, so I signed up and saw the leaderboards. We already had Kernels back then. I didn't know what I was doing, to be honest. I didn't know how to code. I had learned Python a couple of years prior, but I had never really used it; I did an online course on it, but then I forgot it. So I categorically did not know what I was doing, but there were Kernels, and I could just copy-paste the code and run it, and it would get me somewhere on the leaderboard. And I could take multiple of these Kernels, open up the submission files in Excel, average them, and I would get a better score. So I started essentially as a script kiddie, just downloading other people's code,
running it. I would tweak parameters in XGBoost or whatever, and over time I would start to think, oh, here's an idea I have, let's try it. So basically just doing that. The other thing is, before I sort of taught myself how to code, I started in the first few competitions by using programs like KNIME and RapidMiner, these GUI machine learning tools. The results weren't great compared to even sklearn or something, but it's easy to use, right, and it teaches you the concepts, because you get the data, you split it, you do cross-validation, you run a tree model perhaps, you run several of them, you average them, and so on. So it really got me comfortable, and for the first several competitions that's what I did. I would stay up all night doing that, honestly just running other people's code, even in R as well, because back then people used R quite a bit. And it just progressed gradually. I kept going at it because I found it so fun. I could copy and paste and quite easily get into the top twenty percent, also because most of the people that do a Kaggle competition, most of the competitors, they join, they do one thing, they submit the sample submission, and that's it. So in a way the leaderboard scores are quite inflated, in that minimal effort will definitely get you top fifty percent. I guess where that really changed was the Avito Duplicate Ads competition, which was a complicated one. To give a bit of background about the competition: you're given a bunch of listings from a website called Avito, which is a bit like a Russian version of eBay, I'm told. And so basically
you're given two listings and you have to say whether they are duplicates. You had text, you had metadata, you had images. This was back when there weren't really images on Kaggle; that wasn't really a thing. This being a competition I started as soon as it launched, people were building stuff based on text features, right, but no one was using images in the first couple of days. I quickly wrote up something which basically just hashed all the images, and I created a new feature, which is: do these two listings have the same image? That is obviously quite a powerful feature, and as the first person to actually process all the images, because it was like 100 gigs or something, I ended up number one on the leaderboard. I remember it was a really huge thing for me. Then I made the great decision of teaming, the first time I had teamed, with second and third place on the leaderboard at that time. That competition was just such a huge learning opportunity; it was the stepping stone from beginner to "I know what I'm doing". Towards the end of the competition we invited KazAnova, Marios, to join us, because he had just started the competition. He was quite far down, but we were like, well, maybe he wants to team with us, and he did. So that was my first team. I think he was third or second in the world at the time, so all this Kaggle knowledge was bestowed on me, basically. I think that's basically how I got into it. I don't know how viable it is today, given the current climate of competitions. I do hope it is, but I don't know, because now the field has progressed, and anyone can run models on tabular data, so sponsors aren't really interested in that anymore. Sponsors want their image or their text problem solved, and then everyone uses
neural networks, and I think it's been a bit of a shame, because it takes away all the feature engineering aspect of the old competitions. So it's definitely different trying to do competitions on Kaggle today, I would say.

Okay. And how did you continue improving your approach? Again, I don't think you took any courses while you were active. How did you learn new things? How did you go out and figure out what is, say, scikit-learn?

Just googling as necessary. I think there's a lot of stuff I discovered by trial and error, and I think that's why I have a pretty good intuition of what might work and what won't. That's because I didn't go about reading and taking notes like "this model is good for this"; I tried them all on all these different competitions, and now I have an understanding, a good feeling of "this will work, this won't work". I think a lot of people are quite scared to get started. People have told me, "Oh, I don't know if I'm good enough to start Kaggle yet," and I tell them, well, there's no entry exam, right? There's no risk. Just try it and do whatever you can, and you will learn from it. I don't think the approach of "before I touch Kaggle I have to do all these books and courses" is the right approach at all.

Definitely. I'd like to call myself a self-made fast.ai evangelist; they really promote this idea of top-down learning.

Cool, yeah.

So it's great to know that this technique definitely works, and this is one message I always try to get across to the audience: you need to do more than you need to learn, especially for ML.

Yeah, I agree. One thing I quite like about it is that I've never been that strong on theory in general, with maths and all these other things, and I think people often take the
wrong approach to machine learning in thinking that, in the practical sense, it's super theory-based, but it's really not. It's all about getting the intuitions and not about understanding equations. I think that's why it worked so well for me, when I could never do a maths Olympiad or anything like that, right? It's quite a different thing, and that's good.

Yeah. What was your life like when you were doing Kaggle while in school? How many times did your parents walk up to your room late at night to find you sitting in a corner, working on some script in the dark?

That was quite common. I did have quite a lot of time to commit to Kaggle, and I would end up working on it constantly. I would get up and I would always have something running, and every few hours I'd check in. I'd always be SSHing in from my phone, and literally sometimes writing code on my phone for the Kaggle competition. So it was a part of my day and my life at the time, and it was a lot of fun. My friends were very interested in it and always followed me on the leaderboards and stuff, so it was a really great experience being able to do that while at school.

You still hold the title of the youngest Kaggle Grandmaster. Are you worried someone might become aware of Kaggle at the age of ten and snatch your title?

Yes. There are a few people I know who are quite young on Kaggle, close to my age. Thankfully none of them have taken my place yet, but it's quite cool to see that I'm not the only one doing this. There are definitely quite a few teenagers on Kaggle, and there are a few that are quite high up on the leaderboards, and that's really cool. I'm holding on to it for now, but obviously I will age, so
eventually I won't be the youngest Grandmaster and I'll have to find some other gimmick.

Speaking about competitions, do you have any favorite battle stories, any favorite competitions that you'd like to mention? And once you became used to Kaggle, what would your pipeline look like once you entered a competition? Because I remember whenever you used to enter a competition, you would sit on the top of the leaderboard with a huge gap, and that would sometimes take at least a month to overcome.

Yeah. Your first question is about battles. I think there have been several. The first one, of course, was Avito. We were in first place for a long time, battling; the final week was massive. I remember being at school, and in the last couple of hours of the competition one of the teams, who were in 7th, jumped up above us, so we ended up second. They had been hiding their solution for about a week; they had been making fake submissions to make it seem like they weren't as good as they were, so they really overtook us. Another similar case is the competition which I won, the Google Landmark Retrieval competition. That was an interesting competition in many ways, very unusual. We had managed to get into first and we stayed first for a long time, but in the last couple of weeks we were unable to make any progress. We were completely stuck, and we were all working really hard, because second place was catching up quickly. Just before the competition was due to end, the organisers announced they were extending the competition by a week, which was incredibly frustrating, because I had basically not been sleeping the last couple of days, thinking this was the end, and it's like, actually, you've only done half the marathon, please keep running. It was really so devastating, and there were lots of angry messages on
the forums. That was quite a battle, and on the last day they overtook us again on the public leaderboard, but then thankfully on the private leaderboard we won, by quite a significant margin, so that was really good to see. This competition is one that I am particularly proud of, because in our approach we didn't use the training data at all, and we didn't have any validation. It was an image retrieval competition: basically, given an image of a landmark, you find all the other images in the dataset of the same landmark. There was another competition along with it, which was landmark classification, so they had a bunch of pictures of landmarks and each had an ID. You could use that as a training set, right, because you could say, well, for this landmark we want to surface all the other landmarks with the same ID. This is something that my teammates and I had been working on for quite some time, so we already had a pipeline for a lot of it. We already had trained models which basically took an image and gave a vector, and we just used the stuff we already had and built extra stuff on top of it, like some fast nearest-neighbor code. What ended up happening is that we didn't actually train any models for that competition. Our models were generic, for detecting just objects in images, and I think that's what made it really robust. I was really proud of our solution, just because it's weird to win a Kaggle competition without any training or validation. So that was quite a fun battle.

I'll definitely have the solution linked in the description. I remember when you published it, everyone became a huge fan; it was on all of Kaggle's forums and Slack groups, people all over the place talking about your approach. When you join a competition, can you speak more about that, and how
do you manage to get to the top of the leaderboard instantly?

Yeah. In a competition, other people are like, right, I've got this approach, I'm going to really think about it, I'm going to download the data, I'm going to set up my script, and so on. That's not what I do. The competition launches, I immediately download the data, and while the data is downloading I start writing a script. Basically my approach is always to try and get a baseline as quickly as possible. In the past that was just a simple XGBoost script. Now I guess it's harder in the newer competitions to just have a baseline, but often you can use hacks, like just training a model on the image size instead of actually detecting what's in the image. So I'll try and build something really simple, and usually that will get me first on the leaderboard, hopefully for a few days. I would say that's a good strategy for some reason, but really it's just that I like being on the top of the leaderboard.

And then you tweet "it's lonely at the top" every time you get to the top of the leaderboard.

Yeah, exactly. Kaggle has a button to tweet, but it always has these really passive-aggressive tweets, and I just hope that people understand that it's not me saying that; those were the suggested tweets. But yeah, I just try and get a baseline submission, and then I'll keep iterating on it, just trying to improve. That will get me a baseline understanding, a feel for the data, and then I can go away and think of ideas and stuff. A lot of the time I will only do a competition at the beginning and then fizzle out, usually because I'm just not sure what to do to do well. But a lot of the time when I do well in competitions, I have this really hacky
solution that I've kept adding stuff to, and then at a certain point, after a while, once I know what my solution is going to look like, I rewrite it nicely. Either I'm too lazy or impatient, or I just don't think there's that much benefit to setting up a pipeline as the first thing you do in a competition. So that's basically how I try and get to the top at the very beginning of competitions.

Talking about hardware, I remember in KaggleNoobs you had posted a picture where you had around fifty 1080 Tis, if I remember correctly. How do you justify your carbon footprint and hardware, and do you have any comment for people who feel intimidated by that?

Yeah, I quite like hardware, and I quite like posting pictures of hardware. To clarify, that's not my hardware. I often help people build servers, so I will buy the parts and build servers and stuff for people, and often I will take a photo of all these GPU boxes that I have. I guess if you buy a server from Dell or... I guess the initial thing was back when Nvidia released, I can't remember the name of it, but it was this workstation with four Titans. This was way back, when they were first doing neural networks, back when Torch was popular, not PyTorch, Torch. It was just consumer parts: a standard consumer Corsair case with four original Titans in it, and they were charging like 25,000, and there was a big waiting list. My dad's company was like, actually, we kind of want one, and I was like, actually, the parts aren't 25,000, they're 5,000, so if you buy the parts I will build it for you. That's what we did, and it was a success, and it just went from there. So I really like building computers, and I know a lot
of other people don't like interfacing with hardware, so they'll use different tools to abstract all their code, or different deployment tools, like Neptune and so on, and Docker. But I find that really frustrating. I like setting up my environment on the hardware; that's my environment and I use it for everything. So I've always really liked building computers and working with hardware directly, and I guess my photos of hardware come from that. In my own computer I have one Titan X, so that's my personal computer. The upside is I get to borrow and use lots of other people's stuff, and that's definitely helped for competitions.

What recommendations do you have for people looking at the current climate of Kaggle in terms of hardware?

I guess a GPU, right, is the first thing, because you need one to compete in all these image competitions. I'm not a huge fan of the latest generation of Nvidia cards, the 20 series. They are quite expensive, in that the prices are being artificially hiked because they can be, so I don't like recommending it, but I feel like if you can afford it, then get a 2080 Ti. I think that makes the most sense at the moment. I'm hoping that AMD can come up with some stuff to help bring down Nvidia's prices. On the CPU side, I would say get Ryzen for sure. The value proposition is so huge compared to Intel, and I like supporting the underdog. So I would recommend Ryzen and a 2080 Ti, or, if you can find one, a 1080 Ti, because it's like half the price but 75% of the performance, so it's a no-brainer.

I'd like to drop a quick plug: I've interviewed Tim Dettmers on the series, who has a very nice blog on GPUs, and we also talked about that, so do check that interview out in case you
haven't. Now, zooming out to another aspect of your profile: you're also a published researcher. Can you tell us about your research interests and what led you to working in research as well?

Yeah, so I guess it started when I did some work with the University of Surrey on some Kaggle competitions, starting with the YouTube-8M competition and then the Google Landmark competition. The researchers that I worked with were like, okay, we want to publish a paper based on this, and I thought that sounded amazing. Actually, for the YouTube-8M competition there was a workshop at CVPR built around it, so we were like, okay, let's put something in the workshop. I was like, cool, I don't know how it works, but I'll help out. So I did, and in the end I ended up going to Hawaii for CVPR, my first time going to the US, and I ended up presenting our paper there, which was such a huge thing. It's on YouTube. I was like 16 at the time, I think, and it was such a surreal experience and such an enjoyable one. I think being able to do research is such a rewarding thing: to have your name up there, and also, when someone else cites your paper, you go and read it and you're like, wow, somebody who actually knows what they're doing read my paper and in some way it guided their thoughts. That's such a cool thing. Since then I've been looking for opportunities, so I've done a bunch more work with the University of Surrey, and then with CODH, as I'm sure you'll bring up, and so on. It's a lot of fun, and it's rewarding, just as a Kaggler, to have that opportunity, which I know not everyone has, especially before university.

I remember there was another interesting story, I think, where you had a school deadline and you almost submitted the paper on the last day for CVPR.

Yeah, this is a common story. The last competition I did,
which was the third YouTube-8M competition, we did quite well and we were in the prizes, and they were like, okay, so you have to submit a paper. I had just arrived in Cambridge when this happened; this was the first week of Cambridge term, and we had three days to write a paper. That was quite a challenge, and it didn't end up being as good as I wanted it to be, obviously. But honestly, the amount of stuff I have to do here at university compared to how it was at school is on a completely other level, and I'm just trying to figure out how to mix and match all these things and how to make time for it. It does happen with all these Kaggle deadlines; the Landmark competition that got delayed then coincided with an exam I had, and stuff. So it's awful, but somehow I manage to work around it.

I'll definitely have your talk from CVPR linked, in case anyone wants to watch it. Now, could you tell us more about the importance of your work on Japanese literature? I think you've worked alongside Tarin and David Ha, who is hardmaru on Twitter.

Yeah, so I've been working for the last, I guess, year and a half with CODH, which is a Japanese research organization working on open data for the humanities. I think it's such a cool thing. I don't have any background in Japanese literature, so it's not like I really wanted to go into Japanese literature, but it was like, wow...

Do you speak Japanese? Because I think you speak four languages: Polish, English, and two more, do you speak?

I don't speak Japanese. I'd like to, especially so I could actually read some of the output that we generate. But no, I basically just speak English fluently, and I speak Polish, Spanish and Basque sort of; I can hold a conversation. I'd love to speak Japanese, but I don't. What was I saying? So yeah,
there was nothing like oh I want to do Japanese literature but it was like here's an opportunity and I knew I knew Tarun from from from before then and so he was like well doing doing like help us out and work on this I was like yeah sure and I'm just saying it's so rewarding to be able to be like put because when you publish a paper it's like okay I've done something I've built a model and it's it's in its online but like you know no one's ever no one's ever going to like actually use it it doesn't actually get used by yeah it's it's it's research right it's it's just just yeah but then like actually being able to like build something that will one day be you to something so like our own goal is to to be able to sort of take all of these millions of books and stuff that have been written in in ancient like cursive Japanese and the thing is that most people in Japan today can't read it you can't read any of these historical documents and there are only like a handful of experts but you have like in your millions of books so these books are never gonna be transcribed into modern Japanese by people there's just too much of them and so like our goal is to be able to sort of automate it and then we'll be able to basically take these like millions of books and put them all online all the transcriptions in modern Japanese and it's sort of just thinking about it like this it opens up so many cool things like been like linguistic analysis because if you have all the books over time you can see how did the language change and all of these things and I just think that like being able to put my skills in some respect in something like concrete and hopefully something that will actually benefit some Society in in in some way is like a huge motivator and so I'm really like happy that I get to work on Japanese literature be sure it will you also ended up hosting a competition on goggle what led you to hosting the competition what parts were you know linen were you not facility 
[unintelligible] that you can't go ahead on the leaderboard and be in the first position?

Yes, so there's a big story about the competition. My part was mainly data preparation, building the dataset and the train/test split and stuff along with Tarin, and also basically talking to Kaggle and setting up the competition and deciding what we were going to do. I don't think this has been spoken about before, but we had a bit of a problem. Basically, the competition was: we give you pages with cursive writing on them and you have to identify where the characters are and what the characters are, so classic object detection, basically. The problem is that these were books that cost a lot of money to get transcribed, so the data we had was the data we had. These were transcriptions that we had obtained for other projects and we repurposed them as a Kaggle competition. So the issue we had was that some of the books had been transcribed before, not in the sense of our competition format, where it was object detection, but in the sense that there was a transcription in another format of the same book. And often there would be liberties taken in translation, right, so words would be changed around, it wouldn't be a literal letter-for-letter translation like ours was. But the bottom line was that there was some element of leakage. Potentially someone could... it wouldn't be easy, but someone could somehow find these books. It would take several steps to find them and then several steps to build a model that can actually use the information you find, so it really wouldn't be easy. So I didn't think that cheating using it would be a problem at all, but in the end it was because of that that we had to downscale the competition. So we ended up with a playground competition, just because we couldn't risk cheating. So we
took that decision, and at the time I didn't think there would be that much of an issue, I thought that we were overreacting. But recently, of course, we've all seen the issues around cheating in the PetFinder competition, where someone made a really concerted effort to cheat. And if I compare that to how difficult it was to cheat in ours, it seems feasible that they would have done the same in ours. So I do think it's getting quite important to really stop leakage like that. That was a big pain point of the competition, and things were still moving last minute, because we actually weren't aware that some of this data was online, and Kaggle found it for us. So we had to put the brakes on a lot of things and cancel the press release last minute and all of these things. It was crazy. Behind the scenes, I don't know how much I can say, but there was a lot of drama as a result of this. All I can say is that it's becoming so important now, I guess because Kaggle is entering the mainstream, and we never saw this sort of cheating in the past. Even a couple of years ago there was the TalkingData competition, which I don't know if you're familiar with, but basically the row ID column was correlated with the target: things that were later in the dataset were more likely to have a positive outcome.

Okay.

And today this would last like ten minutes before someone finds it, because people try to find leakage like this. But back then it literally wasn't found until the last week of the competition, and it had been like two months. I think that shows how much it's changed. People are really trying to find leakage, and some people are motivated to cheat, and I think that in some ways is new.

Now, zooming out to what you are currently doing: I
don't know why you're even going to university, but you're at the University of Cambridge studying computer science. Can you please tell us why you would need to do that?

Yeah, people still tell me that a lot. But I think what I mentioned earlier was that I've never been good at the theory side of computer science, and the course here at Cambridge is a really theoretical course, a lot of maths and so on, and I actually find that quite challenging. So I don't feel that I'm above everyone else here. Of course it's Cambridge, so everyone here is really good, but I definitely don't feel that I am at the top of the pack either. I guess the exception is that right now we're doing a machine learning module, which I think you saw, and to be honest it's a bit disappointing. The issue, frankly, with Cambridge being so theoretical is that in some respects the courses are quite far behind real life, so to speak. I might anger some people by saying that, but at least in the machine learning course it's like sentiment classification using naive Bayes.

Having graduated from college, I'll confirm that is very true of colleges even today. Outside of Cambridge, at least.

Yeah, so there's no mention of neural networks, there's nothing. [unintelligible] But soon we'll have a lecture on cross-validation and stuff, and I haven't been to it yet, but I know it's going to piss me off, because it's one of those topics where there's no accepted "this is the best way to do it". And I think there's been a lot of progress, in the sense that everything on Kaggle is the best techniques, because they were battle-tested. I feel like that's what makes it so useful, and I have a feeling that the stuff we get taught in Cambridge wouldn't hold up on Kaggle, for example, or like
on real-world data.

Do you feel like you're in an undercover superhero situation, sort of like you're Spider-Man? As soon as the faculty [look away] you pull out your phone, you run scripts on your phone, and as soon as they turn back you're back to writing code on paper.

I mean, luckily we are allowed to use computers and stuff, but in our exams at the end of the year we have to write code on paper, which is horrifying. I don't know how I'm going to manage. But I guess in the machine learning lectures, most people don't know that I do a lot of machine learning, but some people know, and so they then look at me like, "gee, it's so easy for you". Yeah, but it's just certain topics, it's just the machine learning side, and I still think that I have a lot to learn in other areas of computer science.

I think you're being humble here. But talking about your future aspirations, a question from Twitter [handle unintelligible]: what future job would you like to apply for or work on?

I don't know. I think in some ways I let myself off not thinking about it, because I have time to think about it. But I feel like at the moment I would love to just do something in applied machine learning, like in industry, but where I can actually see results. So I love the work with CODH because we're actually working towards this goal, and when it materializes, hopefully, I'll be like, wow, I made this. But there's stuff like self-driving cars and this sort of stuff, where you write code and then the car moves. It's this connection that makes it really, really cool. So I'd love, I think, to work on some applied machine learning thing, and I don't want to narrow my options, basically. Yeah, exactly. I don't really care, in some ways, I'm not saying that I only want to do this,
right? The only thing I would say is that I want to take ethical considerations into account. For example, take something like Boston Dynamics. They make really cool robots, and I would love to work on robots like that, but I know that whatever I build will eventually be used to probably kill someone, so unfortunately I probably wouldn't work on something like that. So I think there is a duty that, whenever I work on something, I ask, is this actually a good thing to build? I do think about that.

I think self-driving cars have been a huge milestone in your journey. You were one of the first people in the [unintelligible] Udacity has, and I think you've also interned at Voyage.

Yeah, I did a work experience, informally, for a few days at Voyage a couple of years ago, and that was a really fantastic experience, so I would love to do that again, I think.

Now, again, a question from the [unintelligible]. I'm sure I'll have your mlcrate repository linked in the description, but what frustrates you most about today's machine learning tooling, that led you to create mlcrate?

Yeah, so mlcrate is my sort of machine learning library. It's not designed to solve any specific problem, but when I get frustrated with someone else's library and I rewrite a function or whatever, I will put it in my library. The nice thing is that it's available, I can easily install it, and it's on Kaggle kernels and stuff, so I can use it. I guess what frustrates me most is probably lack of good documentation. I think a good example of this is the fastai library. It's a really great library, in the sense that it's really powerful, it's got a lot of cool features, and I would love to figure out how to use it, but so far I have really struggled. Apparently the only way to understand it, really, is to go through the videos and watch them.

Yep.

It's not really
documented, you just have to sort of know how it works, and I don't like that. And it has a very prescribed way of doing things. So it's like, oh, you've got to use a DataBunch and all these things, and I'm like, no, I just want you to do training, I want to handle the data myself, and fastai is like, no, you must use DataBunch. So it's this frustrating thing that makes me go, okay, fine, you know what, I'm going to rewrite the code myself. And that's partly a stubbornness of, like, I don't want to conform my code to how this one library wants me to write it, I want to write it this way. I'm trying to think of other examples, but I find the lack of documentation quite frustrating in some respects. And also, I guess, scikit-learn has this pipelines thing, right?

Yeah, something like that.

And I don't like that, and that's a personal preference thing, but I guess it just highlights that I don't like being prescribed "this is how it's going to work" by a specific package. I prefer to use a package like: okay, I give it this, I get out this data, okay, now what am I going to use next, how can I use one of my packages? I don't like basically being forced to use someone else's pipeline.

So my final question in the area of data science to you would be: what best advice do you have for someone who's looking to snatch your title, becoming the youngest future Kaggle Grandmaster?

Oh. I guess, I mean, people often ask me how do you get started, and like I said, the competitions nowadays are very different to how they were three or four years ago. Nowadays they're really not that nice to get started on, so I often recommend people go and do previous, older competitions. I often recommend competitions that I did when I was getting started, like, download this tabular data and try this out, look at the kernels and so on. It's not the same, obviously, to do an old competition, but
it's a nice way to practice. But I guess what I would say is that you just have to keep doing it, and hopefully you enjoy it. Hopefully you find it fun, and that creates a loop where you end up learning it without actually trying really hard to force yourself to learn it. But I guess if someone's asking that question and is that motivated to learn Kaggle and machine learning and all these things, then I don't think they'll have a problem. And I think that applies to most things, not just...

Yeah, and you definitely don't need a master's [unintelligible].

Yeah, you don't need... [unintelligible]. But like I said, Kaggle is so applied, it's about getting your hands dirty. You don't need to know the equations or understand the equations, you just need to have an intuition for it. And I think that lends itself really well to people without formal degrees and so on, without having been through this rigid, structured approach in universities and colleges.

Yeah. So this will be a tricky question, I really enjoyed asking this [unintelligible], but what would be your favorite computer game of all time? I know you game, in fairness.

Yeah. My all-time favorite... an honorable mention to Minecraft, obviously, because I suppose I've spent thousands and thousands of hours in Minecraft. I started playing it a good ten years ago and I've played it more than anything else, for sure. But I think the game I've enjoyed most, my favorite game of all, is a game called NieR: Automata. I don't know if you've heard of it, but it's this quite well-known Japanese open-world RPG. I don't know why I enjoyed it so much, but I really loved the story, and every now and again I'll think, oh, I really want to play that again. So I guess that would
have to be my all-time favorite game.

Okay, now my final question: can you reveal the secret behind the name Anokas on Kaggle? I think it's your username everywhere.

Yeah, it is. I'm going to say no, I can't. The truth is, it's a really boring origin story, so I keep it shrouded in mystery. Everyone asks me this and I always refuse to answer.

So we'll try to get you on the show again to reveal it. But thank you so much, Mikel, for joining me on the podcast.

Thanks, thank you so much for having me, and yeah, I hope I see you soon again.

[Music]

Thank you so much for listening to this episode. If you enjoyed the show, please be sure to give it a review, or feel free to shoot me a message. You can find all of the social media links in the description. If you like the show, please subscribe and tune in each week to Chai Time Data Science. [Music]
Info
Channel: Chai Time Data Science
Views: 2,553
Rating: 5 out of 5
Id: maR9ibJ2r7g
Length: 59min 19sec (3559 seconds)
Published: Sun Feb 09 2020