My Journey: How I Became The World's First 4x (and 3x) Grand Master On Kaggle

Captions
Welcome. This video is about my journey, a journey that started long, long ago, and it's about how I became the first 3x Grandmaster on Kaggle. A lot of people have asked me how I did it and what they need to do to reach this level. A lot of people asked me when I became the first 3x Grandmaster, and a lot more asked me recently when I became the first 4x Grandmaster on Kaggle, so I decided to make a YouTube video about it.

This goes back 10 years. It all started in 2010. I was studying electronics engineering, BTech, in India at NIT Surat. Some of my batchmates at the time got really cool internships. There was a DAAD scholarship and they got to go to Germany; some people went to Indian institutes. It was really nice for them, but I was feeling like, what was I doing with my life? I had a lot of interest in image processing. At the time I was also working with the head of department in the field of fingerprint recognition, so I had an interest in signal processing and image processing. I had no interest in electronics engineering, but that's a really different story.

So I started applying for research internships at many different universities in India and abroad. I applied to, I don't know, 60 or 70 different universities and institutes, big, small, everything, and either I was too late or I was rejected. Finally I got an offer from the University of Warwick, which is near Coventry in England, and I was quite excited. It was about working in the field of biomedical imaging. I went there for a couple of months and learned about decision trees, working on biomedical images, trying to find the cancer cells, and also about random forests.

When I came back to India for a couple of months, a friend of mine came to me and told me about Kaggle. I think it was early 2011, and Kaggle was pretty new at that time. It was either late 2010 or 2011, I don't remember correctly. It
was pretty new, the website called Kaggle. He mentioned that since I knew about random forests, I should take a look at it, and that everyone was using random forests to win machine learning competitions. Well, that made sense, so I made an account. I was very new to random forests, I didn't know any Python, and I just knew some basics, that's it. So I made an account on Kaggle and then I forgot about it. I was more interested in FIFA with my friends than in doing machine learning competitions at that time.

Fast forward to 2013. I was in Germany: in 2011 I left India and came to Germany to do my master's in computer science. I applied to two or three universities, not more, three I think, and I got accepted at all of them. It was all because of my previous internship; I got a good recommendation letter from my professor. In 2013 I was about one and a half years into my master's and I was working at Fraunhofer. While I was doing my master's in computer science, or Informatik as they like to call it, I was a student employee at Fraunhofer, again working with image processing: implementing OCR algorithms on microcontrollers, or testing them. It was more about testing and implementation at the time.

I had also recently started my thesis, which was about saliency detection. That is in computer vision: detecting salient objects in images. It was very remotely related to machine learning, in the sense that I was implementing a modification of k-nearest neighbors along with everything else, and that was a very small part of it. So that was the only remote connection to machine learning from my thesis side. My friends who were working at Fraunhofer worked with NLP and neural networks. I had no clue what NLP was; I didn't even know the full form of NLP. After a few months of listening to them, it raised my interest in machine learning, natural language processing and neural
networks. I had taken some data mining courses at the university, which I failed miserably. The courses only focused on theoretical aspects, with no details about application and no coding. It was kind of boring for me. I decided not to do the course at the university again but rather to learn it on my own, and then I remembered Kaggle. I tried to create an account and it said my email already exists, so I already had an account.

To my surprise, after logging in I saw these different competitions going on. There was a competition about recognizing facial expressions. Well, that was my favorite: it was about image processing, and all I had to do was download the images and write some code in MATLAB to detect expressions in those images, expressions like anger, disgust and fear, a total of seven different kinds of expressions. So I fired up MATLAB, wrote some code, and spent a few nights trying to understand the data and how to create submission files. It was quite difficult at that time. I made some rules to decide the expression, rules like: detect the eyes, detect the nose, what is the angle between the nose and the eyes, and how it relates to anger. There was no algorithm, it was all rules, and it was a big failure.

It took a while to make the first submission, and I worked quite hard on improving the scores. I teamed up with another friend who was also working at Fraunhofer. There were not so many people doing the competition at that time, maybe because it needed too much computational power. Anyway, I failed; my tricks did not work. If I remember correctly, I also learned about CNNs in that competition. When the competition ended, and also during the course of the competition, people shared a lot of different discussion topics, and they were about these abbreviations: CNN, RBM, SVM. I was confused about what to do, but after the end of the competition I started reading about them. I spent a lot of days on this: after work I would spend the whole
night at Fraunhofer, just being there rather than going to my dormitory, to learn about machine learning. Most of the time was spent reading tutorials online and then trying to implement them on my own. I saw that most people use Python for machine learning problems. When I was in my bachelor's I had started learning C and C++, so I revised some concepts of C++ and tried to understand what Python is all about, and I started coding in Python on my own. I would implement small functions in Python and try to run them.

Three months went by with machine learning concepts. I watched a lot of videos and tutorials and whatnot, and tried to implement the papers. After those three months I took part in a competition about audio recordings: you had audio files and you had to detect whether a recording contained a right whale call or noise, so it was a binary classification problem. I participated quite a bit in this competition, and with my own Google searches I learned about MFCC features. I extracted these MFCC features from the audio files and tried to build a model, which I had learned about before. I also extracted spectrogram images, tried to do some kind of PCA on them to extract some features, and built an SVM model. I also tried to use random forests.

When the competition ended I had my first bronze medal, and I was really super happy. Well, at that time you didn't have medals; it was more like, I was in the top 28 percent of the competition, which was better than the top 50 or 70 or 90 percent I was in for my previous competition. So I was really super happy. People shared a lot of cool ideas about how they approached the problem. It was super interesting to look at the solutions of other people and try to implement them on my own. There were the spectrogram images, there was a right whale call, and how do you detect this call? It was really nice. If you go back to that competition, you will still learn a lot.

A few more months passed by and I kept
working as a student at Fraunhofer, doing the master's thesis, and at night I was Batman: I was learning about machine learning. I was getting better at coding in Python, and I would spend a lot of sleepless nights implementing papers. I used Stack Overflow a lot, and that taught me a lot. If I got stuck on something, I would try for several hours. If I couldn't do it, well, I was also scared to ask questions, but then I started asking questions on Stack Overflow, even if they were small, stupid, basic questions. If I didn't find an answer in an existing, similar question, I would ask.

After a few months came the Amazon competition, and I had learned quite a bit by then. I knew about different algorithms like logistic regression, SVMs, decision trees, random forests and gradient boosting. In this competition people shared a lot. People shared code in the beginning on how to create a benchmark and how to score high. I learned what one-hot encoding is. After a lot of feature engineering, a lot of hyperparameter optimization and a lot of time spent on this competition, I got my first top 1% finish. Top one percent at that time was amazing; it was a silver medal. Starting from an AUC of 0.55, I reached an AUC of 0.91. That placed me in the top 20 out of 1600 or 2000 people, I don't remember exactly, something like that, and I was super happy. I learned a lot more about feature engineering, handling tabular data, handling categorical variables, how averaging the models helps, these kinds of tricks.

During the same time, as my master's was coming to an end, I started applying for data science roles. I didn't have much on my resume. I had this job at Fraunhofer in image processing, and fingerprint recognition, and nobody cares about that. So I put in some projects from Kaggle, like this Amazon competition and a few more, and I also wrote how I approached these problems, and that helped. I got many rejections, a lot of rejections; almost all of them
were rejections. Nobody wants to hire a data scientist right out of university. It was really difficult; it's the same now. I kept applying, and towards the end of the year there was another challenge on Kaggle called the cause-effect challenge. For that challenge I teamed up with someone I didn't know, a person from China, and we collaborated. We talked a little bit on, I think, Skype or Google Chat at that time, and by email; we exchanged a lot of emails and shared the code. It was also about feature engineering, building a lot of different kinds of features: you have two vectors and you build features from them, like distance-based features. So it was a lot of features, hundreds of features, and then how to improve further. What helped in the end was obviously feature engineering, but also ensembling of different models. That is where I learned about ensembling different kinds of models, and that was the competition where I got my first top 10 rank. These days that means a gold medal, so it was an amazing feeling. This happened, I think, seven months after I started my very first competition.

With Amazon and cause-effect there was a lot of learning involved. The winners and others shared their approaches. I implemented parts of their approaches, or even their full solutions, and there was a lot to learn from everyone. My job search was also going fine: I landed a job in early 2014. So it was time to wrap up my thesis. My contract with Fraunhofer had also ended, so I had only two things to do, thesis and Kaggle, for a couple of months. So I did that, and at that time came an NLP competition on Kaggle, StumbleUpon. That was the first competition in which I started sharing code, and I became infamous for my "beating the benchmark" codes. This was before there were kernels, so people would share code in discussions or as attachments in the discussion forums. So the code
that I shared received a lot of appreciation and also a lot of criticism. It was simple benchmark code using TF-IDF that scored very high. So it was simple techniques; I optimized some hyperparameters and put the code online for everyone. After working day and night on this competition I managed to get sixth rank, a gold medal, and the bonus was that it was a solo gold medal. The winner of that competition was François, the creator of Keras. It was an amazing competition. I learned so much about cleaning HTML, cleaning text data and building models on text data.

After a few more competitions and one more gold medal, I think at that time I was in the top 10 on Kaggle. I had moved to Berlin, quit my job after six months and then moved back to university to do my PhD. My PhD supervisor was a big fan of Kaggle, and that's how I got my PhD. I also had a lot of time to learn new things on my own. I wanted to spend some time doing research, working on recommendation systems and AutoML, and I would keep doing the same thing: spend the whole day in the lab and then the night, either at the lab or at home, doing competitions. I got a few more silver and gold medals during this time.

A year into my PhD I quit and moved back to industry, and things changed. But one thing never changed: I never stopped learning and Kaggling. I didn't have much time in industry, so I would focus on only one problem. I entered a lot of competitions simultaneously, but my focus was only on one or two of them, and if I got a good rank in the others, it came as a bonus, if it ever did. I'm not sure exactly when it happened, I think it was in 2015, but I was ranked third in competitions worldwide, and that became my highest rank in competitions. After that I became a little bit of a slacker and stopped doing Kaggle
that much. I switched jobs many times during this period and I didn't have Kaggle time. When the new categories of discussions and kernels were announced, I had much less time. At that time they changed the whole thing: how points are calculated, how medals are awarded, everything. I was not able to spend time on competitions, and so I had no discussions and very few kernels. Sometimes I would share some code. Working at startups, I spent less time on Kaggle, but it was a lot of learning, because using machine learning in industry is very different from working on Kaggle competitions, where you have a clean dataset.

Fast forward to 2019. I had moved to Norway, leaving Germany after six years, and I started working on competitions again, sharing a lot in the discussion forums and creating some useful kernels for people. Six months into 2019 I became Grandmaster in all three categories. In competitions I was already a Grandmaster, and then came discussions and kernels. I still had very little time, and I had many medals in kernels and discussions from before I resumed.

Four months after I became 3x Grandmaster, Kaggle decided, okay, screw him, let's start another category called datasets. I took it as a challenge. I had accidentally deleted a couple of datasets a few months earlier and I regretted that. It turned out that getting medals there is very difficult, because a dataset is hidden: you upload it and nobody knows about it. People have to know about the datasets that you upload. So when I created datasets, I also created some kernels to go with them. I also looked at the competitions. There were computer vision competitions going on, image competitions, and what do you need in image competitions? Pretrained models. So I created a dataset with all the pretrained models, more than 40 of them, and I shared kernels with those datasets. I also shared the datasets on LinkedIn and Twitter. I created tasks for
datasets so people could do some tasks, and that helped with one of the datasets. Five more months later I was on vacation and I got a message from a friend of mine congratulating me: I had become a 4x Grandmaster. It came as a surprise. I had been waiting for my medals to convert to gold. At the time of recording this video there is only one 3x Grandmaster and one 4x Grandmaster, and that's me, and I'm really happy about it.

That's basically my story. I never gave up. I kept learning. I failed a lot of times, and I still keep failing, but I learn from others, from people who share. So if you're starting with machine learning or Kaggle, remember that you have to work hard and find time to solve these problems. If you want to go into industry and you're a fresher, you need to learn the tricks and get your hands dirty. You have to play around with data. You cannot just keep playing around with toy datasets like Titanic or Iris. Those are going to build you a foundation and you can learn from them, but you cannot just talk about Titanic in interviews unless you have done something really extraordinary with that kind of dataset. You have to solve real-world problems, and I think Kaggle, or some of the other competition websites, can give you exposure to these real-world datasets.

One thing that really paid off for me in the end was perseverance. If you persist, one day you're going to win. There's also a lot of luck involved when you're doing any kind of competition, but when you're sure about what kind of cross-validation you use, what kind of models you use, feature engineering and everything that goes around it, you will get a good rank.

About this YouTube channel: I started it for beginners and for people with some knowledge of machine learning and basic knowledge of how to apply it to problems, and it becomes very
difficult for people. I hope I'm being useful to the community, to you guys. If you have suggestions for how I can improve, or for topics that I should create videos on, let me know and I'm going to make it happen. And please feel free to ask questions. A lot of people who do Kaggle competitions don't ask questions. They fail once, or a couple of times, and after that they give up. It's all because they don't ask questions: they have questions but they never get to hear the answers. And don't be scared of the people at the top of the leaderboard. One day you are at the bottom of the leaderboard and, who knows, the next day you're going to be at the top. Always remember that no question is stupid, and never, ever give up.

So keep learning, keep enjoying Kaggle and my videos. Thank you very much, and thanks for listening to me. If you have any tips for me, I'm happy to listen, and if you have any questions for me, just send them over. I try my best to answer all the questions I get. Thank you, and see you in the next videos. Bye.
Info
Channel: Abhishek Thakur
Views: 52,530
Rating: 4.9669237 out of 5
Keywords: machine learning, deep learning, artificial intelligence, kaggle, abhishek thakur, how to become kaggle grandmaster, how to kaggle
Id: z15TKkAPNUM
Length: 24min 50sec (1490 seconds)
Published: Sat Feb 29 2020