Yandex
5 secrets to becoming a Kaggle grandmaster
Pavel Pleskov

My name is Pavel Pleskov.
I’m a Kaggle grandmaster. I want to share a few secrets
to becoming a grandmaster. But just by way of
managing your expectations: This is by no means a one-stop
guide to earning the title. You probably won’t become a
grandmaster tonight, or in a year. Chances are that after this lecture,
many of you won’t even want to be one. Well, so much for
the expectations. However, the secrets I want to share
are applicable outside Kaggle contests. They go
far beyond. I’ll talk about myself for a minute,
because it’s relevant to becoming
a grandmaster. I studied at a mathematical school, at the
MSU Faculty of Mechanics and Mathematics and at the New Economic School (NES).
We didn’t have much data analysis or data
science at MSU, mostly probability theory and statistics, which I learned
by teaching it to freshmen. That’s when I learned all
about biases and histograms. At NES, we mostly did econometrics,
which was distressing — fighting heteroscedasticity, figuring out
the meaning of regression coefficients. I never encountered any
of this in data science. After graduating from NES,
like everyone, I was solving the
problem of income maximization. Between consulting and
investment banking, I chose consulting, since it meant
not working on some of the weekends. In consulting, I was engaged
in data cleansing. That’s what all data scientists
spend 90 percent of their time on. You get a bunch of spreadsheets
and merge them into a big table. We rarely used any advanced methods,
such as regressions, and those were also done in Excel. Once, my manager used R to check
the results of a logistic regression that I did in Excel.
He spotted a mistake.
I haven’t liked R since. That’s my experience in consulting.
Then I got into the exciting world of Russian
high-frequency trading. I had to learn, or rather relearn, to
program in C++, which was taught at MSU. I also learned a
great deal about the job of a
researcher in general. In trading, people talk about some
holy grail idea that would make you rich. It doesn’t exist.
You merely
edge closer, taking a hundred incremental steps
toward a good trading algorithm, or a predictive model.
And you will fail a hundred
thousand times on the way. And that’s what being a researcher is:
testing hypotheses, most of which fail.
Also in trading, I found out about things
like holdout sets and overfitting. A basic intuitive understanding
of these helped me on Kaggle. These are the skills and the
experience I had in early 2017, when I left my company,
taking my share. I finally had the time to
do the things I wanted to.
By then, I’d been thinking about
data science for some time and saw it as an
interesting field. That’s what I wanted to do.
So I went to the Yandex trainings. Like you now, I was here
last summer to hear the talks. For some of you,
this may be the first time. I couldn’t understand half of the words
they said: “cross-validation,” etc. I carefully wrote them down, went home
and educated myself by googling. And I watched a whole
lot of video lectures. Since I had lots of spare time,
I was working out. And one miserable day
I broke my leg in a bad way, narrowing down the range
of things I could do. At home, it was just me and my laptop.
I figured I had to do something. So I started solving
problems on Project Euler. It’s a platform hosting math and
programming problems, about 600 of them. It’s somewhat similar to Kaggle.
They have ratings but no prize money. That’s what I did daily. I woke up and
asked myself, “What do you want to do?” “Solve problems
on Project Euler.” In fact, I’d been on that
platform for a long time. That’s how I got to know
C#, C++ and Python better. But then I ran out of problems. Well,
I ran out of the problems I could solve. Finally, I made up
my mind to try Kaggle. At that time, I made the mistake many
people make: putting off Kaggle. I thought that only after taking
every online course out there would I deserve
to be on Kaggle. My first small piece of advice is:
Don’t put off Kaggle contests; start now. When you get home, sign up if
you don’t have an account, and submit. And you will find that practical
experience on Kaggle is far more valuable than any online
course assignments. My first contest was about images, so
I bought a GPU off a classified ads site. It was the summer of the cryptocurrency
hype, and GPUs had gotten expensive. People wondered whether to mine
or not, and whether Ethereum would grow. I was thinking about that too. But I
thought I’d buy a GPU for Kaggle anyway. I took bronze in my
first contest, placing 98th. All I did was launch a public kernel
and use slightly harder augmentations. And I got
the result. I thought, if getting a medal was so easy,
what could I do if I tried really hard? Ten months of nonstop work and
I was at No. 16 in the general ranking. I turned grandmaster
after seven months. I can tell you it’s a long way,
and it’s very labor-intensive. Moving on to
the five secrets. People asked me why I quit my business,
which was profitable, to compete on Kaggle. My answer’s
been very simple: I wanted to do only
what I enjoyed doing.
It was the same with Project Euler.
The prize money was irrelevant. I just woke up in the morning and asked
myself, “Do you wanna do this?” And I did. Same thing
with Kaggle. And that is quite a
big secret. Many people know about it, but they
don’t use it. You gotta do what you like. As university students, we all looked
for work trying to maximize income. We took jobs we
didn’t like and earned little. In time,
we started making more, and yet more,
but still without enjoying what we did. And this stage is where
most people get stuck. We tend not to transition into doing
what we like. But I did. With no regrets. I encourage you
to do it, too. These two hardly need an introduction,
but anyway: Dr. V. Iglovikov and A. Kuzin. They are the most famous
members of the ODS.AI community. Vladimir sort of represents
academia and Artur business. But this is quite arbitrary,
mostly in my head. They are both data scientists
in the industry now. It’s probably just that
Vladimir has a Ph.D. But to do justice to Artur:
His h-index is actually higher, I think. Anyway, what makes
academia and business different? To get an academic publication,
you need to prove you’re innovating.
Academic success
depends on innovation. You can’t just say you’ve doubled the data
set and improved the model, and publish. Yet that academic mindset is how many students
approach Kaggle contests. They start tinkering with
fancy losses, fancy architectures, which takes them a month, but then
the time is up and it doesn’t work. No win. No score.
No medals. Instead, I approached all contests with
a business mindset. Let me explain. It means that
you have a project, and there’s
a certain probability it will succeed. All you can do is maximize the probability
of success by leveraging certain resources. Now, what do I mean by
resources in a Kaggle contest? One example is finding a new data set.
There was an image contest, and the organizers forgot to list
the rule about external data sharing. So all the teams realized that
if you have more data, you win. Our team had a guy from Yandex
specifically for finding data. In another contest, it turned out that by
merging teams and blending solutions, the final score
grew by a lot. So new teammates were the key
resource, which brought us gold. Sometimes you have no access to a powerful
machine with a bunch of GPUs and CPU cores. So you find someone who
has one and invite them to join your team. Perhaps other teams won’t accept them, saying
their score on the leaderboard is too low. These are the resources I used to win
Kaggle contests and earn gold medals. Speaking
of business, I have to mention
fast.ai founder Jeremy Howard. He is
amazing. This slide is a bit crowded, but
I wanted to leave nothing out. He founded three
successful startups. At the age of 19, he made $200,000 at
McKinsey, which is what Silicon Valley
programmers make now. He was hailed as a wunderkind. At A.T. Kearney, he was the youngest
engagement manager globally. He pioneered
big data. He also learned
Chinese in a year. So what does he have
to do with Kaggle? Following the launch of Kaggle in 2010,
he was No. 1 for two years. In 2013, he served as the president
of Kaggle. He’s now developing fast.ai, a deep learning library.
I recommend that you use it.
That’s what I did myself. It has its drawbacks, which
people are quick to point out: poorly written code that is hard to make sense
of, not enough documentation. And yet it’s
very flexible. What makes
Jeremy exceptional is that he produces state-of-the-art
results using his own library. And he never stops
until he achieves that.
So make sure you use it.
This is my secret No. 3. For me, it was
easy to delve into this library, since there’s
a series of two-hour videos. They are exceedingly boring, but since my
leg was broken, I could do worse than that. That’s not the only way. You can get to
know this library by practicing with it. And there are lots of useful
threads on the fast.ai forum. So it’s not
that hard. One further piece of advice for those
who still haven’t learned English: Please do it.
Take
fast.ai, for example: there are enthusiasts who take
the trouble to subtitle the lectures. But if it weren’t for them,
people who don’t speak English would have no access to
the latest and best results. And by the time they gained access in
a year or two, the world would be different. Bottom line: If Jeremy learned Chinese
in a year, you can probably learn English. On to the
next tip. No presentation on data science
is complete unless it mentions Stas Semenov.
Like Jeremy, he is
one of my heroes. He also has a
trading background. He specializes
in data leakage.
Well, this is what quants
normally do in trading. You have a stream of stock market data
and you need to find the singular points in order to come up
with a strategy. I also recommend the course “How to Win
a Data Science Competition” on Coursera. It was created by these
brilliant grandmasters. They said it was very hard,
which they didn’t expect. I encourage
you to sign up. Rumor has it, A. Guschin turned
grandmaster in five months. They say he locked
himself in his dorm room. That’s what I heard
from Yandex people. This is the price
you have to pay for the title.
My final tip is about teamwork,
networking and the ODS.AI community. If for whatever reason you
still haven’t joined ODS.AI, do it. It’s a Slack channel with 12,000
Russian-speaking data scientists. That’s where you get the
answers to all of your questions: What hardware to buy? What software
to use? Stuff about contests, etc. This slide shows the people
I took part in Kaggle contests with. They helped
me win medals. If it weren’t for them, none of this would
have happened. I got to know them through ODS.AI. So don’t overlook this
amazing opportunity, namely
ODS.AI. Well then.
Suppose you became a grandmaster. Suppose you
had three hot startups and plenty of time. You gave it 10 months of your life, maybe
you locked yourself in the dorm. What now? What’s in it for you?
Is it even worth it? After I turned grandmaster,
I started getting job offers. The positions varied widely, from
data scientist to head of data science. Ultimately,
I chose to work in consulting,
as chief data scientist at Data Nerds. But that’s just one of
the opportunities you get when you gain a high
profile as a grandmaster. You can join a startup.
You can get employed. You can give lectures,
as many grandmasters do. A further, unexplored possibility
is competing on Asian platforms. After Google acquired
Kaggle two years ago, the platform changed.
There’s less prize money. You know,
the grass used to be greener, etc.
But now, lots of Asian
platforms are emerging. The competition is weak and the prize money’s
good. They will be gaining in popularity. So it’s the right time
to explore them. Luckily, Google Translate
works just fine. That’s it,
basically. This has been my path to the grandmaster
title. Whether you want to earn it or not is for you
to decide. I’m ready for
your questions. Thank you. Questions?

Good afternoon,
and thanks for the talk. I’m wondering, what if one has only two
hours a day to spare, instead of eight? How would you
spend them?

On Kaggle.

So I guessed. I mean, what’s the focus?
Would you prioritize testing new ideas? Or testing the
existing ideas?

I think I would work
in a team and ask my teammates to test them.

(Thank you.)

Hello, thanks for the talk. My name is
Oleg. You said the best place to start is on Kaggle, by taking part in
contests. However, I did try to do this, but at some point, when I make a
submission, my score no longer grows. And I realize I need more fundamental
knowledge if I’m to go any further. How do you find the
right balance between pausing to enhance your theoretical
background and generating new ideas? Thanks.

It’s a good
question. Indeed, a sound
balance is needed. Kaggle is unique because of how many
public kernels are available, and they are very useful. The same goes for discussions. Just reading
all of this takes longer than it may seem. Still, I recommend looking
through every forum thread, everything you can find on your
contest and maybe on earlier ones. If you see something
you don’t understand, then it’s time to eliminate the
gaps in your theoretical knowledge by reading forums
or taking online courses. I remember a tip
from Jeremy: Every day, make sure you submit
something that brings the score up. It’s difficult, no doubt. But still,
think about how you can nudge it up, if only a little,
if only by 0.001. In fact, there are fairly standard
tricks for doing this: folds, stacking, mixing solutions.
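
As a rough illustration of two of these tricks (editor's sketch, not code from the talk): the snippet below builds out-of-fold predictions with a simple k-fold split and then mixes two models by averaging their predictions. The toy data, the two toy "models" and all function names are hypothetical and chosen only to keep the example self-contained. Stacking, where a second-level model is trained on the out-of-fold predictions, is not shown.

```python
from statistics import mean


def kfold_indices(n_samples, n_folds):
    """Yield (train, valid) index lists; each sample is in valid exactly once."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, valid in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, valid


def fit_mean_model(xs, ys):
    """Toy model (assumption): always predicts the mean training target."""
    m = mean(ys)
    return lambda x: m


def fit_slope_model(xs, ys):
    """Toy model (assumption): least-squares line through the origin."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: a * x


def out_of_fold(fit, xs, ys, n_folds=3):
    """Predict every sample with a model that did not see it in training."""
    oof = [0.0] * len(xs)
    for train, valid in kfold_indices(len(xs), n_folds):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in valid:
            oof[i] = model(xs[i])
    return oof


def mae(pred, ys):
    """Mean absolute error of predictions against targets."""
    return mean(abs(p - y) for p, y in zip(pred, ys))


# Made-up data, roughly y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

oof_mean = out_of_fold(fit_mean_model, xs, ys)
oof_slope = out_of_fold(fit_slope_model, xs, ys)

# Mixing (blending): a plain average of the two models' predictions.
blend = [(a + b) / 2 for a, b in zip(oof_mean, oof_slope)]

for name, pred in [("mean", oof_mean), ("slope", oof_slope), ("blend", blend)]:
    print(name, round(mae(pred, ys), 3))
```

Whether a blend actually beats its components depends on the models and the data, which is why the out-of-fold scores are compared before anything is submitted.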
By learning these tricks, you get the most in the
way of practical training. And this involves
learning theory. I must say,
Jeremy’s tip has
one major drawback: Kaggle doesn’t let you team up with
someone if the two of you combined have too many
submissions. So there’s a
slight trade-off to it. Does that settle the
question?

Thank you.

You approached
these problems with a business mindset.
But my perhaps unfortunate
experience has been that the business approach
often doesn’t work. For example, in a recent
contest held by a bank, I tried to use the business approach,
since I knew that industry well. But I lost, because among the features
that worked was, say, the client identifier. So, to what extent is the
business approach applicable?

I know that contest.
It was held by Rosbank. For one thing, it was
not a Kaggle contest. But in any case, when I said “business
approach,” I meant something else: not knowing the industry and
using this to generate features, but rather how you
approach the problem. The goal is to be at the
top of the leaderboard. It’s about the way
you achieve this, whether you try to innovate, or you apply
standard, maybe even boring techniques, but then you try lots of them.
That’s what I meant.

I wanted
to ask this: Solutions on Kaggle tend to be very
elaborate: many layers, models, stacking. But after the transition
from Kaggle to consulting, how complex are the solutions you provide
to clients, and who is going to support them?

Thanks for another
good question. Although Kaggle was initially created
for businesses to solve their problems, there’s a certain simplification in that
companies bring their own data and metrics. When you’re in consulting,
this is not the case. You take a step back: collect data, cleanse it,
think of an optimization metric that would solve
your business problem. But actually, it’s merely one step away
from Kaggle. It’s not all that different. As to the solutions being elaborate,
there’s a problem with Kaggle: When the contest is over, the three
top teams get their prizes, and they get a call from the
company holding the contest, with the top managers who
thought of Kaggle in the first place and one poor data scientist tasked
with implementing the solution. And the No. 1 team starts explaining it.
They have a hundred models, stacking, etc. Eventually, the client either has to hire
the whole team or just borrow some ideas. In fact, the Kaggle documentation
has a paragraph about building a simple model that
yields 95 percent of the performance. That’s what
businesses want. So yes, real-life
solutions are simpler. But some of the contest tricks
are still applicable and effective.