How He Became a Kaggle Grandmaster and Got a Computer Vision Engineer Job at Lyft

Captions
Hi everyone, welcome to the Towards Data Science podcast. I'm your host, YK Sugi, and today I have an interview with Vladimir Iglovikov. He is a Kaggle Grandmaster and he's currently a senior computer vision engineer at Lyft. In this interview we talked about his journey as a data scientist through his first data science job, the various Kaggle competitions that he did, and eventually becoming a computer vision engineer at Lyft. First, I asked him to introduce himself.

So, my name is Vladimir. Ten years ago I moved to the United States, to graduate school at UC Davis, to study physics. I got my PhD there, and at the end I had the question of whether I would stay in academia or go to industry. UC Davis is located one hour away from Silicon Valley, and plenty of my friends got their data science and software engineering jobs here. I decided that in terms of new knowledge, in terms of money, in terms of doing something new in my life, Silicon Valley looked like a good place to live. So I got my first job at a startup called Bidgely. It was in Sunnyvale. Sunnyvale is empty, and it's not the Silicon Valley that I had imagined. So after eight months there I switched jobs and moved to San Francisco, which is probably the best place to be in Silicon Valley. It has some limitations, maybe it's a bit dirty here and there, but it's what you expect when you come here: big tech companies, meetups, conferences, startups. In every Starbucks you see people on their laptops trying to build their own company. Something is always happening, it's like a gold rush, and I enjoy it. My second company was called TrueAccord. It was a debt collection agency. I had cool colleagues, the team was strong and the product was decent, but at the same time I didn't want to stay there that long. After a physics background, you know, collecting debts using machine learning doesn't look as exciting. So after TrueAccord I moved to Lyft to work on deep learning and computer vision related tasks. I explained my journey to this position in a blog post. It was slightly challenging, but it still kind of worked, and for the last two and a half years I have worked here.

Nice. So before what you explained in your article, how did you switch from physics to your machine learning startup jobs?

At the time when I was doing this, life was probably significantly easier, I mean the hiring, in some sense. Machine learning is a relatively new field in the industrial setting. Of course people were using logistic and linear regression for decades, but at the scale that we have right now, where there are special positions like research scientists, machine learning engineers and data scientists, it's a relatively new story. So no one knows how to hire, and this adds a lot of lottery to it. So it was relatively challenging for me to move from physics to data science. In one interview you are asked about MapReduce and Hadoop and Spark, in the second interview you are asked how to derive SVM, in the third interview you are asked about some intricacies of the banking business, and all that knowledge was really new for me when I came from academia. That's why there was a lot of chance involved in moving from academia to industry. Still, it kind of worked, especially because picking up machine learning to the level where you can operate with it, solve some problems and bring value to the company, plus understanding the math of it, was extremely easy after physics graduate school. So basically, moving from machine learning to physics may be challenging, but moving from physics to machine learning is relatively easy, and that's what many of my friends did. Overall I see this trend: people in physics in academia compare their salaries and the projects that they would have in the future with what Google, Facebook, Lyft or Uber can provide to them, and the gap is pretty big.

I see. So I guess you basically learned everything in machine learning on your own?

Definitely, yes. I took a couple of classes on Coursera, but it didn't lead anywhere. Even now I'm skeptical about some university classes on machine learning. The discipline is applied: for every rule there is an exception, for every exception there is another exception, things that work on one dataset may not work on another, and you need practical experience to develop some deep intuition. So, my experience in machine learning: of course I was reading a lot of papers and books and blog posts. At the same time I invested a lot of time in Kaggle competitions, where I tried to bridge the gap between the datasets that Kaggle provided, the metrics that were defined there, and the theoretical knowledge that I was gaining from papers. And that was normally painful, because you read some paper, you take what it claims is a state-of-the-art result, you train some model, you apply it, and you are at the bottom of the leaderboard at Kaggle, because it's not state of the art, or because the community is smart, or maybe the dataset is different, things like this. So I participated in more than 70 competitions, and finally after a few years I got the title of Kaggle Grandmaster, which as a title opened some doors for me in terms of career and opportunities for public speaking at conferences, and this is definitely exciting. So yeah, basically my machine learning knowledge came from competitions, and it's hard for me to imagine any other way. Studying machine learning in academia, you can do it to some extent, but you still need to train something 24/7 to get proficient. And if you're thinking about machine learning and its applications in industry, deployment, real-time requirements, maybe embedded devices or maybe self-driving cars, it's pretty hard for me to imagine that you will do this in your free time. You need to do it at work to understand what the actual limitations are. But basically, machine learning is not physics: it's significantly easier, and you can do it by yourself even if you're still in high school.

Right. So if someone was getting started with machine learning today, how should they get started?

For this question I typically answer: just go to Kaggle and try to participate in these competitions. Any competition, the one that ends the soonest. Jump into it, fork someone's code, try to understand it, make a submission, read about it. Some parts you won't understand initially, maybe the code is hard, maybe something is not clear, but you need to try to climb the leaderboard higher and higher, and knowledge will come slowly. You will see that the pipelines that you're building for different competitions resemble each other, you'll try to make them more general, and this may improve your software engineering skills. You see, you're competing at Kaggle against people that know how to do machine learning in practice, they understand theory, and they understand what all these parameters do and how they work. So to get to the top, you also need to get to that stage. In the beginning you'll start blindly, without thinking, without knowing what exactly you are doing, but then you will pick it up: you read, you ask, you interact, you have a lot of these aha moments. You can watch videos where people cover solutions from previous competitions, they are very valuable. Do read some academic papers, but it's also important to watch videos about the papers; if you can, go to a conference, talk to people, ask questions. Basically, if you have this goal to optimize this metric and to climb, knowledge will come, there is no way around it. So I believe that even if you have no clue what's happening, it doesn't matter. The path of machine learning competitions worked for me relatively well, it worked really well for a few of my friends, and I believe it's a fairly general way to get into this area.

Right. And how long did it take you to go from zero experience in Kaggle to being a Grandmaster?

This is a good question. When I started, I'd say five years ago, my first three competitions were epic fails. I did everything right, everything following textbooks, and it didn't give me good results. But after every competition I read the solutions of the winners to learn the tricks, to understand what I was missing, how to work with the data better, what tools they were using that I was missing. And after the third competition I got what is called a silver medal. Basically, if you get in the top 50 out of a thousand, you get a silver medal. So I got in the top 50, and after this it took me another year and a half to get a gold medal. A gold medal is something that you get for the top places, and when you get this gold medal you get the title of Kaggle Master. There are about 3,000 Kaggle Masters in the world right now. But to become a Kaggle Grandmaster you need to have five gold medals, and one of them should be solo, I mean no team, by yourself. And it took me probably another year or two to become a Grandmaster. There are about two hundred in the world right now, so it's a relatively challenging task, but at the same time, as I mentioned, it opens a lot of doors for you. If people want to invite some software engineer, it's one story; if they want to invite someone to a conference to give a talk, and it's a software engineer who is also a Kaggle Grandmaster, you get extra credibility. I'm a keynote speaker at a small conference next week and I'm also giving a talk in a month, so it's kind of nice.

So it took you about two years total?

Maybe three years. There are people who were able to become Grandmasters in one year, but it's a tight timeline, because going from scratch, from nothing, to the top knowledge in this competitive machine learning world is challenging. Again, we have three million people registered at Kaggle and only 200 of them are Grandmasters, so probably not everyone is capable of this, or has enough time or motivation. Basically, I wouldn't say that if someone invests two or three years of their time in machine learning competitions, he or she will become a Grandmaster. That may be too strong a statement, as many people tried but not everyone was able to do it.

Right. And you said you worked on more than seventy competitions, right?

I think so. I mean, "worked" doesn't mean that I worked really hard. Maybe in 20 to 30 of them I really invested time and tried to fight for the winning places. In the others I maybe looked at the data, made a few submissions, and then lost interest or something happened. And it's not only Kaggle; at Kaggle it was maybe 60. At some point in my life I realized that Kaggle competitions are hard. You fight for the top places, there is money involved, you compete with a few thousand participants, and this is hard. At the same time, people in industry don't know what is going on there or how challenging it is, and they don't care that much. Right now the situation is slightly different, but two years ago that was the case: people did not appreciate it. At the same time, many people in industry know about CVPR, MICCAI, NeurIPS and some other conferences, and all these conferences also host competitions. Typically the level of these competitions is drastically easier. I had this experience, I mentioned it in the blog post: you come to some conference, you do some baselines, and things that are considered baselines at Kaggle may be considered state of the art at these conferences. I had this experience at CVPR and MICCAI. You get to the top, then you add this to your resume, and instantly HRs are very excited because they see the keywords that they were instructed to look for. So at some point I switched from Kaggle to academic competitions. I did medical imaging and satellite imagery at CVPR and MICCAI, and then wrapped it up with published papers based on the solutions that my teammates and I had.

You published papers on your solutions?

Yeah. It was a surprise for me, but every solution that you have from these competitions you can wrap up as a tech report or as an academic paper. If it's worth reading, if the problem is clear, the solution is clear, and it's basically a report about the work that you did, the things that you used, what worked and what didn't, it is easily accepted to different conferences. Maybe not NeurIPS, but still, I have this experience with CVPR workshops, MICCAI workshops, ECML, I think it was. So in this sense these competitions helped me, on one side, to boost my skills and to build my personal brand. I got some money; the total prize money was less than fifty thousand, but still, it's something. And of course, because we published these papers, it helped the community to understand what exactly my team and I were doing on these problems, and for me it also boosted my Google Scholar. Maybe if at some point I decide to go back to academia, it may be beneficial.

Right. Do you think that was helpful for getting your job at Lyft?

Not really. I used to believe that you can't publish material from competitions as a paper, so I didn't do it. I realized that it's doable only when I joined Lyft, and that's why, if you look at my Google Scholar, most of my publications and citations come from the last two and a half years. So I haven't really checked how beneficial it is. At the same time, if I look for a new position right now, I have Lyft and self-driving, which is a strong line, and Kaggle and the competitions, so I hope that when I look for my next position my resume will be thick enough to open doors.

Right. So in this article that we were talking about, you explain how you went from your debt collection job to your job at Lyft. Could you maybe explain what that journey was like, in case people haven't read the article?

So I was working at the debt collection agency. I was doing traditional machine learning, I was building recommender systems. I didn't have extensive knowledge in computer vision and deep learning. I didn't have any lines in my resume saying that I had worked on relevant projects in academia, and I didn't have any lines in my resume saying that I did something similar at my current position. But still, I wanted to move to this industry. And I believe everyone is in a similar position: you write your resume, you send it around, you go to job interviews and you fail. You repeat, and you learn the problems that you are facing and the areas where you are not comfortable. For example, I had an interview with NVIDIA and I wasn't comfortable with deep object detectors. So I participated in a competition on object detection. I spent two months on this, there was prize money, and during this time I became extremely comfortable with the topic: what the state of the art was at the time, its limitations for production and in industry, and the names of all the major papers at the time. Basically, that's how I closed the gap in that knowledge. Then in some other interview I had problems with software engineering practices, and after the interviewer pointed out the mistakes and errors that I made, I focused on this area. I repeated this a number of times, and it took me about eight months to get from this original point to getting an offer from Lyft. I studied a lot, I participated in a lot of machine learning, deep learning and computer vision competitions. I applied to probably more than a hundred companies. Out of the big companies, I failed at NVIDIA, Facebook, Tesla and others, I applied to plenty of mid-size and small startups, and finally I got to Lyft.

Now I'm curious, how do you think you were able to motivate yourself so much during this time?

In this sense, motivation at the time was significantly easier. Right now, if I wanted to motivate myself to work on competitions, it would be challenging: I have interesting projects at work, I have other activities, I'm in a good position. At that time, you see, I was at the debt collection agency and I didn't like my job. It was okay, but I wanted to move to some other place. And if you have a burning urge to get out of the place that you are in, you are really motivated to spend your weekends and evenings on some extra activities.

Sure, okay, makes sense. So you spent a lot of time on these competitions then?

The competitions were a second, unpaid, full-time job. Realistically, during the day I was working and trying to bring value to the company, and then in the evenings and on weekends I was spending full time on these competitions. My hardware was optimized, I picked up good practices for working with the data, and I used faster SSDs. Basically, the competitions pushed me in terms of iteration speed and in terms of the ideas that I was checking. And I remember that in public transportation, while other people were listening to music, I was reading papers. Even this time I tried to leverage for efficiency.

Right. And what was your hardware, your setup at that time?

I started with a six-core, twelve-thread CPU, 32 GB of RAM, and I had one Titan X with 12 GB of RAM. Somewhere in the spring of 2017 my friend and I finished third in the Kaggle Dstl competition and we got twenty thousand dollars in prizes. From that money ten thousand was mine; after you pay taxes, let's say you get about six thousand. With this six thousand I bought another computer that had four 1080 Ti GPUs, 64 GB of RAM and some extended hard disks. And it's still the computer that I am using to this day for calculations for some side projects or competitions or whatever I do.

Which computer is it?

The one with four 1080 Tis. When I finished third in this Dstl competition I got prize money, and I took this prize money and bought a computer with four GPUs.

Four GPUs, right. I'm just kind of curious about it because I might need to get a laptop or a computer with GPUs at some point. GPUs may be a good idea for some use cases, but I can't really decide.

You can't train anything on these laptop GPUs; you still need access to the cloud, an external server or a desktop. For my computations at work, of course I have a laptop, because of meetings and other stuff, but I also have a desktop with two GPUs, and it's good enough for prototyping, for debugging and for fast iterations. And when I need to train something heavy, it goes and is calculated in the cloud. Right now many people follow a similar approach.

I see. Is that what you did for competitions too?

No. For competitions the cloud is expensive. If you are training something occasionally, the cloud is a good idea, but if you're training something 24/7, having your own computer is much better. That's why I bought this. I have one computer with one GPU and another computer with four GPUs: the first for prototyping and the second for heavy lifting, like training and computing something for days and weeks.

Right, makes sense. And you switched from Keras to PyTorch?

This is true, yes. I was using Keras for a while. Originally I tried deep learning with Caffe. It worked in some sense, but it was relatively painful. Then I switched to Theano. Theano was better than Caffe, but still kind of painful. Then Keras was released, which was a high-level wrapper over Theano, and Keras was really useful and convenient, and I was using it for a while. The problem with Keras was that at the time it didn't work well with multiple GPUs: utilization was much lower than for PyTorch, and the data loader was not that good. There were plenty of other limitations, and debugging in PyTorch is significantly easier. It seems to me that OpenAI has fully switched to PyTorch, and NVIDIA Research only uses PyTorch in their research. There are definitely reasons: fast prototyping, efficient GPU utilization, and many other advantages that PyTorch provides. That's why I switched from Keras to PyTorch. For one GPU it's maybe not that important, but for a multi-GPU setup PyTorch was significantly better at that time. Right now I hope Keras has caught up.

What do you use at work?

It really depends. I don't think that I can talk about this in public, but we use different frameworks depending on the task.

I see. Is there anything from work that you can talk about?

I prefer not to talk about work. You see, we have pretty strict policies, and for every conference where I give a talk on behalf of Lyft I need approval from marketing, legal and management, which is a relatively challenging task. So I can talk about things that are available online. Let's see: Lyft was hosting a competition at Kaggle and I was its host. I can maybe talk a bit more about this, because we gave a presentation at NeurIPS, and right now we're preparing a blog post that may share even more results and code from our side and what lessons we learned.

Sure.

So yeah, competitions can provide a lot of value to companies, but they do have limitations. First of all, I know of zero examples where the winning solution from a competition went to production. Typically companies organize competitions to get new ideas, to understand the limitations of the existing dataset and how much you can get out of it, to see new approaches, and maybe to get an analysis of the current research, because participants take some recent papers, apply them, and report what works and what doesn't. This is also extremely valuable. The solutions that you get as a result don't go directly to production, but if there is a person on the team at the company who was a participant in this competition, or is experienced in other competitions, he or she may be able to extract all the necessary value from the code that the winners provide and adapt it to the production limitations. In this sense we were lucky, because I was this person. I was the host of the competition, I was interviewing the winners, and I was helping the community during the competition. So we got these winning solutions, I analyzed them, and we are in the process of extracting value from them and making them part of our production pipelines. But one of the main reasons why we did this competition is not that we wanted to extract some value for production pipelines; we can do that ourselves. The story is slightly different. When I go to conferences, NeurIPS, CVPR or something else, there are plenty of papers that say "self-driving", "autonomous" and things like this in their names. Of course people use these words for the hype, because if you have some cool name, chances are higher that you'll get accepted, or if you show some relevance to a hot topic, which self-driving is. At the same time, when you read the content of these papers, when you try to understand what problems they're trying to solve, for 99 percent of them, or maybe 90 percent of them, the problems that they're trying to solve are orthogonal to what industry is trying to solve in the real world. People write what is publishable and not what is really useful. One of the reasons for this is that the datasets used in self-driving cars, and by "we" I mean the whole industry, Uber, Lyft, Waymo, Aptiv, Yandex and other companies, are combined from lidar, radar and cameras, they are in 3D, and these types of datasets are not that well established and there are not that many of them. The most prominent and well-known dataset is KITTI, and until last year all research was based on it. But last year Aptiv, Lyft, Waymo and, I believe, Argo AI released their own datasets that combine lidar and camera with some time evolution, and this is cool. So when you release a dataset, you write a blog post: okay, we released a dataset. But what happens after this? Nothing happens. People may instantly forget, because there is a lot of information on the internet. To promote your dataset, to make it more established, to attract research in the industrial and academic community, people often organize competitions. ImageNet, I remember, was a game-changing competition, and there were other competitions like COCO and many others. Basically, if you have a dataset and you make a competition on top of it, you may attract some critical mass of people that will build something on top of it and publish papers and blog posts, basically create some buzz around it. So in many ways the competition that we organized was for the same purpose. We released this dataset and we wanted people to get used to it, maybe share some codebase, and develop some pipelines and ideas. Basically, it's the transition from this type of problem being something extreme and exotic to being more common. We organized this competition, it ran for two months, we got pretty interesting solutions, the winners shared descriptions at Kaggle, and some of them shared their code. So I hope that when we or someone else organizes a similar competition, for people that are new to the area, or even for experienced people, it will be significantly easier to jump in, because they'll be able to get some knowledge and maybe some code from what participants developed in ours. So we believe this competition went pretty well. We had more than 500 teams and about 600 participants. Most likely for the next competition we will put more restrictions on the inference time and on the hardware to make it closer to our production setting, and it may run a bit longer, because two months for this type of dataset is, I believe, not enough; four months would be slightly better.

What is that dataset like?

It's lidar point clouds and images. We have our real Lyft cars, they're driving around. Each car has three lidars and six cameras pointing in all directions. We collect this data as a function of time, so you get point clouds and camera images synchronized so that you can map camera to lidar and lidar to camera. And for this type of task we have 3D object detection: participants were asked to find 3D bounding boxes around cars, animals, emergency vehicles, pedestrians and other classes.

I see. So you're a computer vision engineer, right?

Yeah, that's correct.

Okay, so since you can't talk about your work specifically, could you maybe talk about how you got into computer vision in the first place?

Competitions were the way to go. Right now, what I see is that people sometimes call this the golden age of natural language processing. But three years ago, when I was getting into this area, computer vision started moving from academia to industry, and more and more companies started using computer vision applications, classification, object detection and semantic segmentation, to extract value from the data that they had. And it means that Kaggle picked up the trend: more and more competitions of this type were happening there and on other platforms, and a lot of research was in this area. In this sense I was just following the trend. And of course there were more tutorials, blog posts and different libraries. That's why, in the field of deep learning, which is pretty broad, I jumped into computer vision, because it's an exciting area by itself and also it was easy to pick up at that point in time.

Nice. So you got interested in computer vision, and then you started finding related competitions and you just started working on those?

Yeah, exactly.

Right. You make it sound really simple to get into it.

I would say it's simple, it's straightforward. Everyone can follow the same path. But, as usual, for some people it may work a bit better and they'll get hooked; for some people it may not work well and they will find some reasons, excuses or different priorities so that they will not work on this type of task. But again, I believe that for those who are right now in some type of job and want to move to a computer vision related position, the best approach would probably be similar to what I did. Send the resumes first, because if you get your job fast, you don't need these competitions. But if you fail, if people don't look at your resume, or you don't have enough knowledge, or something else is happening, you can just start participating in Kaggle. For sure, machine learning competitions give you about 10 to 100x the knowledge per unit of time with respect to academia or industry. It's extremely efficient but, in a way, stressful.

Right. Stressful in what ways?

Kaggle has some gamification mechanisms: titles like Kaggle Master and Grandmaster, some kind of points, and they have this real-time leaderboard. If you get hooked on a competition, you're participating, then let's say you go to sleep, you come back in the morning and you're twenty places lower. In the morning you should focus on breakfast, on the news, on your job or something like this, but instead your brain, consciously and unconsciously, is thinking: okay, I lost twenty places overnight, I need to get back. And then you generate some ideas, you get into this multitasking mode. Then in the evening, instead of doing sports or something like this, you get excited and focused, you get some ideas, you implement something from papers, something works, something doesn't, and you get back your twenty places. You get a lot of knowledge, but your evening wasn't very relaxed. And then in the morning you wake up and you are again twenty places lower, because everyone on this leaderboard is doing exactly the same, and you need to repeat this procedure. So there is this constant social and other pressure for you to study, to improve. It makes you sleep less, for sure. Somehow you get more energy and you are more excited, maybe there are some endorphins and dopamine and other hormones that boost you, but it's definitely stress.

Right. It's almost like a game then?

Maybe not a game; more like a sport, a competitive sport. You know, in some computer games people have e-sports with some real drama about who is winning and who is losing. In real sports there's a lot of drama, and in computer game championships there's a lot of drama. At Kaggle, maybe not drama, but there is definitely something similar. So machine learning competitions are similar to sports in some senses.

I see. So you said that computer vision was becoming more important in 2017?

Yeah, definitely.

Right. And you also said that natural language processing is becoming more important today?

Let's see. Even two years ago, when you had a natural language processing problem, for most of them you didn't need deep learning. You take some TF-IDF, an SVM on top of it, and you get pretty good solutions. When people tried to apply complex deep learning pipelines, they worked at about the level of traditional machine learning. But in the last year or two there were really important breakthroughs: BERT, GPT-2, basically neural networks working really well. It was a big boost for many natural language processing related tasks, so this field is still growing. Computer vision, I wouldn't say it is stagnating; it's saturating, maturing, and more and more applications are at the stage where things go into production and startups are built on this. So there are plenty of startups that are built on top of computer vision technology, but I don't know of many successful startups built on top of natural language processing technology, because the technology was not mature enough for this production, industrial stage. But right now the situation is changing and the technology is getting better. Two years ago chatbots were, maybe not a fraud, but definitely not the best investment. So two years ago, when someone said "Vladimir, we have this great chatbot product, come work with us", I would say "good luck with your life, guys", but right now the story may be different. In natural language processing there are a lot of great results and a lot of breakthroughs in terms of networks, in terms of data, in terms of applications. So this field looks very promising at the moment.

I see. Do you think you want to stay in computer vision, or go somewhere else?

Two or three years ago I really wanted to be in computer vision. Right now I've got some kind of satisfaction from this: I understand how all this works, I understand how the technology works, I understand how to apply computer vision to medical imaging and other types of imaging. Of course you can study and learn indefinitely, but it may happen that at some point I will shift to something slightly different.

Nice. Okay, I think I have one last question, which is what I always ask: is there anything else that you want to add to this conversation?

I don't know. I really like that you and other people told me that my blog post was motivational, so I would like this interview to have a similar flavor, to motivate people to study. We are talking about machine learning, deep learning, computer vision. As I said, I talk to my software engineering friends and they say: "Oh my god, I do software engineering, I would like to do machine learning, but it's so complex, all this math, you need to have a PhD." No, guys. These days you can get into machine learning right out of high school and do a lot of cool stuff. First of all, the math in machine learning is not very advanced. People think it's advanced, but with respect to physics it's at most the freshman year. So the math is not that hard, and with the current tutorials and everything, getting there is easier. So if you're considering going into machine learning, or any field, just start doing something. Machine learning competitions are a great way to jump in, or some project. And don't postpone it for later; later typically means never. Just try to do it now, and hopefully you'll like it and find a job in this field.

Yeah, that should be good. Okay, great, thank you so much, Vlad.

Thank you.
Info
Channel: Towards Data Science
Views: 2,317
Rating: 4.9428573 out of 5
Id: 9NSm0SW9Eg0
Length: 37min 6sec (2226 seconds)
Published: Tue Mar 24 2020