How I built my best ML project without going crazy

Captions
So I recently went from having no idea what machine learning project I wanted to work on to my best ML project ever, one that I even wrote a paper on, in only four months. Hopefully it will one day be ready for a top conference. But learning how to approach such a challenge took me a very long time. Over the years and the course of this ML project, I developed a three-phase system that takes me from finding an idea all the way to actually building out this amazing ML project. So in this video, I will share this system with you so that you can come up with and build an amazing ML project, organizing everything using Notion, who are also kindly sponsoring this video.

There are quite a few videos and articles online providing a list of beginner ML projects, and don't get me wrong, they are amazing and really useful for beginners. But I wanted to work on a more unique and arguably more impressive project that had the potential to be made into an actual research paper. I mean, I always recommend working on one or two bigger, more challenging projects to really go beyond the beginner stage. Now, how much bigger you want to go and what type of project you want to work on depends on your current skill level and what your goal job will be. I, for example, want to be a researcher. But either way, we all start at the same point: not knowing what to actually work on. And this is where the first phase of my system comes into play, where you don't touch any code for a while.

When I got started, I had already explored different ML domains a bit over the past few years and ended up finding the topic of multimodal learning really exciting, so I had a point to start with: multimodal learning. The issue was that multimodal learning is a really large topic, so I decided I wanted to go into vision-language modeling. That being settled, I started reading a lot of papers on the cool multimodal large language models such as Flamingo, Kosmos-1, LLaVA, and so on. And that is where the next problem pops up: it is completely unfeasible for
me to train a full state-of-the-art multimodal LLM. Those things are trained across way too many GPUs and on way too much data. So the solution is to look at a tighter and tighter niche. I thought of video-language modeling, specifically video question answering. And what do you think I did then? Yep, read more papers. The goal here is not only to acquire a lot of knowledge and learn about different techniques that you could use in your own project but, most importantly, to find a specific task, meaning specific benchmarks and data sets, and limitations in some of the current state-of-the-art models that you could improve. After looking at video QA models, I focused on one specific benchmark, found the current best model on the leaderboard, and found one of its limitations. This then led me to pivot into yet another slightly different direction, looking at video moment retrieval and focusing on two to three of the respective benchmarks. And that was where I finally found my place.

Now, one could argue that that is more of a research goal, but if you want to do more practical, ML-engineering-focused projects, the general approach is honestly the same. If you don't have an idea of what you want to work on or what problem you want to solve, look at what other people are solving, and make sure to pay close attention to the different techniques they use.

And that is where the next important part of this phase comes into play: documenting this research process. What isn't written down somewhere does not exist. Now, I've been using Notion for years and I absolutely love it. I've built a little machine learning dashboard with everything I need to organize my research and ML projects in general. We'll get to almost every component, but one part of this whole template is my paper library. Here I add all of the most relevant papers I have really read and, for example, add important tags. These tags allow me to easily filter for certain types of papers depending on what I want to revisit later. Talking about
revisiting later, it also helps to add the links to the papers in case you want to have a more detailed look again. But it would be even better if you write your own summaries and already start to write out certain pros and cons.

Now, the next thing you can see here is a column called data sets, which are really important to have an overview of. That's why I have a whole separate database only for the data sets. Documenting and understanding the differences between different data sets and benchmarks is really important for figuring out what problem you want to solve. Here I once again add relevant information like links to the data sets and some important details that might come in handy in the future, but most importantly, I again have tags that indicate what tasks these data sets tackle. If I want to work on image captioning, I can see all the data sets I know of for image captioning by looking at the type column. In the end, for my project, as already mentioned, I found two to three moment retrieval data sets and two to three papers that are most relevant for my specific project. This whole process of reading a lot of papers, thinking I might have a first idea, and scrapping that once again took me about two whole months, and I read at least 20 papers.

But towards the end of this phase one, the learning phase, phase two starts to overlap a bit, so let's have a look at what phase two consists of. Phase two is a quite short but really, really important phase that again touches no code. If you skip this phase, the final coding phase will be a much bigger pain than it should be. So at some point in phase one, probably towards the end, you will get to a point where you feel a bit more comfortable with the landscape of the problem setting you have narrowed down to. You have found more and more specific limitations that bigger, more general models have and are starting to get ideas. And as mentioned, what isn't written down somewhere does not exist. That is why I have another much
simpler database table in Notion that is literally just for dumping your ideas. I've even added a cool button to quickly dump them into the database, and I have more buttons for adding a new paper or data set. Now, don't think too hard about those ideas just yet. Just write them down in a way that you will hopefully understand what you meant once you revisit the idea later. This idea dumping already happens in phase one but is really important for phase two.

We now want to actually determine our most promising ideas and build a little plan. At this point, I would go through what I have written down and pretty much throw away most of the ideas, but then combine two to three good ones. Since we're still moving on a high level, we now want to break down our idea into global but important steps or tasks. You could also call these milestones. For example, I had looked at a few repositories and knew I wanted to start with simply running the code from one of my baseline models that I wanted to later outperform and build upon. I then wanted to not only run their code to reproduce the results as per the tutorial, but really explore the details of the repository and see whether I could use it for my own idea. I then figured I would probably want to download the data sets, pre-process the data, and see how it looks. Once I had an overview of the code base and had the data, I should have all I need to start implementing the main modules for my basic idea. Then I want to focus on logging, then I want to train my first model, and so on.

By building out this high-level road map, without overthinking it and assigning deadlines, I had high-level global tasks I could directly get to work on. Of course, each global task requires subtasks, but those are part of the next phase, phase three. But the amazing thing now is that I will have way less anxiety when actually sitting down to code, because I now have a good enough idea of what I need to work on instead of sitting down, looking at the code, and trying to figure
things out on the go. This phase two, the ideating and planning phase, took me only a few days, but the idea of adding global high-level tasks will also carry over into phase three and will be very important for actually developing your idea.

Finally, we get to the potentially most stressful phase, but the one where we really see why all stages of the system are so important. When I finished my reading and ideating phases, I finally got to work. I had a fairly clear plan of what I needed to do and got so much coding done. It really felt like a speedrun. I mean, I had everything I needed to get into hustle mode and managed to write the biggest chunk of my code in only a few days. The remaining weeks, I ran a lot of experiments and implemented new, smaller ideas and features into the model. How? Well, I broke down these global tasks into smaller, more immediate to-do items and could just check off one box after the other.

But that is not quite enough. How did I get so much work done without losing the overview? I personally work on a week-by-week basis, at least when it comes to planning, tracking, and documenting everything I do. Of course, you could also do daily to-do lists, which are definitely useful sometimes, but for me, I find that it is enough and more manageable to keep track of the tasks for one week. For that, I again use Notion, where I have one more database for my weekly diary. There are only three really important parts to one of my diary entries. First, there is of course the to-do list, where I actively take the time to think through what I need to do to get to the next milestone, which you can easily peek at in the global tasks table view at the top. I usually do this on Mondays, when sitting down with my supervisors or alone and looking at the progress of the last week and what is still open. This is so often overlooked, but I promise you this helps you so much when you finally get to actual coding. Once I get to work, I keep updating the list, because during the week I of course
find new problems I need to work on. What I also might get during the week are new ideas, and for those I have a special place to just dump them, because again, what isn't written down doesn't exist. Many of those ideas are bad and end up as zombie ideas, but once I deem one of them really worthy of trying out, I have even added a handy button to add the idea to the idea dump, or I can directly add it as a new global task at the top of the page. Finally, since I'm trying out different things, debugging, and running a lot of experiments, and just need to keep track of a lot of bugs, new features, and, as mentioned, experiments, I have one more section for the observations I make. I think there's not too much to say about this section. It's just a little log and scratch pad, but I found it to be really useful, and I did get lost when I didn't use it. And finally, since there can be a lot in such a diary entry, I also have this little summary section right here where I write down the most important milestones or developments. But the important part is that this system is still very dynamic and, as you hopefully saw, allows me to add more to-do items, ideas, and larger milestone tasks to the tables when they pop up during the experimentation and development process.

This final phase, the coding phase, can be really overwhelming. If you don't have a plan for what you want to work on, you will sit in front of your screen in the dark, alone, and have no idea where to start or where to go next, and waste a lot of time. At least that is how it was for me in my early days. But to be fair, I often still get overwhelmed when spending too much time in phase three. I feel like I'm not making any progress because I'm just trying out one simple change after the other. At that point, I love to take a step back and go back to phase one. I give myself perhaps a relaxed week or two of purely reading papers, learning new stuff, and collecting new ideas. I then go into phase two, where I write out these ideas into milestones, and can then
get back to coding with a fresh mind and a new set of actually thought-through steps to take. And I feel like the Notion dashboard I'm using really helps me with this constant cycle of learning, planning, documenting, and implementing. Here I even have a place to drop in much more information on the different projects you might be working on, for example a table where you keep track of the best scores your model achieves compared to other state-of-the-art methods, and more. But that is just an open canvas for you to write down whatever you need.

So if you don't have a Notion account yet, sign up to Notion using my link in the description below. And if you don't want to spend a tedious 30 to 60 minutes building this dashboard yourself, you can also support this channel and buy the template for only a few bucks using another link that you can find in the description. Bye-bye.
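
As a rough illustration of the tag-based filtering described for the paper library and the data sets table, here is a minimal Python sketch. It is not how Notion stores or queries anything. The class names, field names, tags, and placeholder entries (Paper A, Dataset X, and so on) are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Paper:
    """One row of a hypothetical paper library."""
    title: str
    tags: set[str] = field(default_factory=set)
    datasets: list[str] = field(default_factory=list)
    summary: str = ""  # your own notes: pros, cons, limitations


@dataclass
class Dataset:
    """One row of a hypothetical data sets table."""
    name: str
    task_types: set[str] = field(default_factory=set)


def papers_with_tag(papers: list[Paper], tag: str) -> list[Paper]:
    """Return the papers that carry the given tag."""
    return [p for p in papers if tag in p.tags]


def datasets_for_task(datasets: list[Dataset], task: str) -> list[Dataset]:
    """Return the data sets whose type column includes the given task."""
    return [d for d in datasets if task in d.task_types]


# Placeholder entries, invented purely for illustration.
papers = [
    Paper("Paper A", {"video-qa", "multimodal-llm"}, ["Dataset X"]),
    Paper("Paper B", {"moment-retrieval"}, ["Dataset Y"]),
    Paper("Paper C", {"image-captioning"}, ["Dataset Z"]),
]
datasets = [
    Dataset("Dataset X", {"video-qa"}),
    Dataset("Dataset Y", {"moment-retrieval"}),
    Dataset("Dataset Z", {"image-captioning"}),
]

moment_retrieval_papers = [p.title for p in papers_with_tag(papers, "moment-retrieval")]
captioning_datasets = [d.name for d in datasets_for_task(datasets, "image-captioning")]
print(moment_retrieval_papers)
print(captioning_datasets)
```

The point of the sketch is only that a small, consistent set of tags per entry is what makes the later "show me everything for this task" lookup cheap, whichever tool ends up holding the table.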
Info
Channel: Boris Meinardus
Views: 12,675
Id: AyGzfGQqJgY
Length: 14min 24sec (864 seconds)
Published: Fri Jun 14 2024