Custom Ensemble Approach To Solve Machine Learning Problems

Video Statistics and Information

Reddit Comments
  • Original Title: Custom Ensemble Approach To Solve Machine Learning Problems
  • Author: Krish Naik
  • Description: Please join as a member in my channel to get additional benefits like materials in Data Science, live streaming for Members and many more ...
  • Youtube URL: https://www.youtube.com/watch?v=ZuT8QHQAKO8
Captions
Hello, my name is Krish Naik and welcome to my YouTube channel. In this video we are going to discuss a custom ensemble approach to solving machine learning use cases. Trust me guys, nowadays this kind of approach is commonly used by many companies and many data scientists, and I will also tell you why it works so well. We will try to divide this into three sections.

First of all, let us discuss ensembles. I hope you have already worked with algorithms of the ensemble kind. The two best examples of the ensemble approach are bagging and boosting. For bagging, the example I would like to take is random forest; for boosting, I would point to XGBoost, and also AdaBoost, gradient boosting and many more. We call these ensemble algorithms because in both of them decision trees are used, and not just one decision tree but multiple decision trees. The difference is that in bagging the decision trees are executed in parallel, while in boosting the decision trees are executed sequentially. So we call this an ensemble approach, and an ensemble basically says that we can combine any number of models to get the final output, where the final output is based on something like a voting classifier.

Now, what does this custom ensemble approach mean, and why am I creating this video? Trust me guys, in my previous two projects I have implemented this custom ensemble approach and got some very good accuracy. You can also use this custom ensemble approach in Kaggle competitions, on HackerRank and in other kinds of competitions.
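As a quick, hedged illustration of the two ensemble families mentioned above, here is a minimal scikit-learn sketch on a synthetic dataset (the dataset and model settings are assumptions for illustration, not taken from the video):

    # Minimal sketch: bagging (random forest) vs. boosting (gradient boosting)
    # on a synthetic regression problem. All settings are illustrative only.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Bagging: many decision trees trained in parallel on bootstrap samples.
    bagging_model = RandomForestRegressor(n_estimators=200, random_state=42)
    bagging_model.fit(X_train, y_train)

    # Boosting: decision trees trained sequentially, each correcting the previous ones.
    boosting_model = GradientBoostingRegressor(n_estimators=200, random_state=42)
    boosting_model.fit(X_train, y_train)

    print("bagging  R^2:", round(bagging_model.score(X_test, y_test), 3))
    print("boosting R^2:", round(boosting_model.score(X_test, y_test), 3))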
So let us try to understand this with a very good example. Suppose I want to predict tea crop production: how much tea a particular region is going to produce. I have already covered this in one of my use cases, guys, so just try to understand the idea here; I will give some more examples later.

Now, usually whenever this kind of data is initially collected, it is collected in some Hadoop architecture, and big data has properties that are called the four V's. You can search on Google for the four V's of big data; I am going to mention just one of them, and you can try to find the remaining three and type them in the comment box, it is pretty simple. The V I am talking about is variety, and for most problem statements you will have a variety of data.

In this tea crop prediction problem, understand that in India there are various regions where tea crops are grown: many parts of the Northeast, and various parts of South India as well. We need to predict the tea crop production with the help of features like the maximum temperature, maximum rainfall, minimum temperature, minimum rainfall, average rainfall, the soil condition, the humidity, what type of soil is used, and all that kind of information. For South India all this information will also be there, and always remember that these properties can vary a lot based on the location: in the Northeast you sometimes get very good rainfall, in the western parts we get a huge amount of rainfall, and the soil may be different again. Based on that, we need to find the tea crop production of a specific region.

Understand one thing here: there is a huge variety in this data; it is not a single, similar type of data. In most use cases, whether you are solving some kind of competition or a use case in your company, there will be a huge variety of data. And since I told you this data is completely dependent on the location, the first step of the custom ensemble approach is that we use the combination of two approaches: one is clustering, and the other is supervised machine learning. Clustering obviously means unsupervised, so we take the combination of clustering, which is an unsupervised machine learning algorithm, and a supervised machine learning algorithm.

The clustering algorithm can be anything, guys: k-means clustering, hierarchical clustering, DBSCAN clustering, you can use any of them. Suppose I take this data and give it to my clustering algorithm; you know that from any clustering algorithm I will get multiple groups. Suppose I get three groups: group one, group two and group three. These groups have come from my clustering algorithm, and this is a completely unsupervised machine learning step.

Now, when we get these three groups, what does that mean? It indicates that you have three kinds of data: based on the various locations in India where tea is produced, there are three different varieties of feature values. Some districts in the Northeast may fall into one group, some districts in the South may fall into another, and similar kinds of districts will fall into the same group. That is what the clustering tells us. Also understand that this data will be very, very large, and you cannot depend on one single model to do the prediction on such a huge amount of data, so we should always try to follow this pattern. I will tell you some disadvantages at the end as well, but the advantages outweigh the disadvantages, which is why so many companies, people and developers actually use this.
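To make the clustering step concrete, here is a minimal sketch, assuming k-means with three clusters and random placeholder data standing in for the weather/soil features (both choices are assumptions; the video only says any clustering algorithm can be used):

    # Step 1 of the custom ensemble: cluster the data into groups.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1500, 6))                  # placeholder feature matrix

    X_scaled = StandardScaler().fit_transform(X)    # scale features before k-means

    clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
    group_labels = clusterer.fit_predict(X_scaled)  # one group id (0, 1 or 2) per row

    for g in np.unique(group_labels):
        print(f"group {g}: {(group_labels == g).sum()} rows")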
Once we are able to get these groups, and we know each group contains its own specific portion of the data, then based on group one we create model one, based on the second group we create model two, and based on the third group we create model three. Each of these is a supervised machine learning model: it may be random forest, it may be XGBoost, it may be AdaBoost, any kind of algorithm, with hyperparameters tuned on the data of that specific group.

So this is how the approach works: first we take the whole data, apply clustering, find out how many feasible groups we get, and based on the number of groups we create that many supervised machine learning models. Pretty simple. And trust me guys, my past two projects have used this approach and it is working very well, because understand that in a real machine learning use case you have a huge amount of data, not just a small amount, and you cannot depend on just one ML model; there are a lot of factors that affect a machine learning model. So for any use case, if you are able to follow this approach, it will definitely give you a good result. You will really understand this when you are doing it yourself. Recently, in one of the projects that I did for the members who have joined my channel, I followed exactly this architecture to solve a problem, and that project is the phishing classifier.

Now, what happens for new data? First it passes through the clustering algorithm, which tells us whether it belongs to group one, group two or group three. Suppose it belongs to group three; then we hit the m3 model, and from that we get the response, the output. So this is how a custom ensemble approach works: pretty simple, guys, we have just combined an unsupervised machine learning technique and a supervised machine learning technique, applied across multiple groups.
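Putting the pieces together, here is a self-contained sketch of the whole flow: train one model per group, then route new data through the clusterer first. The k-means and random forest choices, the placeholder data and the helper name predict_custom_ensemble are all assumptions for illustration, not the exact setup from the video.

    # Full custom-ensemble sketch: cluster, train one supervised model per
    # group, then route new rows to the model of the group they fall into.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1500, 6))                      # placeholder features
    y = X @ rng.normal(size=6) + rng.normal(size=1500)  # placeholder target (e.g. yield)

    scaler = StandardScaler().fit(X)
    X_scaled = scaler.transform(X)

    # Step 1: unsupervised grouping of the training data.
    clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
    group_labels = clusterer.labels_

    # Step 2: one supervised model per group, trained only on that group's rows.
    group_models = {}
    for g in np.unique(group_labels):
        mask = group_labels == g
        group_models[g] = RandomForestRegressor(
            n_estimators=200, random_state=0
        ).fit(X_scaled[mask], y[mask])

    # Step 3: new data is assigned to a group first, then sent to that group's model.
    def predict_custom_ensemble(X_new):
        X_new_scaled = scaler.transform(X_new)
        groups = clusterer.predict(X_new_scaled)
        preds = np.empty(len(X_new_scaled))
        for g in np.unique(groups):
            mask = groups == g
            preds[mask] = group_models[g].predict(X_new_scaled[mask])
        return preds

    print(predict_custom_ensemble(X[:5]))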
Now let us discuss the disadvantages. Understand, guys, that even with one machine learning model we have to put a lot of effort into making it scalable, but in this approach we have three or four machine learning models. I could also solve the problem with a single machine learning model, as many people do, but by following this approach I have to manage scalability for several different models. Each model, when deployed in production, is basically exposed as a web API to some front-end consumer, which may be a web app, a mobile app and many other things, so there is a lot of difficulty in managing so many models. We also have to create a separate pipeline for each model so that we can retrain our models continuously: once a model is deployed, it is always monitored to see how well it is performing, the accuracy is checked monthly or bi-monthly, and after some time we retrain the model. And there will not be just one pipeline, because here, if I consider three groups, we have three different kinds of data, so we need that many pipelines. So the major disadvantage of the custom ensemble approach is that we need to manage a lot of things.

So this was pretty much the explanation of the custom ensemble approach; do try to solve a problem using it. And not only the tea crop example, guys, let me mention one more example, the phishing classifier; there are a lot of use cases you can apply this to. The reason I recommend this approach is that because you are splitting the data and creating each model on its specific group of data, each model will perform well, since it is focused on that particular kind of group. So that is what this video is all about. I hope you liked it; please do subscribe to the channel if you have not already subscribed. I will see you all in the next video. Have a great day. Thank you, one and all. Bye bye.
Info
Channel: Krish Naik
Views: 2,338
Keywords: data science, machine learning, deep learning, ensemble techniques, simplilearn, great learning
Id: ZuT8QHQAKO8
Length: 11min 38sec (698 seconds)
Published: Mon Mar 02 2020