What is AdaBoost (BOOSTING TECHNIQUES)

Captions
Hello everyone, my name is Krish and welcome to my YouTube channel. Today we are going to see what the boosting technique is, and we will look at the first boosting algorithm, which is called the AdaBoost algorithm. In the previous video I already showed you how bagging techniques work, with the random forest classifier and regressor as the example. So let's go ahead and try to understand boosting.

First, how do boosting techniques work in general? Consider a data set. We create base learners sequentially: initially, some of the records get passed to the first base learner, which can be any model. Once it is trained, we pass all the records through it and check how the model has performed. Suppose three of the records were incorrectly classified. Those wrongly classified records are what gets emphasized for the next model created sequentially, base learner 2, so base learner 2 mostly gets trained on the records that base learner 1 got wrong. If base learner 2 in turn gives some wrong records, those errors are passed on to base learner 3, and this goes on until we reach whatever number of base learners we specified. That is how a boosting technique works. AdaBoost, however, is a little different: something called weights gets assigned to the records, and that is what we are going to discuss now.

Let us take an example. Suppose my data set has features f1, f2, and f3 and one output column, and suppose I have 7 records. In the first step, every record gets a sample weight, so I create another column called "sample weight" (why we need it will become clear in a moment). The initial sample weight is assigned with the formula w = 1/n, where n is the number of records, so every record starts with a weight of 1/7. Initially, all the records are assigned the same weight.

That was step one. In step two we create our first base learner, which has to be created sequentially. We create it with the help of decision trees: in AdaBoost, all the base learners are decision trees. But these are not decision trees the way we create them in a random forest; here each decision tree is created with a depth of only one, a single split with two (or more) leaf nodes. These depth-one decision trees are called stumps.
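As a minimal sketch of these first two steps in Python: the feature values below are invented purely for illustration, and scikit-learn's depth-one tree internally tries each feature's split and keeps the best one, which plays the role of picking the lowest-entropy stump described next.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the 7-record data set (values are made up for illustration).
X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1],
              [1, 0, 0], [0, 1, 0], [1, 1, 1]])
y = np.array([1, 0, 1, 1, 0, 0, 1])        # binary output: yes = 1, no = 0

# Step 1: every record starts with the same sample weight w = 1/n.
n = len(y)
sample_weight = np.full(n, 1.0 / n)        # 1/7 each

# Step 2: the base learner is a depth-one tree, i.e. a stump.
stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")
stump.fit(X, y, sample_weight=sample_weight)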
Now, for each feature we create one stump: we consider f1 and create one stump, then f2, then f3. From these stumps I have to select my first decision tree base learner. How do I select it? We have two measures, entropy and the Gini coefficient, and we can use either of them. I compare the entropy of stump 1, stump 2, and stump 3, and whichever has the lowest entropy is the decision tree I select as the first sequential base learner.

Now suppose the selected stump has classified 4 records correctly and 1 record incorrectly. The output here is just yes or no, so I am considering binary classification. For that incorrect classification we have to find the total error. We calculate the total error by summing the sample weights of the misclassified records; in this case there is just one error, so the total error is 1/7. That is the second step.

In step 3 we find the performance of the stump, that is, a measure of how well the stump has classified. The formula is performance = 1/2 · ln((1 − total error) / total error). Plugging in our total error gives 1/2 · ln((1 − 1/7) / (1/7)) = 1/2 · ln(6), which comes out to approximately 0.896. That is how you calculate the performance of the stump.

You may be wondering why I calculated the total error and the performance of the stump. It is because we need to update the weights: as I told you, in boosting the wrongly classified records from stump 1 are what the next stump must focus on, so I have to increase the weights of the wrongly classified records and decrease the weights of the correctly classified ones.
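A quick numeric check of steps 2 and 3, using only the values given above:

import math

total_error = 1 / 7                    # sum of sample weights of the 1 wrong record

# Step 3: performance of the stump = 1/2 * ln((1 - TE) / TE).
performance = 0.5 * math.log((1 - total_error) / total_error)
print(round(performance, 3))           # 0.5 * ln(6) = 0.896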
Now, in the fourth step, we update the weights based on that performance value of 0.896. There are two simple formulas. To update the incorrectly classified records we use: new sample weight = old weight × e^performance. Our previous sample weight was 1/7, so we get 1/7 × e^0.896 ≈ 0.349, which you can verify with a calculator. Observe that the weight has gone up: it was 1/7 (about 0.14) and is now 0.349, so the misclassified record has been given more importance. For the correctly classified records we just flip the sign in the exponent: new sample weight = old weight × e^(−performance), and 1/7 × e^(−0.896) ≈ 0.05. So the updated weights for the seven records are 0.05, 0.349, 0.05, 0.05, 0.05, 0.05, 0.05.

But observe one more thing: when I sum these updated weights, the total is no longer 1, whereas the original sample weights summed to 1. So we divide each updated weight by the sum of all of them, which here comes out to approximately 0.68, and that gives us the normalized weights. Dividing 0.05 by 0.68 gives roughly 0.07, and dividing 0.349 by 0.68 gives roughly 0.51, so the normalized weights are approximately 0.07, 0.51, 0.07, 0.07, 0.07, 0.07, 0.07, and they sum to 1.

What is the next step? I drop the sample weight and updated weight columns and keep only the normalized weights. Using these normalized weights we create a new data set, and because of the updated values, that data set will most probably contain the wrongly classified records, which is exactly what we want when training the second stump. Here is how the new data set is created. I take the same columns, f1, f2, f3, and the output, and based on the normalized weights I divide the interval from 0 to 1 into buckets: the first bucket is 0.00 to 0.07, the next runs from 0.07 to 0.58 (0.07 plus the 0.51 weight of the wrong record), then 0.58 to 0.65, then 0.65 to 0.72, and so on, each bucket as wide as the corresponding record's normalized weight.
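A sketch of the weight update, normalization, and bucket construction; note that the hand-rounded figures above (0.68, 0.07, 0.51) differ slightly from the full-precision values this code produces:

import math

performance = 0.896
weights = [1 / 7] * 7
wrong = [False, True, False, False, False, False, False]   # record 2 was misclassified

# Step 4: raise the weight of wrong records, lower the weight of correct ones.
updated = [w * math.exp(performance if bad else -performance)
           for w, bad in zip(weights, wrong)]

# The updated weights no longer sum to 1, so normalize them.
total = sum(updated)                   # roughly 0.70 at full precision
normalized = [w / total for w in updated]

# Buckets for building the new data set: cumulative ranges of the normalized weights.
buckets, start = [], 0.0
for w in normalized:
    buckets.append((start, start + w))
    start += w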
After that, the algorithm runs as many iterations as there are records, seven here, and in each iteration it draws a random value to select a record from the whole data set. Suppose in the first iteration it draws 0.43. We check which bucket 0.43 falls into: it lands in the bucket of the wrong record, the one that decision tree 1 classified incorrectly, so I select that record and put it into the new data set. In the second iteration, suppose 0.31 gets drawn; that again falls into the bucket of the incorrectly classified record, so that record gets taken again. This keeps going, and because the wrong record owns by far the widest bucket, it will most likely be selected several times among the picks.

So that is my new data set. Based on it, I create my new decision tree stump: again one stump for f1, one for f2, one for f3, then select whichever stump has the lowest entropy, and the same process continues. Suppose that second stump again incorrectly classifies some record; then all the steps restart: find the total error, find the performance of the stump (the performance of the second stump this time), update the normalized weights accordingly, and normalize them again. That process continues until we have passed through all the sequential decision trees, and by the later stages the error will be much smaller than it was with the initial weights.

Now suppose for our data set we have constructed decision tree 1, decision tree 2, and decision tree 3, my stumps, in sequential order. How does classification happen for test data? A test record gets passed through every stump. Suppose, for binary classification, stump 1 gives 1, stump 2 gives 0, and stump 3 gives 1. You know how the majority vote works in a random forest; similarly, in AdaBoost a vote happens between the stumps, so here the predicted class is 1.

In short, you can see that we are combining weak learners, and when multiple weak learners are combined, they become one strong learner.
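For trying this out, the whole pipeline maps onto scikit-learn's AdaBoostClassifier; note that scikit-learn reweights records at each round rather than explicitly resampling them, and the final vote is weighted by each stump's performance rather than being a plain majority:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1],
              [1, 0, 0], [0, 1, 0], [1, 1, 1]])   # same toy records as above
y = np.array([1, 0, 1, 1, 0, 0, 1])

# n_estimators is the number of sequential stumps; the default base learner
# is already a depth-one decision tree.
model = AdaBoostClassifier(n_estimators=3, random_state=0)
model.fit(X, y)
print(model.predict(X))   # each prediction is the stumps' performance-weighted vote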
That is all for this video. I hope you understood what the AdaBoost technique is; go through all the steps, because the weight updation is the major step that happens here. In my next video I will explain the gradient boosting technique. I hope you liked this video; please do subscribe to the channel if you have not already subscribed. I'll see you all in the next video. Have a great day, thank you one and all.
Info
Channel: Krish Naik
Views: 126,790
Rating: 4.8596239 out of 5
Keywords: Boosting, AdaBoost
Id: NLRO1-jp5F8
Length: 14min 6sec (846 seconds)
Published: Sat Aug 31 2019