Boosting

Captions
Boosting is a fairly simple variation on bagging that strives to improve the learners by focusing on the areas where the system is not performing well. One of the best-known algorithms in this area is called AdaBoost. It's "ada," not "ata," because "ada" stands for adaptive.

Here's how AdaBoost works. We build our first bag of data in the usual way: we select randomly from our training data. We then train a model in the usual way. The next thing we do is different: we take all our training data and use it to test the model, in order to discover which of the points, our x's and our y's, are not well predicted. There will be some points for which there is significant error.

Now, when we go to build our next bag of data, we again choose randomly from our original data, but each instance is weighted according to this error. So the points that had significant error are more likely to be picked and to go into this bag than any other individual instance. As you can see, we ended up with a few of those points in the bag, plus a smattering of all the other ones as well.

We build a model from this data and then we test it. Now we test our system all together. In other words, we've got a sort of miniature ensemble here, just two learners, and we test both of them. We test them by again inputting the in-sample data: we test on each instance, combine the learners' outputs, and again measure the error across all this data. Maybe this time those points got modeled better, but there are some other points that weren't modeled as well. So we build our next bag and our next model, and we continue this over and over again until we reach m, the total number of bags we'll be using.

To recap: in bagging, building each bag simply means choosing a subset of the data at random with replacement, and we create every bag in the same way.
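The boosting loop walked through above can be sketched in Python. This is a minimal sketch, not a reference implementation, and it rests on assumptions the lecture does not specify: the base learner is a hypothetical 1-D k-NN regressor (`knn_predict`), the ensemble combines its learners by plain averaging (`ensemble_predict`), and each instance's sampling weight for the next bag is its absolute error under the current ensemble. It follows the lecture's simplified resampling description; standard AdaBoost additionally gives each learner its own weight in the final vote and updates instance weights multiplicatively, which this sketch omits.

```python
import random
from statistics import mean


def knn_predict(bag, x, k=3):
    """Hypothetical base learner: 1-D k-NN regression that predicts the mean y
    of the k points in `bag` (a list of (x, y) pairs) closest to x."""
    nearest = sorted(bag, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def ensemble_predict(bags, x, k=3):
    # Query every learner built so far and average their outputs (assumed combiner).
    return mean(knn_predict(bag, x, k) for bag in bags)


def boost(data, m, k=3, seed=0):
    """Build m bags from `data`, a list of (x, y) pairs, using error-weighted
    resampling as described in the lecture (hypothetical helper)."""
    rng = random.Random(seed)
    weights = [1.0] * len(data)  # first bag: every instance equally likely
    bags = []
    for _ in range(m):
        # Draw a bag the same size as the data, with replacement;
        # instances with larger weights are more likely to be picked.
        bags.append(rng.choices(data, weights=weights, k=len(data)))
        # k-NN is a lazy learner, so "training" is just keeping the bag.
        # Test the whole ensemble built so far on all in-sample data...
        errors = [abs(ensemble_predict(bags, x, k) - y) for x, y in data]
        # ...and weight each instance by its error for the next bag.
        # The tiny floor keeps every instance selectable even at zero error.
        weights = [e + 1e-9 for e in errors]
    return bags


if __name__ == "__main__":
    data = [(float(x), float(x * x)) for x in range(10)]  # toy data: y = x^2
    bags = boost(data, m=5)
    print(ensemble_predict(bags, 4.5))
```

Because k-NN simply stores its training data, each bag doubles as a trained learner here; a different base learner would need an explicit training step at the point the comment marks.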
Boosting adds one idea to bagging: in subsequent bags, we preferentially choose the data instances that the overall system has modeled poorly so far.
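In code, the difference this recap describes comes down to whether the sampling call gets weights. A minimal sketch, assuming Python's `random.choices` as the sampler and a hypothetical list of per-instance errors from the current ensemble; `bagging_bag`, `boosting_bag` and `pick_probabilities` are illustrative names, not from the lecture.

```python
import random


def bagging_bag(data, n, rng):
    """Bagging: each bag is a uniform random sample of size n, drawn with replacement."""
    return rng.choices(data, k=n)


def boosting_bag(data, errors, n, rng):
    """Boosting (as described above): the same draw, but each instance is picked
    with probability proportional to its current ensemble error.
    The errors must not all be zero, or random.choices raises ValueError."""
    return rng.choices(data, weights=errors, k=n)


def pick_probabilities(errors):
    # Per-draw selection probability of each instance under error weighting.
    total = sum(errors)
    return [e / total for e in errors]


if __name__ == "__main__":
    rng = random.Random(42)
    data = ["a", "b", "c", "d"]
    errors = [0.1, 0.1, 0.1, 2.0]  # hypothetical: "d" was modeled poorly
    print(pick_probabilities(errors))
    print(bagging_bag(data, 8, rng))
    print(boosting_bag(data, errors, 8, rng))
```

With these hypothetical errors, "d" is picked on about 87% of draws (2.0 / 2.3), versus 25% when every instance is equally likely.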
Info
Channel: Udacity
Views: 227,309
Id: GM3CDQfQ4sw
Length: 2min 24sec (144 seconds)
Published: Mon Jun 06 2016