Boosting is a fairly simple variation
on bagging that strives to improve the learners by focusing on areas where
the system is not performing well. One of the best-known algorithms
in this area is called AdaBoost. And I believe it's pronounced "ada,"
not "ata," because "Ada" stands for adaptive. Here's how AdaBoost works. We build our first bag of
data in the usual way. We select randomly
from our training data. We then train a model in the usual way. The next thing we do, and
this is something different, is we take all our training data and
use it to test the model, in order to discover which
of the points in here, our x's and our y's,
are not well predicted. So there are going to be
some points in here for which there is significant error. Now, when we go to build our
next bag of data, again, we choose randomly
from our original data. But each instance is weighted
according to this error. So these points that had significant
error are more likely to get picked and to go into this bag than any
other individual instance. So as you see, we ended up with
a few of those points in here and a smattering of all
the other ones as well. We build a model from this data and
then we test it. Now we test our system all together. In other words, we've got a sort
of miniature ensemble here, just two learners. And we test both of them. We test them by feeding in
this in-sample data again. We test on each instance and
we combine their outputs. And again we measure error
across all this data. Maybe this time these points
got modeled better, but there were some other ones up
here that weren't as good. And thus we build our next bag and
our next model. And we just continue this over
and over again until we reach m,
the total number of bags we'll be using. So to recap, bagging,
when we build one of these bags, is simply choosing some subset of
the data at random with replacement, and we create each bag in the same way. Boosting is an add-on to this idea where
in subsequent bags we are more likely to choose those data instances that had been modeled
poorly in the overall system before.
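
To make that loop concrete, here is a minimal sketch of the procedure as described here, not a full AdaBoost implementation. Several things in it are assumptions for illustration: the function names fit_linear, predict_linear and boosted_bags are made up, the base learner is a plain least-squares line, the ensemble combines outputs by averaging them, and the chance of picking a point is set proportional to its absolute error. The full AdaBoost algorithm also weights each learner's vote, which this sketch leaves out.

```python
import numpy as np


def fit_linear(x, y):
    """Fit y ~ a*x + b by least squares (a stand-in for any base learner)."""
    design = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict_linear(coeffs, x):
    """Evaluate the fitted line at the points x."""
    return coeffs[0] * x + coeffs[1]


def boosted_bags(x, y, m, bag_size, seed=0):
    """Build m bags, re-weighting the sampling by ensemble error each round."""
    rng = np.random.default_rng(seed)
    n = len(x)
    probs = np.full(n, 1.0 / n)  # first bag: every instance equally likely
    models = []
    for _ in range(m):
        # Draw a bag with replacement, favouring poorly modeled instances.
        idx = rng.choice(n, size=bag_size, replace=True, p=probs)
        models.append(fit_linear(x[idx], y[idx]))

        # Test the whole ensemble so far on ALL the in-sample data,
        # combining the learners' outputs by averaging (an assumption).
        preds = np.mean([predict_linear(c, x) for c in models], axis=0)
        errors = np.abs(y - preds)

        # Turn per-instance error into sampling probabilities for the next
        # bag. The small constant keeps the weights valid if error is zero.
        weights = errors + 1e-12
        probs = weights / weights.sum()
    return models, probs


# Hypothetical toy data: a curve that a single straight line fits poorly.
x = np.linspace(0.0, 1.0, 40)
y = x ** 2
models, probs = boosted_bags(x, y, m=5, bag_size=40)
```

After the last round, probs holds the chance of each training instance being picked for a further bag, so the instances the ensemble still gets most wrong have the largest values.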