Is reusing a subset of features in another layer of classification wrong or over-fitting?
4.7 years ago
Floydian_slip ▴ 170

Hi, I am using machine learning code written by somebody else. It runs a first round of classification using 10 features, then selects the instances that were not classified with certainty (output value within a certain range) and reruns them with a smaller subset of those 10 features and a classifier not used before (e.g., an SVM, but a different model of it). It does this one more time with yet another subset and different classification models, so some of the features are used in 3 different rounds. What are the pitfalls of this approach, if any? Is this over-fitting in any way (not in the traditional sense, of course)? Would this approach be criticized if we launched with it? A minimal sketch of the cascade as I understand it is below.
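Here is a rough sketch of what the pipeline does; the classifiers, feature indices, and uncertainty band are placeholders I made up for illustration, not the actual values:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data standing in for the real 10-feature dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Round 1: first classifier on all 10 features.
stage1 = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = stage1.predict_proba(X_test)[:, 1]

# Instances the first model was unsure about (the 0.4-0.6 band is a
# made-up stand-in for "output value within a certain range").
uncertain = (proba > 0.4) & (proba < 0.6)

# Round 2: a subset of the original features, a classifier not used before.
subset = [0, 2, 5, 7]  # hypothetical feature indices
if uncertain.any():
    stage2 = SVC(kernel="rbf", probability=True).fit(X_train[:, subset], y_train)
    proba[uncertain] = stage2.predict_proba(X_test[uncertain][:, subset])[:, 1]

# Round 3 would repeat the same pattern with another subset and model.
final_pred = (proba >= 0.5).astype(int)
```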

Thanks!

machine learning overfitting overtraining
4.7 years ago
Mensur Dlakic ★ 28k

There is nothing that sounds like overfitting here, as long as you use the same folds for each classifier. But it does sound unnecessary to do it this way. This is essentially what boosting does: in each iteration it weights more heavily what was misclassified in the previous one. I presume the reason they are using different features is to create non-overlapping expertise between individual classifiers, which is a good idea in general. Still, gradient-boosted trees do all of that automatically, including feature selection, so you could save yourself some time, and probably end up with a better classifier, by simply going with one of the classifiers that uses (extreme) gradient-boosted trees. As much as I like SVMs, for historic and practical reasons, I haven't had a single project in the past 10 years (hundreds of them, at least) where SVMs outperformed boosted trees, either in training speed or in classification/regression performance. Unless you have a small dataset where proper classifier calibration is essential, I can't imagine that your case would be any different.
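For illustration, a minimal sketch of the single-model alternative using scikit-learn's histogram-based gradient boosting on toy data (the dataset and parameters are arbitrary placeholders, not tuned values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the 10-feature dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# A single gradient-boosted-tree model; boosting re-weights hard examples
# internally, and the trees pick out informative features on their own.
model = HistGradientBoostingClassifier(max_iter=200, learning_rate=0.1,
                                       random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```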


Thanks, Mensur. I do realize that this is not over-fitting in the strict sense of the word. My main concern is whether this approach could be criticized or found inappropriate in some way after we go to market with it. Also, the SVM was just an example of a classifier used; others such as NN and linear regression are used as well. I ran gradient boosting on the initial dataset but the performance did not reach as high as this 3-step approach does. What could explain that? The difference in performance made me worry that this approach is doing something that could later be torn apart. Any insight will be helpful. Thanks again!


I ran gradient boosting on the initial dataset but the performance did not reach as high as this 3-step approach does. What could explain that?

If you are making an ensemble of three different classifiers and comparing that to gradient boosting alone, it is possible that the former would do better. That is the whole point of ensembling: getting individually inferior classifiers to produce a superior ensemble. That would be my interpretation without knowing the details of your procedure:

1) how many folds you use;
2) whether you carry the same folds through all stages;
3) which classifiers are used;
4) how the individual classifiers differ from the ensemble;
5) whether you use all data points but give higher weights to the ones that were misclassified.

As long as you do it properly, this procedure should not overfit. A sketch of a like-for-like comparison on fixed folds is below.
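For illustration, a rough sketch of scoring a single boosted-tree model and a soft-voting ensemble on identical folds (the data and the component classifiers are arbitrary stand-ins; the point is fixing the folds once so the comparison is fair):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Fix the folds once so every model is scored on identical splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

single = HistGradientBoostingClassifier(random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("gbt", HistGradientBoostingClassifier(random_state=0)),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=0))),
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # average the predicted probabilities
)

for name, model in [("boosted trees", single), ("ensemble", ensemble)]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```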
