Random Forest importance scores
7.1 years ago
ab123

Hello there, in using randomForest, I understand that the Mean Decrease Gini output shows me the most important variables. However, my values are tiny, e.g. 0.043 to 0.003.

Does it make sense to set the Gini cutoff at > 0 to identify, for example, the top genes or metabolites?

The other question is: does it even make sense to use RF for a very simple classification (e.g. only gender) with few samples (e.g. n = 16)?

Thanks!

randomforest R

7.1 years ago

In randomForest, the Mean Decrease Gini of a variable is the total decrease in node impurity (measured by the Gini index) from splits on that variable, averaged over all trees. It therefore measures how useful the variable was in making splits. A useful variable gives a large decrease, while a "neutral" variable neither increases nor decreases the Gini impurity. So values close to 0 suggest that the variable is unimportant or redundant for obtaining a good classification.

As to whether it's appropriate to use random forest on your data, you'd have to tell us more about what the data are and what question you're trying to answer. In general, random forest can make sense when there are more variables than samples. For data with few variables, however, it could be better to use something else, e.g. a regression.
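To make the Gini explanation concrete, here is a minimal Python sketch of Gini impurity, the impurity decrease of a single binary split, and a per-variable average over trees. It is not randomForest's own code. Weighting each node's impurity by its sample count, the toy 8-sample labels, and the variable names `metab_A`/`metab_B`/`metab_C` are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(labels, goes_left):
    """Size-weighted decrease in Gini impurity for one binary split.

    labels    -- class labels of the samples reaching the node
    goes_left -- one bool per sample, True if it goes to the left child
    Weighting by node size is an assumption of this sketch.
    """
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    return (len(labels) * gini(labels)
            - len(left) * gini(left)
            - len(right) * gini(right))


def mean_decrease_gini(split_records, variables, n_trees):
    """Sum each variable's split decreases over all trees, divide by n_trees.

    split_records -- hypothetical list of (variable, decrease) pairs,
                     one per split in the forest
    Variables never used in a split keep a score of 0.
    """
    totals = {v: 0.0 for v in variables}
    for var, dec in split_records:
        totals[var] += dec
    return {v: t / n_trees for v, t in totals.items()}


if __name__ == "__main__":
    labels = ["F", "F", "F", "F", "M", "M", "M", "M"]
    informative = [True, True, True, True, False, False, False, False]
    neutral = [True, True, False, False, True, True, False, False]

    print(gini_decrease(labels, informative))  # 4.0: both children are pure
    print(gini_decrease(labels, neutral))      # 0.0: children as mixed as parent

    # Toy "forest" of 2 trees: one split on metab_A, one on metab_B.
    records = [("metab_A", gini_decrease(labels, informative)),
               ("metab_B", gini_decrease(labels, neutral))]
    print(mean_decrease_gini(records, ["metab_A", "metab_B", "metab_C"], 2))
    # {'metab_A': 2.0, 'metab_B': 0.0, 'metab_C': 0.0}
```

With two balanced classes, the perfectly separating split removes all 4.0 units of weighted impurity and the neutral split removes none. Because Gini impurity is concave, a split cannot increase the weighted impurity, so an uninformative variable shows up near 0 rather than below it. Under this weighting, a tree grown until its leaves are pure removes roughly n × Gini(root) in total, shared among every variable it splits on. For n = 16 with balanced classes, that is about 8 units per tree, so with many metabolites small individual scores would be expected. Whether randomForest scales its scores exactly this way is an assumption here.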

Great answer! Thank you! In my case, the data do have few samples but lots of variables (metabolites), and the task is a simple check for class differences (gender). I've compared RF to linear regression and some other supervised methods and found it to be very conservative if I go by the Gini scores (perhaps 200 variables just barely above 0). I'm looking to identify the top differentially expressed metabolites.
