Question

Ranking genes by importance rank from a random forest classification model

3.5 years ago

pbigbig ▴ 250

Hi everyone,

I have gene expression data from two cohorts, Case and Control. There are about four times as many Control samples as Case samples. I would like to run a random forest to select genes (features) that strongly discriminate Case from Control.

Because Control samples are so abundant, my plan is to randomly subsample the Control cohort n times (keeping the Case cohort fixed), train a model on each subsample, and obtain n lists of feature importance. The summed rank of each feature across those lists would then serve as the final result.

Is this approach feasible, and is there any previously published study that did the same? I am very new to machine learning, so detailed explanations or suggestions are very welcome.

Thank you very much.

selection random analysis feature forest transcriptome • 1.1k views

score 1 · Answer 1 · 2021-11-29

3.5 years ago

official.profile ▴ 20

Approaches similar to the one you describe have been used before. For example, in the article:

Feng, Z., Qu, J., Liu, X. et al. Integrated bioinformatics analysis of differentially expressed genes and immune cell infiltration characteristics in Esophageal Squamous cell carcinoma. Sci Rep 11, 16696 (2021)

the authors employed the so-called robust rank aggregation algorithm. If you use R, there is a ready-to-use implementation of the algorithm in the RobustRankAggreg package.
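
To make the resampling plan from the question concrete, here is a minimal Python sketch of the bookkeeping only: drawing balanced Control subsets and summing per-run ranks. It uses just the standard library. The function names (`undersample_controls`, `rank_genes`, `sum_rank`) and the toy importance values are made up for illustration, and the sketch assumes every gene appears in every run. In practice the per-run importances would come from your model, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import defaultdict


def undersample_controls(control_ids, n_cases, n_runs, seed=0):
    """Draw n_runs random Control subsets, each the same size as the Case cohort."""
    rng = random.Random(seed)
    return [rng.sample(control_ids, n_cases) for _ in range(n_runs)]


def rank_genes(importances):
    """Map gene -> rank (1 = most important); ties are broken by gene name."""
    ordered = sorted(importances, key=lambda g: (-importances[g], g))
    return {gene: pos for pos, gene in enumerate(ordered, start=1)}


def sum_rank(importance_runs):
    """Sum each gene's rank over all runs; a smaller total means more consistently important."""
    totals = defaultdict(int)
    for run in importance_runs:
        for gene, rank in rank_genes(run).items():
            totals[gene] += rank
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))


# Toy importances standing in for three model fits (values are invented).
runs = [
    {"geneA": 0.50, "geneB": 0.30, "geneC": 0.20},
    {"geneA": 0.40, "geneB": 0.45, "geneC": 0.15},
    {"geneA": 0.60, "geneB": 0.10, "geneC": 0.30},
]
ranking = sum_rank(runs)
print(ranking)
```

Note that a plain rank sum is not the same thing as robust rank aggregation; it is only the simple aggregate described in the question, and the two can order genes differently when a gene ranks well in only some of the runs.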