Question

Develop model based on differentially expressed genes

0

18 months ago

Rob ▴ 170

Hi all,

When building a classifier from gene expression data with machine learning algorithms, is it better to narrow the gene list so that the model shows a clearer pattern when applied to the validation data set?

If so, why? If some of the excluded genes are important for the phenotype of interest, which should take priority: a cleaner pattern in the model after removing those genes, or keeping them in the model?

Thanks
Rob

Machine-Learning RNA-Seq gene-classifier • 828 views

ADD COMMENT • link 17 months ago by Rob ▴ 170

score 3 · Accepted Answer · 2023-07-11

3

18 months ago

Shred ★ 1.6k

Remove zero- or very-low-variance genes from the initial set to reduce input size and "noise". Later, the feature selection strategy you use will assign low or zero weights to genes that are not informative for the classification. Machine learning models learn only from what they see in the training set: a model is very unlikely to pick up a pattern in the validation/test set that it never saw during training.

You can read more here: https://scikit-learn.org/stable/modules/feature_selection.html
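
As a rough illustration of the two steps described in this answer, here is a minimal sketch using scikit-learn. Everything in it is an assumption made for the example and not taken from the thread: the expression matrix is synthetic random data, the variance threshold of 0 only drops constant genes, and an L1-penalized linear SVM stands in for "the feature selection strategy", since it is one of several methods that can push the weights of uninformative genes to exactly zero. The value `C=0.05` is arbitrary and would normally be tuned by cross-validation on the training set.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (samples x genes).
n_samples, n_genes = 120, 200
X = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples)
X[:, :5] += 2.0 * y[:, None]  # genes 0-4 differ between the two classes
X[:, 190:] = 1.0              # genes 190-199 are constant (zero variance)

# Hold out a test set; every fitting step below sees the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = make_pipeline(
    VarianceThreshold(threshold=0.0),            # drop zero-variance genes
    StandardScaler(),                            # put genes on a common scale
    LinearSVC(penalty="l1", dual=False, C=0.05), # L1 penalty zeroes out weak genes
)
model.fit(X_train, y_train)

kept = model.named_steps["variancethreshold"].get_support()
coefs = model.named_steps["linearsvc"].coef_.ravel()
n_kept = int(kept.sum())
n_nonzero = int(np.count_nonzero(coefs))
test_accuracy = model.score(X_test, y_test)

print(f"genes kept after variance filter: {n_kept} of {n_genes}")
print(f"genes with non-zero weight: {n_nonzero}")
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Wrapping the filter, the scaler and the classifier in one pipeline means all of them are fitted on the training samples only, so the held-out set gives a fair estimate of how the selected genes generalize.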