Develop model based on differentially expressed genes
16 months ago
Rob ▴ 170

Hi all,

When we develop a model from gene classifiers (using machine learning algorithms), is it better to narrow the gene list so that a clearer pattern can be applied to the validation data set?

If so, why? And if some of the excluded genes are important for the phenotype of interest, which should take priority: a cleaner pattern in the model after removing those genes, or keeping them in the model?

Thanks
Rob

Machine-Learning RNA-Seq gene-classifier • 790 views
16 months ago
Shred ★ 1.5k

Remove zero- or very-low-variance genes from the initial set to reduce the input size and "noise". After that, the feature selection strategy you use will assign low or zero weights to genes that do not help the classification. A machine learning model learns only from what it sees in the training set: it is very unlikely to pick up a pattern in the validation/test set that it never saw during training.

You can read more here: https://scikit-learn.org/stable/modules/feature_selection.html
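
As a rough illustration of the two steps above (drop zero-variance genes, then let a sparse model down-weight uninformative ones), here is a minimal scikit-learn sketch. Everything in it is an assumption made for the example: the expression matrix is synthetic random data, the phenotype is invented so that only genes 0 and 1 matter, and the L1-penalized `LinearSVC` with `C=0.05` is just one possible choice of sparse classifier, not a recommendation for real RNA-Seq data.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 200 samples x 50 genes.
# (Hypothetical data; real counts would need normalization first.)
n_samples, n_genes = 200, 50
X = rng.normal(size=(n_samples, n_genes))
X[:, 40:] = 0.0  # the last 10 "genes" are constant, i.e. zero variance

# Invented phenotype: depends only on genes 0 and 1.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a validation set; all filtering and fitting uses training data only.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = make_pipeline(
    VarianceThreshold(threshold=0.0),  # step 1: drop zero-variance genes
    StandardScaler(),                  # put the remaining genes on one scale
    LinearSVC(penalty="l1", dual=False, C=0.05, max_iter=10000),  # step 2: sparse weights
)
model.fit(X_train, y_train)

kept = model.named_steps["variancethreshold"].get_support()
coefs = model.named_steps["linearsvc"].coef_.ravel()

n_kept = int(kept.sum())               # genes surviving the variance filter
n_nonzero = int((coefs != 0).sum())    # genes the classifier actually uses
val_accuracy = model.score(X_val, y_val)

print("genes after variance filter:", n_kept)
print("genes with non-zero weight:", n_nonzero)
print("validation accuracy:", val_accuracy)
```

Because the filter and the classifier sit in one pipeline, both are fitted on the training split only, so the validation score is not influenced by genes selected using validation samples.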


Thank you Shred
