Question

Random forest with DE genes

21 months ago

jmoon1194 • 0

Hi all,

I have some RNA-seq data with 2 classes (cancer/normal) that I ran DESeq2 on to obtain significant DE genes. My lab is interested in seeing how well our 'significantly DE' genes can classify cancer/normal samples, and we are using the AUC score/ROC plot to visualize the performance.

Does it make sense to train the random forest on this same pre-defined gene list instead of using a feature-selection method? Could this artificially inflate the AUC scores if evaluated with LOOCV?

Please let me know any thoughts, questions, or concerns you may have. I am fairly untrained and want to learn as much as I can (but am under pressure to deliver with no guidance or mentorship).

Thank you for your time, J

RandomForest RNA-seq R

score 2 · Accepted Answer · 2022-12-07

21 months ago

dsull ★ 6.8k

Yes, you should use your pre-defined list since you want to see how well those genes can classify tumor vs normal.

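You can use LOOCV for validation, but you should also test your classifier on an unseen dataset (i.e., one that you have not already run DESeq2 on and have not looked at previously). Because the gene list was chosen using the same samples, LOOCV on those samples will tend to overestimate the AUC.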
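
To illustrate why the origin of the gene list matters for LOOCV, here is a minimal simulation sketch in Python (standard library only). It is an illustration under stated assumptions, not the original poster's pipeline: the data are pure noise (no gene truly differs between classes), a top-k Welch t-test ranking stands in for DESeq2, and a nearest-centroid classifier stands in for the random forest to keep it dependency-free. The helper names (`select_genes`, `loocv_auc`, `noise_cohort`, etc.) are hypothetical. It compares three AUC estimates: genes picked on all samples followed by LOOCV, genes re-picked inside each LOOCV fold, and genes scored on a freshly simulated "unseen" cohort.

```python
import random


def mean_var(v):
    """Sample mean and unbiased sample variance of a list of floats."""
    m = sum(v) / len(v)
    return m, sum((x - m) ** 2 for x in v) / (len(v) - 1)


def t_stat(values, labels):
    """Welch t-statistic for one gene: tumor (label 1) minus normal (label 0)."""
    ma, va = mean_var([v for v, lab in zip(values, labels) if lab == 1])
    mb, vb = mean_var([v for v, lab in zip(values, labels) if lab == 0])
    n1, n0 = labels.count(1), labels.count(0)
    return (ma - mb) / (va / n1 + vb / n0) ** 0.5


def select_genes(X, y, rows, k):
    """Stand-in for a DE list: top-k genes by |t|, using only the given rows."""
    labels = [y[i] for i in rows]
    ranked = sorted(
        range(len(X[0])),
        key=lambda g: abs(t_stat([X[i][g] for i in rows], labels)),
        reverse=True,
    )
    return ranked[:k]


def fit_centroids(X, y, rows, genes):
    """Per-class mean of each selected gene, computed from the given rows only."""
    cents = {}
    for lab in (0, 1):
        members = [X[i] for i in rows if y[i] == lab]
        cents[lab] = [sum(r[g] for r in members) / len(members) for g in genes]
    return cents


def score(x, cents, genes):
    """Larger score = closer to the tumor centroid than to the normal one."""
    d1 = sum((x[g] - m) ** 2 for g, m in zip(genes, cents[1]))
    d0 = sum((x[g] - m) ** 2 for g, m in zip(genes, cents[0]))
    return d0 - d1


def auc(scores, labels):
    """AUC via Mann-Whitney: P(tumor score > normal score), ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loocv_auc(X, y, k, reselect_each_fold):
    """LOOCV AUC; the gene list is either fixed up front or re-derived per fold."""
    everyone = list(range(len(y)))
    # Fixed list uses ALL samples, including each sample that is later held out.
    fixed = None if reselect_each_fold else select_genes(X, y, everyone, k)
    scores = []
    for i in everyone:
        train = [j for j in everyone if j != i]
        genes = select_genes(X, y, train, k) if reselect_each_fold else fixed
        scores.append(score(X[i], fit_centroids(X, y, train, genes), genes))
    return auc(scores, y)


def noise_cohort(rng, n_per_class, n_genes):
    """Pure noise: no gene truly differs between tumor and normal."""
    y = [1] * n_per_class + [0] * n_per_class
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in y]
    return X, y


rng = random.Random(0)
X, y = noise_cohort(rng, n_per_class=15, n_genes=1000)
k = 10

leaky = loocv_auc(X, y, k, reselect_each_fold=False)
honest = loocv_auc(X, y, k, reselect_each_fold=True)

# "Unseen" cohort: genes and centroids come from X/y only, then new samples are scored.
everyone = list(range(len(y)))
genes = select_genes(X, y, everyone, k)
cents = fit_centroids(X, y, everyone, genes)
X_new, y_new = noise_cohort(rng, n_per_class=15, n_genes=1000)
external = auc([score(x, cents, genes) for x in X_new], y_new)

print(f"DE list from all samples, then LOOCV: AUC = {leaky:.2f}")
print(f"DE list re-derived inside each fold:  AUC = {honest:.2f}")
print(f"DE list scored on an unseen cohort:   AUC = {external:.2f}")
```

With no real signal in the data, the first estimate typically lands far above 0.5 while the other two stay near chance, which is the inflation the question asks about. The cohort sizes, gene count, and `k` are arbitrary choices for the sketch.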

All that said, I'm not sure what you'd gain from such an analysis: your DESeq2 results already tell you which genes are, on average, higher (or lower) in tumor vs. normal. Those same genes being good at classifying tumor vs. normal is not a particularly surprising result.

Thank you for your reply! I greatly appreciate it. I agree with you on the last point of this not being very informative... but gotta do as I'm told for now :/

One last question: for testing the classifier on an unseen dataset, can it be any tumor/normal tissue dataset, or does it have to be from the same tissue type (parathyroid in this case)?

Ideally the same tissue type; your classifier is unlikely to perform well on a different tissue type (though you can try). Generally, training on apples and testing on oranges does not yield good results.

ADD REPLY • link 21 months ago by dsull ★ 6.8k