Repeated testing/data mining in RNA-seq

7 months ago

robert.flynn.21 ▴ 10

Hi all,

I have an RNA-seq dataset comprising samples from a cohort of patients with a rare disease (and controls). The clinical presentation among these patients is varied, and there are many genetic subtypes of the disease within my cohort.

In addition to my regular case/control analysis, my supervisor has given me 6+ additional tests to run. These mainly involve comparing different subtypes of the disease.

However, I am aware that if you run enough tests you will eventually find something that looks interesting by chance alone. As such, I am wondering what the protocol is here. Is this fine, or is it entering p-hacking territory?

Many thanks,

Rob.

As such, I am wondering what the protocol is here?

My personal suggestion is to just run the comparisons and see what comes out. Multiple comparisons are often necessary to build a hypothesis. If you apply an overly stringent correction across many comparisons, you might lose potentially interesting signals. RNA-seq findings are usually the starting point for downstream analyses, which should then confirm the finding. I would not be too worried.
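
To make the stringency trade-off concrete, here is a minimal sketch in plain Python comparing no correction, Bonferroni, and Benjamini-Hochberg (BH) adjustment. The p-values are made up for illustration (imagine one gene tested in 7 contrasts: the case/control test plus 6 subtype comparisons), and the 0.05 cutoff is just a conventional choice, not something taken from the question. In a real RNA-seq workflow the differential expression tool typically already applies a per-contrast adjustment across genes, so treat this only as an illustration of how the two corrections differ.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, alpha=0.05):
    """Number of p-values strictly below the threshold alpha."""
    return sum(p < alpha for p in pvalues)


# Hypothetical p-values for illustration only (not from any real data set).
pvals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.6]

n_raw = count_significant(pvals)
n_bonf = count_significant(bonferroni(pvals))
n_bh = count_significant(benjamini_hochberg(pvals))

print("uncorrected:", n_raw)
print("Bonferroni: ", n_bonf)
print("BH (FDR):   ", n_bh)
```

With these made-up numbers, 5 tests pass uncorrected, 1 passes Bonferroni, and 3 pass BH, which is the sense in which a very stringent correction can discard candidates that a follow-up experiment might still confirm.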