Question

RNA-seq replicates pooled before sequencing. How to proceed with DGE analysis?

3

9.1 years ago

Sentinel156 ▴ 190

I'm helping a colleague analyse RNA-seq data to find differentially expressed genes. There are 4 conditions with 3 biological replicates each, and we are interested in all possible pairwise comparisons. Unfortunately, they made a big mistake: they pooled the RNA from the three biological replicates of each condition and sequenced each pool as a single sample (i.e. they did not individually barcode each replicate). My colleague was using DESeq1 and was able to generate a list of differentially expressed genes by analysing the data assuming no biological replicates. I encouraged using DESeq2; however, this results in no differentially expressed genes being identified. I explained that without knowledge of gene expression variation, it's unlikely that anything can be done to statistically improve these results.

My question is, is there any technique/alternate method of analysis that the community could suggest? Or is their experiment essentially ruined?

Thanks

RNA-Seq • 4.6k views

ADD COMMENT • link updated 9.1 years ago by Irsan ★ 7.8k • written 9.1 years ago by Sentinel156 ▴ 190

4

If I were reviewing it for a journal, I would reject it immediately. There is no way to estimate the sample-to-sample variance within each condition, and the false positive rate is likely to be high.

ADD REPLY • link 9.1 years ago by dario.garvan ▴ 520

4

While this criticism is valid and the OP seems to be aware of it, I think it sounds a bit too harsh. The data could still be used to generate hypotheses: genes with a large fold change could be validated by qPCR in multiple samples, and a pathway analysis could still reveal something and suggest further experiments. If the replicates happen to be very consistent, then this dataset, while not conclusive, might be valuable. Sometimes the replication is done at the level of cell culture, where replicates are so similar that having none or many makes little difference, unless you are after tiny changes. (Just to be clear, I'm not advocating skipping replication, just that once the damage is done...)
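As a rough illustration of the hypothesis-generating route, candidates for qPCR follow-up could be ranked by log2 fold change between two pooled libraries. This is only a sketch: the function name, the pseudo-CPM offset, and the example counts are all hypothetical, and the ranking carries no statistical significance.

```python
import numpy as np

def log2_fold_change(counts_ref, counts_test, pseudo_cpm=1.0):
    """Log2 fold change (test vs. reference) between two unreplicated libraries.

    Counts are converted to counts per million (CPM) so that libraries of
    different depth are comparable. pseudo_cpm is an assumed offset that
    damps extreme ratios for genes with very low counts.
    """
    ref = np.asarray(counts_ref, dtype=float)
    test = np.asarray(counts_test, dtype=float)
    cpm_ref = ref / ref.sum() * 1e6
    cpm_test = test / test.sum() * 1e6
    return np.log2((cpm_test + pseudo_cpm) / (cpm_ref + pseudo_cpm))

# Hypothetical counts for five genes in two pooled conditions.
condition_a = [150, 3000, 12, 800, 45]
condition_b = [600, 2900, 10, 200, 50]
lfc = log2_fold_change(condition_a, condition_b)

# Gene indices ordered from largest to smallest absolute fold change.
ranked = np.argsort(-np.abs(lfc))
print(ranked, np.round(lfc[ranked], 2))
```

Genes at the top of such a list are candidates for validation in independent samples, not results in themselves.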

ADD REPLY • link 9.1 years ago by dariober 15k

0

You could try GFOLD.

ADD REPLY • link 9.1 years ago by GouthamAtla 12k

score 6 · Answer 1 · 2016-03-29

6

9.1 years ago

Irsan ★ 7.8k

Read section 2.1.1 of the edgeR manual: "What to do if you have no replicates". You can compile a list of 250 housekeeping genes based on this article and use them to estimate the "random/background" variation. Afterwards, tell your colleagues to consult an expert before starting experiments.
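The idea behind the housekeeping-gene approach can be sketched as follows: treat all libraries as a single group for genes assumed not to respond to treatment, and estimate one shared negative binomial dispersion from them. The code below is a simplified method-of-moments illustration in Python, not edgeR's own estimator; the function name and the example counts are hypothetical.

```python
import numpy as np

def common_dispersion(counts, lib_sizes):
    """Pooled method-of-moments estimate of a shared NB dispersion.

    counts:    genes x samples raw counts, housekeeping genes only
    lib_sizes: total reads per sample, computed from all genes

    Assumes the housekeeping genes are not differentially expressed,
    so every sample can be treated as a replicate of one group.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = np.asarray(lib_sizes, dtype=float)
    # Rescale each sample to the mean library size.
    scaled = counts * (lib_sizes.mean() / lib_sizes)
    mean = scaled.mean(axis=1)
    var = scaled.var(axis=1, ddof=1)
    denom = np.sum(mean ** 2)
    if denom == 0:
        raise ValueError("all housekeeping genes have zero counts")
    # NB variance is mean + phi * mean^2; solve for phi over all genes at once.
    phi = np.sum(var - mean) / denom
    # A negative estimate means no detectable overdispersion; floor at zero.
    return max(0.0, float(phi))

# Hypothetical counts for three housekeeping genes across four pooled libraries.
hk_counts = [[500, 520, 480, 510], [1200, 1100, 1300, 1250], [60, 75, 55, 70]]
lib_sizes = [2.0e7, 2.1e7, 1.9e7, 2.0e7]
phi = common_dispersion(hk_counts, lib_sizes)
print(f"common dispersion = {phi:.4f}, BCV = {phi ** 0.5:.3f}")
```

In edgeR itself, the estimated common dispersion would then be assigned to the full data set before fitting the model and testing each pairwise contrast. The estimate is only as good as the assumption that the chosen genes are truly stable across conditions.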

ADD COMMENT • link 9.1 years ago by Irsan ★ 7.8k

0

I'm late to this thread, but I have read the edgeR manual section on this kind of analysis and need more explanation. Could you please tell me how to estimate the "random/background" variation based on housekeeping genes?

ADD REPLY • link 8.7 years ago by seta ★ 1.9k