I'm helping a colleague analyse RNA-seq data to find differentially expressed genes. There are 4 conditions with 3 biological replicates each, and we are interested in all possible pairwise comparisons. Unfortunately, they made a big mistake by pooling the RNA from the three biological replicates of each condition and sequencing it as a single sample (i.e. they did not individually barcode each replicate). My colleague was using DESeq1 and was able to generate a list of differentially expressed genes by analysing the data under the assumption of no biological replicates. I encouraged using DESeq2 instead, but this results in no differentially expressed genes being identified. I explained that without knowledge of the variation in gene expression within each condition, it's unlikely that anything can be done to statistically improve these results.
My question is: is there any technique or alternative method of analysis that the community could suggest? Or is their experiment essentially ruined?
Thanks
I would immediately reject it if I were reviewing it for a journal. There is no way to estimate the sample-to-sample variance within each condition, and the false positive rate is likely to be high.
While this criticism is valid and the OP seems to be aware of it, I think it sounds a bit too harsh. The data could still be used to generate hypotheses: genes with a large fold change could be validated by qPCR in multiple samples, and a pathway analysis could still reveal something and suggest further experiments. If the replicates happen to be very consistent, then this dataset, while not conclusive, might be valuable. Sometimes the replication is done at the level of cell culture, where replicates are so similar that having no replicates or many makes little difference, unless you are after tiny changes. (Just to be clear, I'm not advocating avoiding replication, just that once the damage is done...)
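To make the hypothesis-generation idea concrete, here is a minimal sketch in Python of how one might shortlist genes for qPCR follow-up from the pooled counts. It is not what DESeq or GFOLD do and it gives no p-values. It only scales each pooled library to counts per million, computes a log2 fold change with a pseudocount for every pair of conditions, and ranks the genes. The function names, the thresholds (`min_abs_lfc`, `min_count`) and the input layout (one dict of gene counts per condition) are all assumptions made for illustration.

```python
from itertools import combinations
import math


def cpm(counts):
    """Scale raw counts of one pooled library to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has zero total counts")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2(B / A) on CPM values; the pseudocount avoids division by zero."""
    cpm_a, cpm_b = cpm(counts_a), cpm(counts_b)
    shared = cpm_a.keys() & cpm_b.keys()
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in shared
    }


def rank_candidates(pooled, min_abs_lfc=1.0, min_count=10, pseudocount=1.0):
    """Shortlist genes per pairwise comparison, largest |log2FC| first.

    pooled: {condition_name: {gene: raw_count}} (hypothetical layout).
    Thresholds are arbitrary illustration values, not recommendations.
    """
    results = {}
    for a, b in combinations(sorted(pooled), 2):
        lfc = log2_fold_changes(pooled[a], pooled[b], pseudocount)
        keep = [
            (gene, value)
            for gene, value in lfc.items()
            # Require a large change AND a reasonable raw count in at least
            # one condition, since low counts give unstable fold changes.
            if abs(value) >= min_abs_lfc
            and max(pooled[a][gene], pooled[b][gene]) >= min_count
        ]
        keep.sort(key=lambda gv: abs(gv[1]), reverse=True)
        results[(a, b)] = keep
    return results
```

With 4 conditions this yields 6 pairwise lists. The raw-count filter matters because, without replicates, fold changes for weakly expressed genes are dominated by counting noise, so the top of each list is only a set of candidates to validate, not a list of differentially expressed genes.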
You could try GFOLD.