Dear all,
I have analyzed an RNA-seq dataset containing 3 case and 3 control samples. The average library size was about 60-70 million reads, with a mapping rate of 85% for almost all samples. However, a reviewer believes the sample size is small and has asked for a statistical power calculation. From what I have read, power analysis is not common in RNA-seq analysis. Nevertheless, I used the ssizeRNA package (ssizeRNA_single function) to estimate the power. Based on the output, 20 samples in each group are required to achieve a power of 80%, and the power for the present samples (3 in each group) is less than 10%. Considering the library size and mapping rate, as well as the robustness of the RNA-seq method, I didn't expect such a low calculated power. Could you please share your explanation of the issue, or suggest an alternative package for power analysis?
Many thanks
Yeah, n=3 is underpowered. This is neither unexpected nor a novel issue; it is pretty much (I would say) common knowledge. The Schurch paper performed an extensive power analysis of RNA-seq using real data: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/
The reason that people usually do not do n=20 in routine experiments is time, money and feasibility. That said, it depends on your project and the claims you want to make. Investigating perturbation effects in cell lines or in littermate inbred mice, followed by experimental confirmation of the major findings -- yes, you might get away with n=3. Claims in a clinical context using primary human samples, be it therapy efficacy, drug effects etc... there is probably no way you'd ever get away with n=3.
Can you elaborate on what you work on and what precisely the reviewer asked/wanted to see? Or was it "more of a comment than a question"?
It's also worth noting that Schurch et al. conducted their analysis on an in-vitro yeast system, where the replicates were different cultures of the same clonal population. Such a system is likely to be much less variable than any human study, particularly a clinical one.
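To make the variability point concrete, here is a rough sketch of how per-gene power depends on both the number of replicates and the between-replicate spread. It is not what ssizeRNA does: it uses a plain normal (z-test) approximation on log2 expression, which is optimistic at very small n. The function name `approx_power`, the two standard deviations, the 2-fold effect size and the stringent alpha (a crude stand-in for multiple-testing correction) are all made-up values for illustration.

```python
from statistics import NormalDist


def approx_power(n_per_group, log2_fold_change, sd_log2, alpha=0.05):
    """Approximate two-sided power to detect one gene in a two-group design.

    Assumption: log2 expression is roughly normal with the same standard
    deviation (sd_log2) in both groups, and a z-test is used. This ignores
    the extra uncertainty of estimating the variance from few replicates.
    """
    norm = NormalDist()
    # Standard error of the difference between the two group means.
    se = sd_log2 * (2.0 / n_per_group) ** 0.5
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(log2_fold_change) / se
    # Probability that the test statistic lands in either rejection tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical spreads: tight (clonal cultures) vs noisy (human samples).
    for sd in (0.2, 0.7):
        for n in (3, 20):
            power = approx_power(n, log2_fold_change=1.0, sd_log2=sd, alpha=0.001)
            print(f"sd={sd} n={n} power={power:.2f}")
```

With these invented numbers, the low-variability setting is already well powered at n=3, while the high-variability setting is not and only becomes well powered around n=20. Read that as a qualitative pattern, not as an estimate for any real dataset.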
Thanks for your response. It was a standard RNA-seq analysis to find differentially expressed genes between two groups, case and control, in human samples (blood). I agree with you, but we see many published papers in the clinical context with such a small sample size, so do all of them have this big problem?!
Almost all RNAseq studies in the literature, other than those done by the large consortia, are probably underpowered. How big a problem this is depends, as @ATPoint points out, on what they want to claim from the results.
Often RNAseq is used as a hypothesis-generating technique. If you want to claim a particular expression profile is predictive, then you'd want to actually test the profile on a set of unrelated patients. If the conclusion was that a particular pathway was involved in a cellular phenotype, then knocking out the pathway and measuring the effect would be the test. Once these tests of the generated hypothesis have been carried out, the power of the RNAseq experiment is irrelevant.
Further, while the power of the RNAseq to detect DE for any given gene is low, that does not mean the power of any downstream analysis is just as low. Things like enrichment analysis and profile generation aggregate information from many different genes, so their power will be higher.
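A toy calculation of the aggregation effect, under strong simplifying assumptions: every gene in the set is shifted in the same direction by the same amount, genes are independent, and the per-gene z-scores are combined with Stouffer's method (sum divided by the square root of the number of genes). Real enrichment tools work differently, and correlation between genes will shrink the gain, so treat this only as an illustration. The function names and the numbers are hypothetical.

```python
from statistics import NormalDist


def single_gene_power(effect_z, alpha=0.05):
    """Two-sided power of a z-test whose statistic has mean effect_z, sd 1."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    return norm.cdf(effect_z - z_crit) + norm.cdf(-effect_z - z_crit)


def gene_set_power(effect_z, n_genes, alpha=0.05):
    """Power of a Stouffer-combined test over n_genes independent genes.

    Assumption: each gene's z-score is N(effect_z, 1) and genes are
    independent, so the combined statistic sum(z) / sqrt(n_genes) is
    N(effect_z * sqrt(n_genes), 1).
    """
    return single_gene_power(effect_z * n_genes ** 0.5, alpha)


if __name__ == "__main__":
    weak_effect = 1.0  # hypothetical per-gene shift, too small to detect alone
    print(f"single gene: {single_gene_power(weak_effect):.2f}")
    for m in (5, 25, 100):
        print(f"set of {m} genes: {gene_set_power(weak_effect, m):.2f}")
```

The combined statistic's mean grows with the square root of the set size, which is why a shift that is hard to detect in any one gene can still be detectable at the pathway level.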
See my posts on a similar topic here: A: Am I crazy, or are most published RNA-seq studies vastly underpowered?
I haven't read the ssizeRNA paper in detail, but from figure 7.B of the article, it seems that the power estimate is a bit off for low sample sizes, at least for that example. It looks like the tool estimates ~10% when the real power is ~40%.