Question

RNA-seq power estimation using ssizeRNA program

1

3.8 years ago

seta ★ 1.9k

Dear all,

I have analyzed an RNA-seq dataset containing 3 case and 3 control samples. The average library size was about 60-70 million reads, with a mapping rate of 85% for almost all samples. However, a reviewer considered the sample size too small and asked us to calculate the statistical power. From what I have read, power analysis is not common for RNA-seq analysis; nevertheless, I used the ssizeRNA package (ssizeRNA_single function) to estimate the power. Based on the output, 20 samples per group are required to achieve 80% power, and the power for the present design (3 samples per group) is less than 10%. Considering the library size and mapping rate, as well as the robustness of the RNA-seq method, I didn't expect such a low calculated power. Could you please share your explanation of this issue, or suggest an alternative package for power analysis?

Many thanks

power analysis RNA-seq ssizeRNA • 2.0k views

ADD COMMENT • link updated 3.8 years ago by Kevin Blighe 88k • written 3.8 years ago by seta ★ 1.9k

0

Yeah, n=3 is underpowered. This is neither unexpected nor a novel issue, but pretty much (I would say) common knowledge. The Schurch paper performed an extensive power analysis for RNA-seq using real data: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/

The reason that people usually do not do n=20 in routine experiments is time, money and feasibility. Having said that, it depends on your project and the claims you want to make. Investigating perturbation effects in cell lines or in inbred littermate mice, followed by experimental confirmation of the major findings: yes, you might get away with n=3. Claims in a clinical context using primary human samples, be it therapy efficacy, drug effects etc.: there is probably no way you'd ever get away with n=3.

Can you elaborate on what you work on and what precisely the reviewer asked for or wanted to see? Or was it "more of a comment than a question"?

ADD REPLY • link 3.8 years ago by ATpoint 85k

1

It's also worth noting that Schurch et al. conducted their analysis on an in vitro yeast system, where replicates were different cultures of the same clonal population, which is likely to be much less variable than any human study, particularly a clinical one.

ADD REPLY • link 3.8 years ago by i.sudbery 20k

0

Thanks for your response. It was a standard RNA-seq analysis to find differentially expressed genes between case and control groups, using human blood samples. I agree with you, but we see many published papers in the clinical context with such a small sample size, so do all of them have this big problem?

ADD REPLY • link 3.8 years ago by seta ★ 1.9k

1

Almost all RNA-seq studies in the literature, other than those done by the large consortia, are probably underpowered. How big a problem this is depends, as @ATpoint points out, on what the authors want to claim from the results.

Often RNA-seq is used as a hypothesis-generating technique. If you want to claim a particular expression profile is predictive, then you'd want to actually test the profile on a set of unrelated patients. If the conclusion was that a particular pathway is involved in a cellular phenotype, then knocking out the pathway and measuring the effect would be the test. Once these tests of the generated hypothesis have been carried out, the power of the RNA-seq experiment is irrelevant.

Further, while the power of RNA-seq to detect DE for any given gene is low, that does not mean the power of any downstream analysis is just as low. Things like enrichment analysis and profile generation aggregate information across many different genes, so their power will be higher.
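As a rough illustration of why aggregation helps, here is a toy sketch in Python. It is not how any particular enrichment tool works: it assumes each gene in a set contributes an independent z-score with the same small mean shift, and combines them Stouffer-style (sum of z-scores divided by the square root of the set size). The function names and the shift of 0.5 are made up for the example, and real genes are correlated, so the gain in practice will be smaller.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def one_sided_power(mean_z, alpha=0.05):
    """Power of a one-sided z-test whose statistic is N(mean_z, 1)
    under the alternative hypothesis."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_crit - mean_z)


def gene_set_power(per_gene_mean_z, n_genes, alpha=0.05):
    """Power after combining n_genes independent per-gene z-scores.

    Assumption (toy model): every gene has the same mean shift and the
    genes are independent, so sum(z_i) / sqrt(n_genes) is distributed as
    N(per_gene_mean_z * sqrt(n_genes), 1).
    """
    return one_sided_power(per_gene_mean_z * math.sqrt(n_genes), alpha)


# Hypothetical weak per-gene signal, increasingly large gene sets.
for n_genes in (1, 5, 20, 50):
    print(f"{n_genes:3d} genes: power = {gene_set_power(0.5, n_genes):.3f}")
```

Under these assumptions the combined statistic's mean grows with the square root of the number of genes, which is why a set-level test can be well powered even when each individual gene is not.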

See my posts on a similar topic here: A: Am I crazy, or are most published RNA-seq studies vastly underpowered?

ADD REPLY • link 3.8 years ago by i.sudbery 20k

0

I haven't read the ssizeRNA paper in detail, but from Figure 7B of the article, it seems that the power estimation is a bit off for low sample sizes, at least for that example. It looks like the tool estimates ~10% when the real power is ~40%.

ADD REPLY • link 3.8 years ago by Carlo Yague 8.9k

score 2 · Answer 1 · 2021-01-31

Post edit: see related answer by Ian: A: Am I crazy, or are most published RNA-seq studies vastly underpowered?

----------

If we consider Student's t-test and ignore, for now, the fact that this is RNA-sequencing (with all of its intricacies...), then it is possible to achieve approximately 80% power (0.791) with a 3 vs 3 comparison when:

- we set alpha to 0.05
- the expression of each gene has a standard deviation of 1.0 in each group
- the difference in group means for the TRUE differentially expressed genes is ⩾ 3

--------

Given the nature of biology, it's unlikely that we'll have any gene with that 'tight' a standard deviation; however, we can still achieve the same power (0.791) with 3 vs 3, because the difference in means relative to the standard deviation is unchanged, if:

- we set alpha to 0.05
- the expression of each gene has a standard deviation of 3.0 in each group
- the difference in group means for the TRUE differentially expressed genes is ⩾ 9
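The two scenarios above can be checked numerically. What follows is a minimal sketch in Python (standard library only) of a two-sided, equal-variance two-sample t-test power calculation based on the noncentral t distribution, evaluated by simple numerical integration. The function names `t_cdf` and `t_test_power` are illustrative and do not come from any package mentioned in this thread. Depending on the approximation a given calculator uses, the result may differ slightly from the 0.791 quoted above.

```python
import math


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t_cdf(c, df, ncp=0.0, steps=2000):
    """P(T <= c) for T = (Z + ncp) / S, with Z ~ N(0, 1) and
    S = sqrt(chi2_df / df). Setting ncp = 0 gives the central t CDF.

    Uses P(T <= c) = E[Phi(c * S - ncp)], integrating over the density
    of S with the midpoint rule.
    """
    k = df / 2.0
    log_const = math.log(2.0) + k * math.log(k) - math.lgamma(k)
    upper = 1.0 + 8.0 / math.sqrt(df)  # density of S is negligible beyond this
    h = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        log_density = log_const + (df - 1) * math.log(s) - k * s * s
        total += norm_cdf(c * s - ncp) * math.exp(log_density)
    return total * h


def t_test_power(n_per_group, delta, sd, alpha=0.05):
    """Power of a two-sided two-sample t-test with equal group sizes
    and a common standard deviation."""
    df = 2 * n_per_group - 2
    ncp = delta / (sd * math.sqrt(2.0 / n_per_group))
    # Find the two-sided critical value by bisection on the central t CDF.
    lo, hi = 0.0, 1000.0
    target = 1.0 - alpha / 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    t_crit = 0.5 * (lo + hi)
    # Probability that the statistic falls in either rejection region.
    return (1.0 - t_cdf(t_crit, df, ncp)) + t_cdf(-t_crit, df, ncp)


# (n per group, difference in means, SD): the two scenarios above,
# plus a smaller effect for comparison.
for n, delta, sd in [(3, 3.0, 1.0), (3, 9.0, 3.0), (3, 1.0, 1.0)]:
    print(f"n={n}, delta={delta}, sd={sd}: power = {t_test_power(n, delta, sd):.3f}")
```

Only the ratio of the difference to the standard deviation enters the calculation, so the first two scenarios give the same power, while the smaller effect in the third is detected far less often at n=3.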

------------

So, this is in line with the statements made in the study to which my colleague, ATpoint, has linked. That is, with 3 vs 3, you'll only realistically be able to detect a certain percentage of the TRUE differentially expressed genes, i.e., those genes that have large effect sizes / difference in mean.

Keep in mind that it is impossible to apply these stats globally to the entire transcriptome, because different genes will exhibit a different level of variation in expression profiles in different tissues, and will respond differently to treatments, also based on tissue-specific, cell-cycle, and both genetic and epigenetic differences. The effect sizes will also differ wildly.

Kevin