Should we rarefy our amplicon sequencing data?
5.4 years ago
songzewei ▴ 10

Regarding Figure 1 of "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible": https://journals.plos.org/ploscompbiol/article/comment?id=10.1371/annotation/043bcfb2-1583-41a8-9497-807232f001f4

Am I the only one who thinks that Fig 1 actually supports the opposite conclusion?

The statistic is significant because of the larger sampling effort on sample B. After adjusting for that sampling effort, we no longer get the false positive.

In other words, if random subsampling is inadmissible, what if I sequenced sample B twice: one time I got 50, 50, and the other time I got 5000, 5000. How should I interpret the totally different statistical outcomes if random subsampling is not applied?
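The depth-dependence described above is easy to reproduce with a toy contingency-table test. This is only an illustration, not the paper's method; the counts for "sample A" are made up, and scipy's chi-square test stands in for whatever test one might actually use:

```python
# Toy illustration (hypothetical counts): compare sample A against sample B,
# where sample B has the same 50/50 composition at two sequencing depths.
from scipy.stats import chi2_contingency

sample_a = [30, 70]      # hypothetical taxon counts for sample A
b_shallow = [50, 50]     # sample B sequenced to 100 reads
b_deep = [5000, 5000]    # sample B resequenced to 10,000 reads

_, p_shallow, _, _ = chi2_contingency([sample_a, b_shallow])
_, p_deep, _, _ = chi2_contingency([sample_a, b_deep])

# Identical compositions, different depths: the deeper comparison yields a
# smaller p-value purely because of the larger sample size.
print(p_shallow, p_deep)
```

The p-value shrinks as sample B's depth grows even though B's composition is unchanged, which is exactly the interpretation problem the question raises.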

sequencing • 1.6k views

It is true that with more reads, we have greater statistical power.

But how should we deal with the uneven statistical power among samples sequenced at different depths?

A comparison between two deeply sequenced samples will have greater statistical power than one between two shallowly sequenced samples. Is a conclusion based on uneven depths justified, if we cannot fix the "false negative" by sequencing again?


But how should we deal with the uneven statistical power among samples sequenced at different depths?

The authors argue one could use edgeR or DESeq2 (which account for differences in library sizes) to analyse microbiome data:

Fortunately, we have demonstrated that strongly-performing alternative
methods for normalization and inference are already available. In
particular, an analysis that models counts with the Negative Binomial
– as implemented in DESeq2 [13] or in edgeR [41] with RLE
normalization – was able to accurately and specifically detect
differential abundance over the full range of effect sizes, replicate
numbers, and library sizes that we simulated (Figure 6).

Is our conclusion based on the uneven depth justified, if we cannot fix the "false negative" by sequencing again?

Of course one can sequence again to balance all library sizes to an appropriate sequencing depth, but this costs time and money. Using more powerful analysis methods is cheaper and faster.
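For reference, rarefying itself is just subsampling each sample's counts, without replacement, down to a common depth. A minimal sketch with numpy (the taxon counts are made up for illustration):

```python
import numpy as np

def rarefy(counts, depth, seed=0):
    """Subsample a vector of taxon counts, without replacement, to `depth` reads."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_hypergeometric(counts, depth)

# Hypothetical taxon counts for one deeply sequenced sample.
sample_b = np.array([5000, 5000])
rarefied = rarefy(sample_b, depth=100)

print(rarefied, rarefied.sum())  # the rarefied counts always sum to `depth`
```

Note that this throws away most of sample B's reads, which is precisely the paper's objection: the discarded reads carry statistical information that methods like edgeR and DESeq2 retain by modelling library size instead.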

5.3 years ago
h.mon 35k

What do you mean by "larger sampling effect"?

If one has larger samples, statistical tests have more power. Hence, in the Figure 1 example, testing the rarefied counts shows no difference, while testing the original counts shows a statistically significant difference: rarefying the data produced a false negative.

You interpret the "totally different stat outcome if random sampling is not applied" by considering the statistical power associated with the sample sizes; there is no paradox in a test yielding a positive result with larger sample sizes and a negative result with smaller ones.
