Am I the only one who thinks Fig 1 actually shows the opposite conclusion?
The statistic is significant because of the larger sampling effect in sample B. After adjusting for the sampling effect, we no longer have the false positive.
In other words, if random sampling is inadmissible, what if I sequenced sample B twice? One time I got 50, 50, and the other time I got 5000, 5000. How should I interpret the totally different statistical outcomes if random sampling is not applied?
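To make the two-depths point concrete, here is a minimal sketch with made-up counts (not the paper's actual Figure 1 data): the same underlying proportions, tested at two sequencing depths, give opposite test outcomes.

```python
# Hypothetical 2x2 tables (taxon vs. everything else, two samples),
# identical proportions (40% vs. 50%) at two sequencing depths.
from scipy.stats import chi2_contingency

shallow = [[40, 60], [50, 50]]        # 100 reads per sample
deep = [[4000, 6000], [5000, 5000]]   # 10,000 reads per sample

_, p_shallow, _, _ = chi2_contingency(shallow)
_, p_deep, _, _ = chi2_contingency(deep)

print(p_shallow)  # not significant at the 5% level
print(p_deep)     # highly significant: same proportions, more reads
```

Nothing about the samples changed between the two tables; only the depth, and with it the power of the test, did.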
It is true that with more reads we have greater statistical power.
But how should we deal with the uneven statistical power among samples of different depths?
A comparison between two deeply sequenced samples will have greater statistical power than one between two shallow samples. Is a conclusion based on uneven depth justified, if we cannot fix the "false negative" by sequencing again?
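The uneven power across depths can be quantified directly. A small Monte Carlo sketch (hypothetical proportions 0.40 vs. 0.50, chosen for illustration) estimating how often a two-proportion z-test detects the difference at each depth:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def power_at_depth(n_reads, p1=0.40, p2=0.50, reps=2000, alpha=0.05):
    """Fraction of simulated comparisons where a pooled two-proportion
    z-test rejects H0, at n_reads per sample."""
    x1 = rng.binomial(n_reads, p1, reps)
    x2 = rng.binomial(n_reads, p2, reps)
    pooled = (x1 + x2) / (2 * n_reads)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_reads)
    z = (x2 / n_reads - x1 / n_reads) / se
    p = 2 * norm.sf(np.abs(z))
    return (p < alpha).mean()

print(power_at_depth(100))    # shallow pair: low power
print(power_at_depth(1000))   # deep pair: much higher power, same effect
```

The same biological effect is detected reliably in the deep pair and mostly missed in the shallow pair, which is exactly the asymmetry the question describes.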
The authors argue one could use edgeR or DESeq2 (which account for differences in library sizes) to analyse microbiome data:
Fortunately, we have demonstrated that strongly-performing alternative methods for normalization and inference are already available. In particular, an analysis that models counts with the Negative Binomial – as implemented in DESeq2 [13] or in edgeR [41] with RLE normalization – was able to accurately and specifically detect differential abundance over the full range of effect sizes, replicate numbers, and library sizes that we simulated (Figure 6).
Of course one can sequence again to balance all library sizes to an appropriate sequencing depth, but this costs time and money. Using more powerful analysis methods is cheaper and faster.
Larger samples give statistical tests more power. Hence, in the Figure 1 example, the test on the rarefied counts finds no difference, while the test on the original counts finds a statistically significant one: the rarefied data produce a false negative.
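This false negative is easy to reproduce: take a deep comparison that is clearly significant, rarefy both samples down to a common small depth, and the significance usually disappears, and it varies from draw to draw, which is precisely the randomness objection. A sketch with hypothetical counts (not the paper's data):

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)

# Hypothetical deep samples: 40% vs. 50% of reads from one taxon.
deep = np.array([[4000, 6000], [5000, 5000]])
_, p_full, _, _ = chi2_contingency(deep)
print(p_full)          # clearly significant on the original counts

# Rarefy both samples to 100 reads (sampling without replacement),
# many times, and re-test each draw.
depth = 100
p_rare = []
for _ in range(1000):
    a = rng.hypergeometric(deep[0, 0], deep[0, 1], depth)
    b = rng.hypergeometric(deep[1, 0], deep[1, 1], depth)
    table = [[a, depth - a], [b, depth - b]]
    _, p, _, _ = chi2_contingency(table)
    p_rare.append(p)

frac_lost = np.mean(np.array(p_rare) > 0.05)
print(frac_lost)       # most rarefied draws no longer reach significance
```

A real difference that the full data detect decisively is missed in the majority of rarefied draws, and two researchers rarefying the same data with different seeds can reach different conclusions.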
You can interpret the "totally different statistical outcomes if random sampling is not applied" in terms of the power associated with each sample size: there is no paradox in a test yielding a positive result at a larger sample size and a negative result at a smaller one.