I've been diving into some journal articles about the DESeq2 model and I'm a bit puzzled about the section on statistical testing of the log fold change. From what I gather, the shrunken log fold change and the standard error obtained from the empirical Bayes approach are then used in a Wald test to calculate the p-value: the shrunken log fold change estimate is divided by its standard error to get a z-statistic.
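Just so we're on the same page, here is a minimal sketch of that calculation in Python (the LFC and standard error values are made up, and I'm using the standard normal via `math.erfc` for the two-sided p-value):

```python
import math

def wald_pvalue(lfc, se):
    """Two-sided p-value for a Wald z-statistic (lfc / se),
    using the standard normal survival function via erfc:
    p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    z = lfc / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical shrunken log2 fold change and its standard error
lfc, se = 1.2, 0.4
z = lfc / se  # z-statistic = 3.0
p = wald_pvalue(lfc, se)
print(f"z = {z:.2f}, p = {p:.4f}")
```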
I am wondering if my following statement is correct:
The log fold change doesn't always follow a normal distribution, right? And since the Wald test assumes normality, doesn't that mean this kind of statistical testing is mainly appropriate when the sample size is large, because of the Central Limit Theorem?
What about when the sample size is small? The Wald test might not be the best fit in that case, and the likelihood ratio test might be a better choice.
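For comparison, the likelihood ratio test I have in mind would look like this (a sketch with made-up log-likelihoods; for one parameter of interest the statistic is compared against a chi-square distribution with 1 degree of freedom, whose survival function can be written with the standard normal):

```python
import math

def lrt_pvalue_df1(loglik_full, loglik_reduced):
    """Likelihood ratio test with 1 degree of freedom:
    2 * (llf - llr) ~ chi-square(1) under the null."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # chi-square(1) survival function via the standard normal:
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods of the full and reduced models
p = lrt_pvalue_df1(-100.0, -104.5)
print(f"LRT p = {p:.4f}")
```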
I am also unsure about what can be considered a large sample size. Normally, the number of replicates in an RNA-seq experiment does not seem to qualify.
Thanks in advance for any insight you can provide!
Generally speaking, the major differential expression packages (limma, DESeq2, edgeR) are all slightly different flavors of linear models, and as such the statistic associated with any particular contrast is derived by multiplying (inner product) the model coefficients by a contrast vector. In the very simplest case (one binary condition variable) the contrast vector is of the form (1, 0, ..., 0). The resulting product is itself a random variable whose mean and standard deviation can be derived from the multivariate normal distribution that arises from the null model under the central limit theorem.

While you can interpret the coefficients of the linear model as log fold changes, that interpretation is irrelevant to their use as a statistic. To wit: the estimate of the log fold change under the null hypothesis is normally distributed via the central limit theorem. The empirical distribution of fold changes for a specific condition, across all genes, does not enter into any consideration for the linear model; it only enters into consideration for Bayesian or penalized approaches (such as empirical Bayes in limma or shrinkage in DESeq2).
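The inner-product mechanics can be sketched like so (all numbers are invented for illustration; the contrast (1, 0) picks out the first coefficient, and the coefficient covariance matrix is a made-up example):

```python
import math

# Hypothetical fitted coefficients for one gene, with the
# condition effect (log2 scale) as the first coefficient
beta = [1.2, 5.0]

# Contrast vector of the form (1, 0, ..., 0)
contrast = [1.0, 0.0]

# The statistic's numerator is the inner product c' beta
estimate = sum(c * b for c, b in zip(contrast, beta))

# Its variance comes from the coefficient covariance matrix:
# var = c' Cov(beta) c   (Cov(beta) here is a made-up example)
cov = [[0.16, -0.02],
       [-0.02, 0.10]]
var = sum(contrast[i] * cov[i][j] * contrast[j]
          for i in range(2) for j in range(2))

z = estimate / math.sqrt(var)  # Wald z-statistic for this contrast
print(f"estimate = {estimate}, z = {z:.2f}")
```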
Much has been written on the vexed topic of sample size; probably the most relevant reference is Schurch et al. 2016 (PMID: 27022035). One apparent property -- apparent at least from the Schurch results -- of the Wald statistic used (which may or may not be shared by the asymptotically equivalent likelihood ratio, score, or C(a) tests) is that violating the CLT assumptions with small sample sizes has little impact on the ability to control Type I error; instead it results mainly in a loss of power. I would therefore hesitate to switch away from already-established methods, particularly in the case of low sample numbers, since these methods are known to control false discoveries for low numbers of samples.
Thanks for your help! The article you referred to is very helpful.