Why is it recommended to run samples from all groups together when performing DESeq, even though the design variables are not used when estimating the size factors?
The recommendation has to do with ensuring that no additional external factors affect the sequencing process.
Minor changes in the protocol, sequencing efficiency, library preparation, the temperature that day, or the person doing the sequencing can introduce "systematic errors". Unlike random errors, systematic errors do not become easier to correct as the sample size increases. In fact, the opposite is true: systematic errors become more relevant as the sample size increases.
The more samples and data you collect, the smaller the systematic errors that can show up as apparent true signal.
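The point about systematic errors can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model of RNA-seq counts: two hypothetical groups with no true difference, normally distributed noise with standard deviation `sd`, and a constant offset `bias` added to one group to stand in for a batch effect (for example, that group being prepared on a different day). The function names `expected_z` and `simulate_difference` are illustrative only.

```python
import math
import random

def expected_z(bias, sd, n):
    """z-score that a fixed offset would produce in a two-group mean comparison.

    The standard error of a difference of two means shrinks as 1/sqrt(n),
    but the offset 'bias' does not, so the z-score grows as sqrt(n).
    """
    se = sd * math.sqrt(2.0 / n)  # SE of (mean_b - mean_a), n samples per group
    return bias / se

def simulate_difference(n, bias, sd, seed=0):
    """Simulate two groups with NO true difference; group b is shifted by 'bias'."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, sd) for _ in range(n)]
    b = [rng.gauss(bias, sd) for _ in range(n)]  # hypothetical batch offset
    return sum(b) / n - sum(a) / n

for n in (10, 100, 1000, 10000):
    diff = simulate_difference(n, bias=0.2, sd=1.0)
    print(f"n={n:>5}  observed diff={diff:+.3f}  "
          f"expected z={expected_z(0.2, 1.0, n):.2f}")
```

Under these assumptions the expected z-score rises from about 0.45 at n = 10 to about 14 at n = 10000: an offset of 0.2 standard deviations that is invisible in a small study looks highly "significant" in a large one, and adding samples does nothing to remove it. Processing all groups together is meant to keep such an offset from coinciding with the group labels in the first place.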
Gene variance is estimated using all samples, so the more samples you have, the more accurate the estimate.
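The question notes that size factors do not use the design. That is true of median-of-ratios normalization, the idea behind DESeq/DESeq2's default size factors, but every sample still contributes to them. Below is a minimal pure-Python sketch of that idea; it is an illustration, not DESeq2's implementation, and `median_of_ratios_size_factors` is a hypothetical name. Each gene's geometric mean across all samples serves as a pseudo-reference, and a sample's size factor is the median of its ratios to that reference; genes with a zero in any sample are skipped. No group labels or design variables enter the calculation.

```python
import math
from statistics import median

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors (sketch).

    counts: list of per-gene rows, each row a list of raw counts, one per sample.
    Only the count matrix is used; there is no design or group argument.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # geometric mean would be 0, so this gene is skipped
        logs = [math.log(c) for c in row]
        log_geo_mean = sum(logs) / n_samples  # pseudo-reference built from ALL samples
        for j, lc in enumerate(logs):
            log_ratios[j].append(lc - log_geo_mean)
    # Size factor = median ratio of each sample to the pseudo-reference.
    return [math.exp(median(r)) for r in log_ratios]

counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # contains a zero, so it does not contribute
]
print(median_of_ratios_size_factors(counts))
```

In this example the second sample has exactly twice the counts of the first, so the size factors come out at approximately 0.707 and 1.414. Because the pseudo-reference is built from whichever samples are passed in, the size factors are only defined relative to that set of samples.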
Do you mean that DESeq2 uses a specific measure of dispersion (α) that relates the mean (μ) to the variance of the data, Var = μ + α·μ²? And since the dispersion is higher for small mean counts and lower for large mean counts, having more samples is better? Did I understand correctly? Thanks
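The variance formula in the comment can be explored numerically. This is a small sketch, not DESeq2 code: the function names are hypothetical and the dispersion α = 0.1 is an arbitrary illustrative value (in DESeq2, α is estimated per gene). Dividing the variance by μ² gives the squared coefficient of variation, 1/μ + α, so the Poisson term 1/μ dominates at low counts while α dominates at high counts.

```python
def nb_variance(mu, alpha):
    """Negative binomial variance in the form Var = mu + alpha * mu**2."""
    return mu + alpha * mu ** 2

def squared_cv(mu, alpha):
    """Variance relative to the squared mean; algebraically equal to 1/mu + alpha."""
    return nb_variance(mu, alpha) / mu ** 2

alpha = 0.1  # arbitrary illustrative dispersion, not an estimate from real data
for mu in (1, 10, 100, 1000, 10000):
    print(f"mu={mu:>5}  Var={nb_variance(mu, alpha):>12.1f}  "
          f"CV^2={squared_cv(mu, alpha):.4f}  Poisson part 1/mu={1 / mu:.4f}")
```

With α = 0.1, CV² falls from 1.1 at μ = 1 to about 0.1 at μ = 10000, approaching α as the Poisson part vanishes.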