I am relatively new to RNA-seq and I don't yet fully understand the statistics involved in differential expression (DE) analysis. I have read quite a few publications in which the DESeq2 package is used for DE analysis of metatranscriptome datasets.
What confuses me in this context is the shrinkage estimation of dispersion. The DESeq2 paper reads:
"In DESeq2, we assume that genes of similar average expression strength have similar dispersion."
Is this a valid assumption for metatranscriptome counts? An extreme example to illustrate the issue:
Organism A
-occurs at low abundance (average of 1 organism per sample)
-has a high transcription rate of gene X (average of 1000 reads per organism in the sample)
Organism B
-occurs at high abundance (average of 1000 organisms per sample)
-has a low transcription rate of gene Y (average of 1 read per organism in the sample)
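To make the toy example concrete, here is a minimal Python sketch of the two scenarios. The model is purely my own assumption and not anything taken from DESeq2: the number of organisms per sample is Poisson-distributed, and the reads for the gene are Poisson-distributed given that number. The function name `read_count_moments` is hypothetical. The dispersion is obtained by matching moments to the negative binomial variance form Var = mu + alpha * mu^2, so it is only a rough stand-in for what DESeq2 would estimate.

```python
from typing import NamedTuple


class Moments(NamedTuple):
    mean: float
    variance: float
    dispersion: float


def read_count_moments(mean_organisms: float, reads_per_organism: float) -> Moments:
    """Moments of a gene's read count under a toy Poisson-Poisson model.

    Assumption (illustrative only): organisms per sample N ~ Poisson(mean_organisms),
    and given N the read count K ~ Poisson(reads_per_organism * N).
    """
    mean = mean_organisms * reads_per_organism
    # Law of total variance: Var(K) = E[Var(K | N)] + Var(E[K | N])
    sampling_var = mean                                       # E[r * N]
    abundance_var = reads_per_organism ** 2 * mean_organisms  # r^2 * Var(N)
    variance = sampling_var + abundance_var
    # Moment-matched dispersion alpha from Var = mu + alpha * mu^2
    dispersion = (variance - mean) / mean ** 2
    return Moments(mean, variance, dispersion)


# Organism A: rare organism, highly transcribed gene X
gene_x = read_count_moments(mean_organisms=1, reads_per_organism=1000)
# Organism B: abundant organism, weakly transcribed gene Y
gene_y = read_count_moments(mean_organisms=1000, reads_per_organism=1)

print(gene_x)
print(gene_y)
```

Under these assumptions both genes have a mean count of 1000, but the moment-matched dispersion works out to 1 / mean_organisms, so roughly 1 for gene X and 0.001 for gene Y.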
I would imagine gene X and gene Y show a similar average read count of 1000 but exhibit very different dispersions. Did I maybe miss something about how dispersion is calculated? Or does the issue perhaps not matter for real-world datasets? Thanks in advance for any answers.
Cheers, Tom