Question

Difference Between "Pooled" And "Blind" In DESeq Dispersion Estimates?

Asked 12.2 years ago by Mikael Huss

Does anyone know what the difference is between dispersion estimates obtained using method="blind" and method="pooled" in the estimateDispersions() function in DESeq? (I haven't migrated to DESeq2 yet; I'm using DESeq 1.10.1.)

From reading the vignette and reference manual, I get the impression that both of these methods produce a single dispersion estimate for each gene, disregarding the particular experimental condition of each sample. But there must be other differences; why else would they be separate options? Looking at example code, it seems that method="blind" goes together with sharingMode="fit-only" (for DE analysis without replicates), but I wonder whether that is a misinterpretation on my part.

Tags: deseq, differential-expression

Last updated 12.1 years ago by Biomonika (Noolean).

Maybe with "pooled", the dispersion derived from the regression fit is considered together with the higher per-gene dispersions caused by outliers (so sharingMode="maximum" is usable), while with "blind" only the fit is considered (so "fit-only" is the only suitable choice for sharingMode)?

Reply by vodka, 12.1 years ago
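For reference, here is a minimal Python sketch of how the three sharingMode options, as we understand them from the DESeq 1.x documentation, combine a per-gene dispersion estimate with the fitted (regression) value: "fit-only" keeps only the fitted value, "maximum" takes the larger of the two, and "gene-est-only" keeps only the per-gene estimate. The function `share_dispersions` and the toy numbers are hypothetical; this is not DESeq's code, and NaN handling is omitted.

```python
import numpy as np


def share_dispersions(gene_est, fitted, sharing_mode="maximum"):
    """Hypothetical helper: combine per-gene and fitted dispersions the way
    DESeq 1.x's sharingMode options are described (not the package's code)."""
    gene_est = np.asarray(gene_est, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    if sharing_mode == "fit-only":
        # Use only the value from the dispersion-vs-mean regression.
        return fitted.copy()
    if sharing_mode == "maximum":
        # Take the larger value per gene, so outlier-inflated per-gene
        # estimates are kept (the conservative choice).
        return np.maximum(gene_est, fitted)
    if sharing_mode == "gene-est-only":
        # Ignore the fit and trust each gene's own estimate.
        return gene_est.copy()
    raise ValueError(f"unknown sharing_mode: {sharing_mode!r}")


# Toy values (made up): gene 2 has an inflated per-gene estimate.
raw = [0.05, 0.30, 0.10]
fit = [0.10, 0.12, 0.10]
print(share_dispersions(raw, fit, "fit-only").tolist())  # [0.1, 0.12, 0.1]
print(share_dispersions(raw, fit, "maximum").tolist())   # [0.1, 0.3, 0.1]
```

The two modes only disagree for genes whose per-gene estimate exceeds the fit. A plausible (unconfirmed) reading of the pairing of method="blind" with sharingMode="fit-only" in the no-replicate case is that blind per-gene estimates then also absorb genuine condition differences, so "maximum" would be overly conservative.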

Yes, maybe that's it. Thanks for the suggestion.

Reply by Mikael Huss, 12.1 years ago

I quickly checked the code, and there are some differences beyond the checks the pooled method makes for the existence of replicates. I will explore the code more deeply as soon as I can, but as a first step toward clarifying things I would like to visually compare the dispersion plots produced by the two methods.

Reply by vodka, 12.1 years ago

Maybe it's something like the difference between the standard "pooled variance" (http://en.wikipedia.org/wiki/Pooled_variance) and the "normal" variance. Thanks a lot for checking.

Reply by Mikael Huss, 12.1 years ago
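Mikael's analogy can be checked on plain numbers. The Python sketch below uses made-up toy values and compares the ordinary sample variance, which ignores group labels, with the pooled within-group variance described in the linked Wikipedia article: squared deviations from each group's own mean, summed and divided by n - k for k groups. The function names are illustrative only.

```python
import numpy as np


def ordinary_variance(values):
    """Sample variance (ddof=1) of all values, ignoring group labels."""
    return float(np.var(np.asarray(values, dtype=float), ddof=1))


def pooled_variance(values, groups):
    """Pooled within-group variance: squared deviations from each group's own
    mean, summed over groups and divided by (n - k) for k groups."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    ss = 0.0
    for g in labels:
        x = values[groups == g]
        ss += float(np.sum((x - x.mean()) ** 2))
    return ss / (values.size - labels.size)


# Toy data: two conditions whose means differ a lot (11 vs 31).
vals = [10, 12, 30, 32]
grps = ["A", "A", "B", "B"]
print(pooled_variance(vals, grps))  # 2.0
print(ordinary_variance(vals))      # about 134.67
```

The ordinary variance absorbs the difference between the condition means, whereas the pooled variance measures only within-condition spread. If the analogy holds for DESeq, "blind" estimates would tend to exceed "pooled" ones for genes that differ between conditions, which would fit the documentation's warning that "blind" can lead to loss of power.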

The graphs are indeed different. If and when I manage to understand more about this issue, I will report back here. Thanks for the link!

Reply by vodka, 12.1 years ago

Answer 1 · 2013-03-27 · by Biomonika (Noolean), 12.1 years ago

From the documentation:

pooled - Use the samples from all conditions with replicates to estimate a single pooled empirical dispersion value, called "pooled", and assign it to all samples.

per-condition - For each condition with replicates, compute a gene's empirical dispersion value by considering the data from samples for this condition. For samples of unreplicated conditions, the maximum of empirical dispersion values from the other conditions is used. If object has a multivariate design (i.e., if a data frame was passed instead of a factor for the condition argument in newCountDataSet), this method is not available. (Note: This method was called “normal” in previous versions.)

blind - Ignore the sample labels and compute a gene's empirical dispersion value as if all samples were replicates of a single condition. This can be done even if there are no biological replicates. This method can lead to loss of power; see the vignette for details. The single estimated dispersion condition is called "blind" and used for all samples.

Hope this helps.

I had read this in the documentation (I should have stated that in the question, sorry about that), and yet it's not clear to me exactly what the difference is between "using the samples from all conditions with replicates to estimate a single pooled empirical dispersion value" and "ignoring the sample labels and computing a gene's empirical dispersion value as if all samples were replicates of a single condition".

Reply by Mikael Huss, 12.1 years ago
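To make the two quoted sentences concrete, here is a minimal Python sketch for a single gene. It assumes a DESeq 1.x-style method-of-moments estimate, dispersion = (w - m * xi) / m^2, where m and w are the mean and variance of size-factor-normalized counts and xi is the mean of 1/size factor. `dispersion_blind` computes w across all samples as if they were replicates of one condition; `dispersion_pooled` instead pools squared deviations around each condition's own mean over the conditions that have replicates, and refuses to run if none do. All names and numbers are hypothetical; this is one reading of the documentation, not DESeq's implementation, which additionally fits a dispersion-vs-mean trend across genes.

```python
import numpy as np


def moment_dispersion(mean, var, size_factors):
    """Assumed DESeq 1.x-style moment estimate: remove the Poisson
    (shot-noise) part of the variance, then scale by mean^2."""
    xi = float(np.mean(1.0 / np.asarray(size_factors, dtype=float)))
    return (var - mean * xi) / mean ** 2


def dispersion_blind(counts, size_factors):
    """'blind': ignore labels and treat every sample as a replicate."""
    q = np.asarray(counts, dtype=float) / np.asarray(size_factors, dtype=float)
    return moment_dispersion(q.mean(), q.var(ddof=1), size_factors)


def dispersion_pooled(counts, size_factors, conditions):
    """'pooled': variance from deviations around each condition's own mean,
    pooled over the conditions that have replicates."""
    sf = np.asarray(size_factors, dtype=float)
    q = np.asarray(counts, dtype=float) / sf
    cond = np.asarray(conditions)
    ss, df = 0.0, 0
    for c in np.unique(cond):
        x = q[cond == c]
        if x.size > 1:  # unreplicated conditions contribute nothing
            ss += float(np.sum((x - x.mean()) ** 2))
            df += x.size - 1
    if df == 0:
        raise ValueError("'pooled' needs at least one replicated condition")
    return moment_dispersion(q.mean(), ss / df, sf)


# Toy gene: two replicates per condition, equal size factors.
counts = [100, 140, 300, 340]
sf = [1.0, 1.0, 1.0, 1.0]
conds = ["A", "A", "B", "B"]
print(dispersion_pooled(counts, sf, conds))  # about 0.012
print(dispersion_blind(counts, sf))          # about 0.282
```

With replicates, the pooled estimate reflects only replicate-to-replicate noise, while the blind estimate also absorbs the A-vs-B difference. Without any replicates, `dispersion_pooled` has nothing to pool, whereas `dispersion_blind` still returns a value, consistent with the documentation's remark that "blind" can be used even when there are no biological replicates.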

Reading the manual description carefully, it emerges that "pooled" relies on samples that have biological replicates, whereas the "blind" method can also be applied to samples with no biological replicates. I guess the number of samples is also an important factor for the normalization here.

Reply by kristina.gagalova, 9.2 years ago