Estimating Dispersions in DESeq: Which Is Best, All Conditions at Once or Each Condition Comparison Individually?
11.6 years ago
gaelgarcia05 ▴ 280

Hi everyone,

I have an RNA-seq dataset which comprises 4 different conditions, and 2 biological replicates per condition, like so:

         cond1_1  cond1_2  cond2_1  cond2_2  cond3_1  cond3_2  cond4_1  cond4_2
gene1
gene2
gene3
gene4
gene5

Currently, I run estimateSizeFactors and estimateDispersions on one pair of conditions (4 samples) at a time. For each comparison, I build a data frame containing ONLY the samples for that comparison, run estimateSizeFactors and estimateDispersions on it, and then run the negative binomial test (nbinomTest) on the result.

I am wondering, however, whether it would be better to supply DESeq with all the samples for estimateSizeFactors and estimateDispersions, and only then run the pairwise condition comparisons. Might this provide more information per gene, or would it be counterproductive?

Thanks, Carmen

deseq rna-seq r edger bioconductor • 5.7k views
11.5 years ago

One way to compare what happens in either case is to use plotDispEsts().

If you pool conditions for variance estimation, you assume that only a few genes change expression while the vast majority do not, so that the different conditions behave like biological replicates. In one of my datasets, however, I had conditions where large groups of genes were regulated, because I was looking at a very drastic response of the cells. In such a case, treating different conditions like replicates will overestimate the variance: genes that are actually regulated by your condition will look like they have a high variance. As a consequence, the test for differential expression will be rather conservative, and you will get a shorter list of regulated genes. You have to decide which way is better for you, depending on your data and on what you expect from your analysis.
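To make the overestimation concrete, here is a small numerical sketch in plain Python. It is not DESeq's estimator: it uses a rough method-of-moments approximation of the negative binomial dispersion, (variance - mean) / mean^2, and the counts for the single gene are made up for illustration.

```python
from statistics import mean, variance

def rough_dispersion(counts):
    """Rough method-of-moments NB dispersion: (var - mean) / mean^2, floored at 0.

    Illustrative approximation only, not the estimator DESeq uses.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / (m * m), 0.0)

# Hypothetical normalized counts for one gene, 2 replicates per condition.
# The gene is strongly up-regulated in cond3 and cond4.
gene = {
    "cond1": [100, 110],
    "cond2": [95, 105],
    "cond3": [400, 420],
    "cond4": [390, 430],
}

# (a) Treat all eight samples as if they were replicates of one condition.
all_samples = [c for reps in gene.values() for c in reps]
as_replicates = rough_dispersion(all_samples)

# (b) Estimate within each condition separately, then average.
within_conditions = mean(rough_dispersion(reps) for reps in gene.values())

print(f"all samples treated as replicates: {as_replicates:.4f}")
print(f"average of within-condition estimates: {within_conditions:.4f}")
```

For a regulated gene like this one, the difference between condition means ends up in estimate (a), which is why it comes out far larger than (b).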

11.6 years ago

I believe the usual recommendation is to use all of your data for the dispersion estimation.
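As a rough illustration of why using all samples can help, the sketch below pools within-condition variability for one gene. It assumes the pooling subtracts each condition's own mean, so differences between conditions do not enter the estimate. The counts are hypothetical, and this is plain Python for illustration, not DESeq code.

```python
def pooled_variance(groups):
    """Pool within-condition sums of squares.

    Returns (pooled variance, residual degrees of freedom).
    """
    ss = 0.0
    df = 0
    for reps in groups.values():
        m = sum(reps) / len(reps)
        ss += sum((x - m) ** 2 for x in reps)  # deviations from the condition's own mean
        df += len(reps) - 1
    return ss / df, df

# Hypothetical normalized counts for one gene, 2 replicates per condition.
gene = {
    "cond1": [100, 110],
    "cond2": [95, 105],
    "cond3": [102, 98],
    "cond4": [108, 96],
}

var_all, df_all = pooled_variance(gene)
var_pair, df_pair = pooled_variance({k: gene[k] for k in ("cond1", "cond2")})

print(f"all four conditions: variance {var_all:.1f} on {df_all} degrees of freedom")
print(f"cond1 vs cond2 only: variance {var_pair:.1f} on {df_pair} degrees of freedom")
```

With two replicates per condition, a single pairwise table leaves only 2 residual degrees of freedom per gene, while all four conditions give 4, so the per-gene estimate rests on more information.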

Also, I'd recommend checking out DESeq2; it has some nice enhancements over DESeq.


Thanks, Steve, I'll be sure to check it out!
