I am running a DESeq2 analysis with a single factor (Status) that has five levels (A to E), and I want to compare each of A, B, C and D against E. I have performed the analysis in two different ways.
First, I put all the samples in the same DESeq object and then extracted each comparison:
> sampleinfo
FileName SampleName Status
A_1_count A_1 A
A_2_count A_2 A
B_3_count B_3 B
B_4_count B_4 B
C_5_count C_5 C
C_6_count C_6 C
D_7_count D_7 D
D_8_count D_8 D
E_9_count E_9 E
E_10_count E_10 E
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = sampleinfo,
                              design = ~ Status)
dds$Status <- relevel(dds$Status, ref = "E")
And the results:
dds <- DESeq(dds)
res_A <- results(dds,name="Status_A_vs_E")
res_B <- results(dds,name="Status_B_vs_E")
res_C <- results(dds,name="Status_C_vs_E")
res_D <- results(dds,name="Status_D_vs_E")
Second, I ran each comparison separately, using a different DESeq object for each one:
> sampleinfo_A
FileName SampleName Status
A_1_count A_1 A
A_2_count A_2 A
E_9_count E_9 E
E_10_count E_10 E
> sampleinfo_B
FileName SampleName Status
B_3_count B_3 B
B_4_count B_4 B
E_9_count E_9 E
E_10_count E_10 E
dds_A <- DESeqDataSetFromMatrix(countData = cts_A,
                                colData = sampleinfo_A,
                                design = ~ Status)
dds_B <- DESeqDataSetFromMatrix(countData = cts_B,
                                colData = sampleinfo_B,
                                design = ~ Status)
And the results:
dds_A <- DESeq(dds_A)
res_A <- results(dds_A)
dds_B <- DESeq(dds_B)
res_B <- results(dds_B)
(Repeat for each condition)
However, the two methods give me different results. Does anyone know why this is happening? What is the correct way to compare all groups to E?
Thank you!
So if I get better estimates of dispersion, is the variance in gene expression better reflected for a given mean value? And do you think fitting all groups together is the better way to compare each group vs E?
Generally speaking, that would be the best strategy. There are only a few cases where splitting the dataset before variance estimation might be the better strategy (see ATpoint's answer).
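To make the point about variance estimation concrete, here is a minimal sketch on simulated counts. Everything about the data is an assumption for illustration (the simulated matrix, the number of genes, and the names `coldata`, `res_A_full` and `res_A_sub` are made up, not taken from your dataset). It fits the A vs E comparison once on all ten samples and once on the four A and E samples only, then compares the per-gene dispersion estimates. One more thing to check in the split approach: without `relevel()`, the factor levels are ordered alphabetically, so `results(dds_A)` would report E vs A rather than A vs E, flipping the sign of the log2 fold changes.

```r
# Sketch only: simulated counts, not the poster's data.
library(DESeq2)

set.seed(1)
status  <- factor(rep(c("A", "B", "C", "D", "E"), each = 2))
n_genes <- 500

# Per-gene mean and dispersion (hypothetical values); both vectors are
# recycled down the columns, so every sample of a gene shares them.
mu   <- 2^runif(n_genes, 4, 12)
disp <- 0.1 + 4 / mu
cts  <- matrix(rnbinom(n_genes * length(status), mu = mu, size = 1 / disp),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0(status, "_", seq_along(status))))
coldata <- data.frame(Status = status, row.names = colnames(cts))

# Approach 1: one model on all samples, with E as the reference level.
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ Status)
dds$Status <- relevel(dds$Status, ref = "E")
dds <- DESeq(dds)
res_A_full <- results(dds, contrast = c("Status", "A", "E"))

# Approach 2: keep only the A and E samples, then refit.
keep  <- coldata$Status %in% c("A", "E")
dds_A <- DESeqDataSetFromMatrix(
  countData = cts[, keep],
  colData   = droplevels(coldata[keep, , drop = FALSE]),
  design    = ~ Status)
dds_A$Status <- relevel(dds_A$Status, ref = "E")
dds_A <- DESeq(dds_A)
res_A_sub <- results(dds_A, contrast = c("Status", "A", "E"))

# Dispersions come from 10 samples in one fit and 4 in the other, so the
# per-gene values (and therefore the p-values) generally differ.
summary(dispersions(dds) - dispersions(dds_A))
```

Using `contrast = c("Status", "A", "E")` instead of `name = ...` makes the direction of the comparison explicit in both fits, which helps when comparing the two result tables side by side.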
Thank you! Very helpful :)