Hi all! I am writing because I need help/advice on a DESeq2 analysis that I'm performing.
I have 4 different knockdowns (A, B, C, D) and 2 different chromatin fractions (short, long). I want to compare results within each chromatin fraction. Say A is my mock KD: I want B, C, D vs A in the long fraction and, separately, B, C, D vs A in the short fraction. I have 2 options for designing my analysis in DESeq2.
First option is to have everything in the same design matrix.
# Build DESeq2 object
dds = DESeqDataSetFromMatrix(countData = toDE,
colData = meta,
design = ~group)
where meta$group = paste0(meta$Knockdown, "_", meta$Fraction)
I first relevel against the mock KD in the long fraction
dds$group = relevel(dds$group, ref = "A_Long")
and run the analysis to extract my results for the long fraction.
I then relevel against the mock KD in the short fraction
dds$group = relevel(dds$group, ref = "A_Short")
and rerun the analysis to get the results for the short fraction.
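For reference, the joint design can also be queried without releveling and refitting, by passing a `contrast` to `results()`. The following is only a minimal sketch: the counts are simulated with `makeExampleDESeqDataSet()` as a stand-in for `toDE`, the 3 replicates per group and the `A_Long` style level names are assumptions, and `kds`, `res_long` and `res_short` are names made up for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data: 500 genes x 24 samples (assumption).
dds <- makeExampleDESeqDataSet(n = 500, m = 24)

# Assumed layout: 4 knockdowns x 2 fractions x 3 replicates.
knockdown <- rep(c("A", "B", "C", "D"), times = 6)
fraction  <- rep(rep(c("Long", "Short"), each = 4), times = 3)
dds$group <- factor(paste0(knockdown, "_", fraction))
design(dds) <- ~ group

# Fit the joint model once.
dds <- DESeq(dds, quiet = TRUE)

# Pull each knockdown-vs-mock comparison within a fraction via `contrast`.
kds <- c(B = "B", C = "C", D = "D")
res_long <- lapply(kds, function(kd)
  results(dds, contrast = c("group", paste0(kd, "_Long"), "A_Long")))
res_short <- lapply(kds, function(kd)
  results(dds, contrast = c("group", paste0(kd, "_Short"), "A_Short")))

# Number of genes with adjusted p-value below 0.05 per comparison.
sapply(res_long,  function(r) sum(r$padj < 0.05, na.rm = TRUE))
sapply(res_short, function(r) sum(r$padj < 0.05, na.rm = TRUE))
```

The model is fitted a single time here, so dispersions are still estimated from all samples of both fractions, exactly as in the relevel-based version of the first option.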
My second option is to split the counts and metadata into long and short fractions and therefore do the analysis separately.
# Build DESeq2 object
dds = DESeqDataSetFromMatrix(countData = toDE[,colnames(toDE) %in% rownames(meta)[meta$Fraction=="Long"]],
colData = meta[meta$Fraction=="Long",],
design = ~Knockdown)
I then repeat the same analysis for the short fraction.
I have performed both approaches. My problem is that the number of significant genes differs considerably depending on which approach I use.
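The per-fraction split can be sketched in base R so that count columns and metadata rows are guaranteed to line up. This is an illustration only: the small `meta` and `toDE` objects below are hypothetical stand-ins (2 replicates per group assumed), and `split_fraction` is a helper name invented for this example.

```r
# Hypothetical stand-ins for `meta` and `toDE` (2 replicates per group assumed).
meta <- expand.grid(Knockdown = c("A", "B", "C", "D"),
                    Fraction  = c("Long", "Short"),
                    Rep       = 1:2,
                    stringsAsFactors = FALSE)
rownames(meta) <- paste(meta$Knockdown, meta$Fraction, meta$Rep, sep = "_")

set.seed(1)
# Count columns are deliberately in a different order than the metadata rows.
toDE <- matrix(rpois(10 * nrow(meta), lambda = 50), nrow = 10,
               dimnames = list(paste0("gene", 1:10), rev(rownames(meta))))

# Subset one fraction, indexing counts by sample name so the order matches.
split_fraction <- function(counts, meta, fraction) {
  sub_meta <- meta[meta$Fraction == fraction, , drop = FALSE]
  # Make the mock knockdown A the reference level.
  sub_meta$Knockdown <- relevel(factor(sub_meta$Knockdown), ref = "A")
  sub_counts <- counts[, rownames(sub_meta), drop = FALSE]
  stopifnot(identical(colnames(sub_counts), rownames(sub_meta)))
  list(counts = sub_counts, meta = sub_meta)
}

long  <- split_fraction(toDE, meta, "Long")
short <- split_fraction(toDE, meta, "Short")
```

Each element could then be passed to `DESeqDataSetFromMatrix(countData = long$counts, colData = long$meta, design = ~Knockdown)`, and likewise for `short`.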
Number of significant genes for first approach:
B_Long 278
B_Short 103
C_Long 98
C_Short 47
D_Long 396
D_Short 129
Number of significant genes for second approach:
B_Long 549
B_Short 218
C_Long 33
C_Short 6
D_Long 1108
D_Short 435
So my question is: which approach would you follow? The joint one, despite the fact that I don't want to compare fractions? Or the second one, where I do two completely separate analyses?
Thanks in advance,
Jordi
In a PCA, my samples separate completely by fraction (long vs. short), which is why I was wondering whether I have a situation like the one mentioned in the vignette... See the plot here