Hi All,
I am working on RNA-seq data analysis using the DESeq2 R package.
I use the code `dds <- DESeqDataSetFromMatrix(countData = count.mat, colData = cond, design = ~Strain + Time)` to create the dataset.
My confusion is with the design formula `design = ~Strain + Time`, where `Strain` and `Time` are the variables in my colData that I want to compare for my countData matrix.
This is my `cond` matrix:
         Strain  Time
count1   1       1
count2   1       1
count4   1       2
count5   1       2
count13  2       1
count14  2       1
count16  2       2
count17  2       2
countX   2       3
First, I want to understand the difference between these four designs that could go into the function `DESeqDataSetFromMatrix`:
a) `design = ~Strain + Time + Strain:Time`
b) `design = ~Strain + Time`
c) `design = ~Time`
d) `design = ~Strain`
Second, my understanding is that DESeq2 takes the last variable in the design formula (here `Time`) as a control variable, so to test for differences between samples in the Time groups, I have the code below. What do the outputs of `resultsNames(ddsTC)` really mean?
ddsTC <- DESeqDataSet(dds, ~ Strain + Time ) ##For time
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~Time ) #For Time
resultsNames(ddsTC)
[1] "Intercept.1" "Time_3_vs_1.1" "Time_2_vs_1.1" "Strain_2_vs_1.1"
Hey,
To the best of my knowledge:
A) `design = ~Strain + Time` means that DESeq2 will test the effect of Time (the last factor) while controlling for the effect of Strain (the first factor), so the algorithm returns the fold change that results from the effect of time only.
B) `design = ~Time`: here the algorithm will return the fold change that results from time, without correcting for the fold change that results from strain.
C) `design = ~Strain`: same as above, but for strain, without correcting for time.
So, in my understanding, DESeq2 treats the first factor as a covariate and tries to eliminate the fold change that results from this covariate.
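To make the designs above concrete, here is a minimal base-R sketch (no DESeq2 required) that rebuilds the `cond` table from the question and lists the design-matrix columns each formula produces. It assumes `Strain` and `Time` are stored as factors; the list names (`additive`, `time_only`, `strain_only`) are just labels for this example. As far as I know, all terms in a formula are fitted jointly, and the order of the terms mainly decides which comparison `results()` reports by default (the last variable).

```r
# Rebuild the colData from the question; Strain and Time are assumed to be factors.
cond <- data.frame(
  Strain = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
  Time   = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 3)),
  row.names = c("count1", "count2", "count4", "count5",
                "count13", "count14", "count16", "count17", "countX")
)

# Three of the designs discussed in the thread (labels are illustrative).
designs <- list(
  additive    = ~ Strain + Time,
  time_only   = ~ Time,
  strain_only = ~ Strain
)

# One design-matrix column per coefficient that is estimated for each gene.
design_cols <- lapply(designs, function(f) colnames(model.matrix(f, data = cond)))
print(design_cols)
```

The `Strain2` column corresponds to what `resultsNames()` labels `Strain_2_vs_1`, and `Time2` and `Time3` correspond to `Time_2_vs_1` and `Time_3_vs_1`: each is a log2 fold change relative to reference level 1.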
Sorry to bump this thread, but when we apply `design = ~Strain + Time`, how exactly does DESeq2 control for the effect of Strain in order to test Time? It's not clear to me.
It's part of the GLM, that's how.
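As a rough illustration of what "part of the GLM" means, the sketch below uses ordinary least squares (`lm`) on made-up, noise-free log-scale values as a stand-in for DESeq2's negative binomial GLM. The effect sizes (2 for strain, 1 and 3 for time) are invented for the example. Because both terms are estimated jointly, the Time coefficients in `~ Strain + Time` are not contaminated by the strain difference, whereas `~ Time` alone absorbs it wherever the design is unbalanced.

```r
# Sample layout from the question; Strain and Time are assumed to be factors.
cond <- data.frame(
  Strain = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
  Time   = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 3))
)

# Hypothetical log2 expression for one gene: baseline 5, strain 2 adds 2,
# time 2 adds 1, time 3 adds 3 (invented numbers, no noise).
y <- 5 + 2 * (cond$Strain == "2") + 1 * (cond$Time == "2") + 3 * (cond$Time == "3")

fit_adjusted   <- lm(y ~ Strain + Time, data = cond)  # Time effect, adjusted for Strain
fit_unadjusted <- lm(y ~ Time, data = cond)           # Time effect, Strain ignored

adjusted   <- round(coef(fit_adjusted), 6)
unadjusted <- round(coef(fit_unadjusted), 6)

print(adjusted)    # Time3 is 3: the strain effect is accounted for
print(unadjusted)  # Time3 is 4: the only time-3 sample is strain 2, so strain leaks in
```

Time 2 comes out as 1 in both fits because strains 1 and 2 are equally represented at times 1 and 2; only the unbalanced time-3 group is distorted when Strain is left out.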
Thank you so much for your answer. So what does `design = ~Strain + Time + Strain:Time` mean? Also, do you know what the outputs of `resultsNames(ddsTC)` are comparing?
Regarding `design = ~Strain + Time + Strain:Time`:
Here you added an interaction term (how time interacts with strain in regulating gene expression). So this design will return the effect of time at the reference level of strain (1 or 2, depending on your settings). Using the `contrast` argument of `results()` you can look at the effect of time at the other level of strain.
Alternatively, you can combine strain (with its different levels) and time (with its different levels) into one factor, let's call it ALL, and by using `contrast` you can look at the log2 fold change between any two combinations of levels.
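A minimal base-R sketch of the two approaches above, using the `cond` layout from the question. The factor name `ALL` comes from the answer; the level labels such as `S2_T2` are made up for illustration. One caveat specific to this table: strain 1 has no time-3 sample, so the interaction design has more columns than its rank, which DESeq2 normally rejects as a model matrix that is not full rank. The combined factor avoids that because it only has levels for combinations that were actually observed.

```r
# Sample layout from the question; Strain and Time are assumed to be factors.
cond <- data.frame(
  Strain = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
  Time   = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 3))
)

# Interaction design: main effects plus Strain:Time terms.
mm_int   <- model.matrix(~ Strain + Time + Strain:Time, data = cond)
int_cols <- ncol(mm_int)
int_rank <- qr(mm_int)$rank

# Combined factor: one level per observed Strain/Time combination.
cond$ALL <- factor(paste0("S", cond$Strain, "_T", cond$Time))
mm_all   <- model.matrix(~ ALL, data = cond)
all_cols <- ncol(mm_all)
all_rank <- qr(mm_all)$rank

print(colnames(mm_int))
print(c(int_cols = int_cols, int_rank = int_rank))  # rank below column count
print(levels(cond$ALL))
print(c(all_cols = all_cols, all_rank = all_rank))  # full rank

# With DESeq2 (not run here), once ALL has been added to colData(dds),
# the combined factor would be used roughly like this:
#   design(dds) <- ~ ALL
#   dds <- DESeq(dds)
#   results(dds, contrast = c("ALL", "S2_T2", "S1_T2"))  # strain 2 vs 1 at time 2
```

The `contrast` vector follows the form `c(factor, numerator level, denominator level)`, so any pair of observed Strain/Time combinations can be compared directly.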