Is genotype not in your design formula, too, and should that not be included as an additional interaction term with treatment?
I think that Example 2 from ?results should fit your needs in most cases, no?
## Example 2: two conditions, two genotypes, with an interaction term
dds <- makeExampleDESeqDataSet(n=100,m=12)
dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))
design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds)
resultsNames(dds)
# Note: design with interactions terms by default have betaPrior=FALSE
# the condition effect for genotype I (the main effect)
results(dds, contrast=c("condition","B","A"))
# the condition effect for genotype II
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in genotype II compared to genotype I).
results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))
# the interaction term, answering: is the condition effect *different* across genotypes?
results(dds, name="genotypeII.conditionB")
To get this right, though, you need to have your factor reference levels set correctly.
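As a small illustration of what setting the reference level means, here is a base R sketch. The treatment labels (control, a, b) are assumed for illustration only; substitute your own factor and level names.

```r
# Hypothetical treatment labels; factor() sorts levels alphabetically by default,
# so "a" (not "control") would silently become the reference level
treatment <- factor(c("control", "control", "a", "a", "b", "b"))
levels(treatment)
# "a" "b" "control"

# Make "control" the reference (first) level
treatment <- relevel(treatment, ref = "control")
levels(treatment)
# "control" "a" "b"
```

The same relevel() call can be applied to a column of the DESeqDataSet (e.g. dds$condition <- relevel(dds$condition, ref = "A")) before running DESeq().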
The other way to get what you want is to create a new variable, called Group, that encodes both the treatment and genotype, and then re-run DESeq2 with that.
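A minimal sketch of the grouping approach, reusing the simulated data from Example 2 above. The variable name group and the particular contrasts are illustrative assumptions, not a prescription for your own sample table.

```r
library(DESeq2)

set.seed(1)
# Simulated data, set up as in Example 2 of ?results
dds <- makeExampleDESeqDataSet(n = 100, m = 12)
dds$genotype <- factor(rep(rep(c("I", "II"), each = 3), 2))

# Combine both factors into a single variable with levels IA, IB, IIA, IIB
dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)

# Condition effect (B vs A) within each genotype
res_I  <- results(dds, contrast = c("group", "IB", "IA"))
res_II <- results(dds, contrast = c("group", "IIB", "IIA"))

# Difference between the two effects: (IIB - IIA) - (IB - IA),
# written in terms of the coefficients relative to the reference level IA
res_diff <- results(dds, contrast = list(
  "group_IIB_vs_IA",
  c("group_IIA_vs_IA", "group_IB_vs_IA")
))
```

The last contrast tests the same quantity as the interaction term in the ~ genotype + condition + genotype:condition design, so either design can be used to ask whether the condition effect differs between genotypes.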
These types of questions are probably the most asked here and on the Bioconductor forum.
------------------------
In all honesty, though, I prefer to keep these things as simple as possible, unless you have a very large sample size. If 'simple' involves generating multiple results tables and comparing them 'manually', then so be it.
Kevin
Dear Kevin
Thank you for your advice.
I also studied Example 2 in ?results, but there are only 2 conditions in that example (while I have three). So, it is not clear to me how I should correctly integrate the controls (C) for each genotype (1 or 2).
FYI, my sampleTable looks as follows:
Also, I combined treatment and genotype into a new variable called group, and then you can indeed easily compare E1vsU1 or E2vsU2, but how are the C samples then taken into account, and how can you statistically compare E1vsU1 with E2vsU2?
My apologies for my ignorance.
Regards
Wannes
I think that you can do it without the grouping variable, and instead via an interaction term. For example:
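A sketch of what such a design could look like on simulated data. The column names Treat and geno and their levels are inferred from the coefficient names used in this thread; the sample layout (3 treatments x 2 genotypes x 2 replicates) is a hypothetical stand-in for the real sample table.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 12)

# Hypothetical annotation; the first level of each factor is the reference
dds$Treat <- factor(rep(c("control", "a", "b"), each = 4),
                    levels = c("control", "a", "b"))
dds$geno  <- factor(rep(rep(c("1", "2"), each = 2), 3), levels = c("1", "2"))

design(dds) <- ~ Treat + geno + Treat:geno
dds <- DESeq(dds)
resultsNames(dds)

# Interaction terms: how each treatment effect (vs control) differs in geno 2
res_int_a <- results(dds, name = "Treata.geno2")
res_int_b <- results(dds, name = "Treatb.geno2")

# Does the a-vs-b treatment difference itself differ between genotypes?
res_int_a_vs_b <- results(dds, contrast = list("Treata.geno2", "Treatb.geno2"))

# a vs b within geno 1 (the reference genotype): main effects only
res_a_vs_b_g1 <- results(dds, contrast = c("Treat", "a", "b"))

# a vs b within geno 2: main effects plus the matching interaction terms
res_a_vs_b_g2 <- results(dds, contrast = list(
  c("Treat_a_vs_control", "Treata.geno2"),
  c("Treat_b_vs_control", "Treatb.geno2")
))
```

The within-genotype contrasts follow the same logic as Example 2 in ?results: the effect in the reference genotype is the main effect, and the effect in the other genotype is the main effect plus the interaction term.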
control is set as the reference level - this is very important.
Here, Treata.geno2 is essentially the genotypic effect in treatment a vs control. So, to get the comparison that you want, we compare this to Treatb.geno2:
Take a look at Example 3 from ?results (at the very end).
Thank you very much Kevin!
I was able to integrate your solution, and the results seem to match biological expectations. I do have one trivial question.
Within the same analysis (with all samples loaded into the dds), I was wondering how you should obtain the results for the comparison of geno2 vs geno1, but only for the "control" treatment? I tried
But these results are different (twice the number of DEGs) compared to an analysis where I load only the control data (GSM1275875 and GSM1275874 in your example) into the DE dataset.
If I am not mistaken, that should be represented by geno_2_vs_1; so:
This is elaborated in the following paragraph of the vignette:
When I am in doubt about these design formulae, I usually spot check some results via box-and-whisker plots just to be sure. Generating quick plots via base boxplot() is easy:
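A minimal sketch of such a spot check on simulated data. The gene picked (simply the first row) and the genotype/condition grouping are assumptions for illustration; in practice you would choose a gene from the top of the results table you want to verify.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 12)
dds$genotype <- factor(rep(rep(c("I", "II"), each = 3), 2))

# Normalised counts require size factors to be estimated first
dds <- estimateSizeFactors(dds)
norm_counts <- counts(dds, normalized = TRUE)

# Pick one gene to inspect (here simply the first row)
gene <- rownames(norm_counts)[1]
expr <- log2(norm_counts[gene, ] + 1)

# One box per genotype/condition combination
grp <- interaction(dds$genotype, dds$condition)
boxplot(expr ~ grp,
        main = gene,
        xlab = "genotype.condition",
        ylab = "log2(normalised count + 1)")
```

If the direction of the difference between two boxes disagrees with the sign of the log2 fold change reported for that contrast, the contrast or the reference levels are probably not what you intended.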
Dear Kevin
Thank you for the suggestions
did give me the same results as
Still twice the number of DEGs compared to the analysis only including control samples, but I assume the extra DEGs are due to the fact that more samples have been included in the analysis (more statistical power; see e.g. also the question here: https://support.bioconductor.org/p/101190/)?
Thank you for suggesting the vignette, but I'm having a hard time understanding these interaction terms. For example, I also would like to do the following contrasts:
treatment a vs b for genotype 1
and
treatment a vs b for genotype 2
Again I am able to do this in a separate analysis (only analyzing samples for genotype 1 or only analyzing samples for genotype 2), but not when all samples are analyzed together (samples for genotype 1 and 2 loaded in the DE dataset).
Thank you in advance.
Hey, it is 'better' to normalise the entire dataset together, but I think that it is okay to go back to use a different design formula such that you can conduct all comparisons that you need. By normalising all samples together, irrespective of the design formula, the data will always be normalised in the same way. If you remove samples, however, the size factors will change; thus, normalised values will also differ.
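To make the point about size factors concrete, here is a small sketch on simulated data that estimates size factors once from all samples and once from a subset. The subset chosen (condition A only) is an arbitrary assumption for illustration.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Size factors estimated from all 12 samples
sf_all <- sizeFactors(estimateSizeFactors(dds))

# Size factors re-estimated from the condition A samples only
is_A    <- dds$condition == "A"
dds_sub <- dds[, is_A]
sf_sub  <- sizeFactors(estimateSizeFactors(dds_sub))

# Same six samples, two sets of size factors; the values typically differ
cbind(all_samples = sf_all[is_A], subset_only = sf_sub)
```

Because normalised counts are the raw counts divided by these size factors, the two analyses are not working from identical normalised values even for the shared samples.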
Ok thank you.
If you know the formula for a vs b for genotype I or for a vs b for genotype II, it would be welcome. This way I can compare how big the difference is between analysing all samples together and analysing a selection of samples only.
Well, for the genotype comparison in just controls (or another group), use the 'grouping' variable in your design formula:
Then:
Thank you, but I was referring to the ~ Treat + geno + Treat:geno design.
How can you compare the two treatments (a vs b) for one genotype using that design?