Dear all,
I'm dealing with and RNA-seq experiment with a total of 11 different conditions.
I'd like to do pairwise comparisons between some of them (without having a fixed base/ctrl condition). To do that, I'm planning to use the Wald-test.
After setting conditions (n=11), this is the command I used (in this example I want to compare condition C to E)
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = colData,
design = ~ 0 + condition)
dds <- DESeq(dds)
res_condC_versus_condE = results(dds, contrast=c("condition","condC","condE"))
I've two main questions:
- Do you think the design formula and the results command are set in the correct way?
- Should I use lfcShrink only for data visualization and ranking??
Thank you for your time!!
Best
Marianna
Hi Kevin,
thank you for your reply!
If I use
~ condition
, then the first level of the factor "condition" is fixed as a control/base condition and all the other levels will be compared to this control condition. How to deal with that?? I thought the only solution was to remove the intercept, so I can set the DE analysis between any condition. Am I wrong?? Are there alternative solutions to easily compare the different levels without fixing a base condition?As far as
lfcshrink
, I assume I can use it instead ofresults(dds)
.Thank you Marianna
Thank you
When you specify what to compare to what with contrast, it doesn't matter what the base level is for your factor.
Ah!, no, you can leave the intercept in the model (just use
~ condition
) and then do any pairwise comparison irerspective of the reference level, as I show, here: A: DESeq2 compare all levelsIn that thread, I also show how you can use
lfcshrink()
Thank you Kevin, really appreciated
Marianna
Hi Kevin,
I did the analysis as you suggested. I noticed that the log2fold changes are REALLY different if using
lfcShrink
.For example for the most DE gene I got a log2FC of 4 and 0.05 using lfcShrink and results, respectively. Do you expect such a big difference??
Best Marianna
I am going to 'randomly' hypothesise that the gene in question has generally low expression in your data? Genes with low expression are quite problematic because they can appear to have inflated fold changes.
For example:
Although the first one has a fold change (linear) of 18, is it meaningful when considering that the expression of both genes in question is 0.9 and 0.05?
Hi! Sorry to dig up this old thread, but it felt more appropriate to expand the discussion here than posting a new question. Could you please clarify why should the intercept be left in the model? For some reason I just can't wrap my head around linear model intercepts in DE tools (or in general). In the example posted above I kind of see why having an intercept is fine since the contrast argument does not really use the intercept anywhere (I might be way off on this though...), so you just specify your pairwise comparisons with contrast. But following the same logic, why would having a design ~0 + condition be any different then (e.g. they don't use an intercept in this tutorial either https://www.biostars.org/p/461026/)? Or maybe the other way, when would removing an intercept be appropriate in DESeq2?
What prompts me to ask is that I tried all various re-levelling and w/wo intercept designs using contrasts to see what happens (study design like in the above post), the pairwise comparisons come back basically identical (same nr of significant DE and highly similar LFCs) , except for one single dreadful gene that is driving me nuts. (yes, it is a noisy and low expressing one, but now that I found this I can't un-find it. It also happens to gain the highest significant LFC in one of the comparisons, which is just completely off). The only approaches where this seems to be handled appropriately is when I do a no intercept model for the pairwise comparisons or for each pairwise comparison I re-level the factor first for the appropriate comparison, even though I am using contrast like posted here.
An intercept is useful when you want to think in terms of a factorial design. You then don't need contrasts, since you implicitly have one. If you have something like
condition
with a bunch of levels and want to do pair-wise comparisons then you dropping the intercept is convenient, since you quite explicitly don't have a factorial design (i.e., something like WT vs. Mutant and Treated vs. Untreated).You can generally perform the same comparisons with/without an intercept, it's primarily a matter of convenience whether to include one or not.
Hi, Devon! Thanks a lot for the reply. Okay, so the intercept in DESeq2 is basically only used if you have what you said a factorial design. Used to specify what is the 'baseline' you are comparing against - and this is applied when you use results(dds, name=) instead of contrast, because for a contrast you specifically say which is the baseline you want to test against? Would it not be more straightforward to always explicitly use contrasts without an intercept even for a factorial design or what is the advantage of an intercept in that case? I think I am missing some key concept, I've been doing my best to get through this http://genomicsclass.github.io/book/ for the linear models and I kind of understand it, but also kind of don't at all (at least the intercept).
Thanks again for your reply, it reassures me that at least I am allowed to drop the intercept for the pairwise comparisons.
It's just a matter of how you prefer to think of your experiment. A very common experimental design in biology is a 2x2 factorial design, such as WT/Mut and Treated/Untreated. If I use a model of
~genotype + treatment + genotype:treatment
then I directly assess the biological questions I care about: what's the effect of the genotype, the mutation, and their interaction. If I exclude the intercept I need to make groups ofgenotype_treatment
and then use something like~0 + condition
followed by constructing a bunch of contrasts. Those contrasts may be easy for someone like me to construct, but the average wet lab biologist has a near 0 change of doing that correctly. The common retort is that dropping the intercept and using~0 + condition
allows you to do pairwise comparisons between groups. That's nice and all, but those rarely tell me anything biologically worthwhile. So this comes down to a question of what you want to get out of your experiment and how to most easily do so.Try doing that with very complex experimental designs. You can do it, but it's not exactly fun.
Thanks again, Devon, this is very helpful!
I see your point and that the intercept can be useful for more complicated designs than simple condition_A vs condition_B. I think to start with I find contrasts more intuitive (maybe even for complex designs) or I might also just be misleading myself. I wonder where this massive gap in my knowledge comes from, I need to create an 'Intercept for Dummies' guide for myself and soldier on with my simple comparisons for now.