Question

DESeq2 contrasts multiple treatments two genoptypes

0

Entering edit mode

4.7 years ago

wd • 0

Hi

I have an experiment with three treatments (E,U and control (C)) and 2 genotypes (1 and 2). I have 3-4 reps for each conditions and each genotype.

I know how to do the comparison (E-C vs U-C) for each genotype separately (with "design = ~ treament" and treatment column containing E, C or U). And then I look at the overlap/no overlap between both DEG lists (DEGs E1vsU1 and DEGs E2vsU2).

However, I think there is a better (statiscal) way would to analyze E vs U changes depending on genotype. In other words, I would like to do the following comparison:

(E1-C1/U1-C1) vs (E2-C2/U2-C2)

which should yield me those genes in the EvsU comparison for genotype 1 that are differentially expressed compared to EvsU comparison for genotype 2.

Thank you in advance.

DESeq2 contrasts multiple treatments genotype • 4.8k views

ADD COMMENT • link 4.7 years ago by wd • 0

score 3 · Answer 1 · 2020-03-08

3

Entering edit mode

4.7 years ago

Kevin Blighe 88k

Is genotype not in your design formula, too, and should that not be included as an additional interaction term with treatment?

I think that Example 2 from ?results should fit your needs in most cases, no?

 ## Example 2: two conditions, two genotypes, with an interaction term

 dds <- makeExampleDESeqDataSet(n=100,m=12)
 dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))

 design(dds) <- ~ genotype + condition + genotype:condition
 dds <- DESeq(dds) 
 resultsNames(dds)

 # Note: design with interactions terms by default have betaPrior=FALSE

 # the condition effect for genotype I (the main effect)
 results(dds, contrast=c("condition","B","A"))

 # the condition effect for genotype II
 # this is, by definition, the main effect *plus* the interaction term
 # (the extra condition effect in genotype II compared to genotype I).
 results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

 # the interaction term, answering: is the condition effect *different* across genotypes?
 results(dds, name="genotypeII.conditionB")

To get this right, though, you need to have your factor reference levels set correctly.

The other way to get what you want is to create a new variable, called Group, that encodes both the treatment and genotype, and then re-run DESeq2 with that.

These types of questions are probably the most asked here and on Biocondctor forum.

------------------------

In all honesty, though, I prefer to keep these things as simple as possible, unless you have very large sample n. If 'simple' involves generating multiple results tables and comparing them 'manually', then so be it.

Kevin

ADD COMMENT • link 4.7 years ago by Kevin Blighe 88k

0

Entering edit mode

Dear Kevin

Thank you for your advice.

I also studied the presented example 2 in ?results, but there are only 2 conditions in that example (while I have three)? So, it is not clear to me, how I should correctly integrate the controls (C) for each genotype (1 or 2).

FYI, my sampleTable looks the following:

  sampleName    treatment   genotype
    X1  E   1
    X2  E   1
    X3  E   1
    X4  E   1
    Y1  U   1
    Y2  U   1
    Y3  U   1
    Y4  U   1
    Z1  C   1
    Z2  C   1
    Z3  C   1
    Z4  C   1
    Q1  E   2
    Q2  E   2
    Q3  E   2
    Q4  E   2
    R1  U   2
    R2  U   2
    R3  U   2
    R4  U   2
    S1  C   2
    S2  C   2
    S3  C   2
    S4  C   2

Also, I combined treatment and genotype into a new variable called group, and then you can indeed easily compare E1vsU1 or E2vsU2, but how are the C samples then taken into account, and how can you statistically compare E1vsU1 with E2vsU2?

My apologies for my ignorance.

Regards

Wannes

ADD REPLY • link 4.7 years ago by wd • 0

1

Entering edit mode

I think that you can do it without the grouping variable, and instead via an interaction term. For example:

colData(airway)
DataFrame with 8 rows and 5 columns
                 Name    Treat     geno
             <factor> <factor> <factor>
SRR1039508 GSM1275862        a        1
SRR1039509 GSM1275863        a        1
SRR1039512 GSM1275866        a        2
SRR1039513 GSM1275867        b        1
SRR1039516 GSM1275870        b        1
SRR1039517 GSM1275871        b        2 
SRR1039520 GSM1275874  control        1
SRR1039521 GSM1275875  control        2

control is set as the reference level - this is very important.

dds <- DESeqDataSet(airway, design = ~ Treat + geno + Treat:geno)
dds <- DESeq(dds, betaPrior=FALSE)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

resultsNames(dds)
[1] "Intercept"          "Treat_a_vs_control" "Treat_b_vs_control"
[4] "geno_2_vs_1"        "Treata.geno2"       "Treatb.geno2"

Here, Treata.geno2 is essentially the genotypic effect in Treatment a vs control. So, to get the comparison that you want, we compare this to Treatb.geno2:

results(dds, contrast = list("Treatb.geno2", "Treata.geno2"))

Take a look at Example 3 from ?results (at the very end).

ADD REPLY • link 4.7 years ago by Kevin Blighe 88k

0

Entering edit mode

Thank you very much Kevin!

I was able to integrate your solution, and results seem to match with biological expectations.. I do have one trivial question.

Within the same analysis (with all samples loaded into the dds), I was wondering how you should obtain the results for the comparison of geno2 vs geno1, but only for the "control" treatment? I tried

results <- results(dds,contrast=c("geno","2","1")).

But these results are different (twice the number of DEGs) compared to an analysis where I only use the control data (GSM1275875 and GSM1275874 in your example) into the DE dataset.

ADD REPLY • link 4.7 years ago by wd • 0

0

Entering edit mode

If I am not mistaken, that should be represented by geno_2_vs_1; so, :

results(dds, name = 'geno_2_vs_1').

This is elaborated in the following paragraph of the vignette:

The key point to remember about designs with interaction terms is that, unlike for a design ~genotype + condition, where the condition effect represents the overall effect controlling for differences due to genotype, by adding genotype:condition, the main condition effect only represents the effect of condition for the reference level of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms genotypeII.conditionB and genotypeIII.conditionB give the difference between the condition effect for a given genotype and the condition effect for the reference genotype.

When I am in doubt about these design formulae, I usually spot check some results via box-and-whisker plots just to be sure. Generating quick plots via base boxplot() is easy:

boxplot(gene ~ treatment,
  data = data.frame(treatment = colData(dds)$treatment, t(assay(rld))

ADD REPLY • link 4.7 years ago by Kevin Blighe 88k

0

Entering edit mode

Dear Kevin

Thank you for the suggestions

results(dds, name = 'geno_2_vs_1')

did give me the same results as

results <- results(dds,contrast=c("geno","2","1"))

Still twice the number of DEGs compared to the analysis only including control samples, but I assume the extra DEGs are due to the fact that more samples have been included in the analysis (more statiscal power, see e.g. also question here https://support.bioconductor.org/p/101190/?

Thank you for suggesting the vignette, but I'm having a hard time understanding these interaction terms. For example, I also would like to do the following contrasts:

treatment a vs b for genotype 1

and

treatment a vs b for genotype 2

Again I am able to do this in a separate analysis (only analyzing samples for genotype 1 or only analyzing samples for genotype 2), but not when all samples are analyzed together (samples for genotype 1 and 2 loaded in the DE dataset).

Thank you in advance.

ADD REPLY • link 4.7 years ago by wd • 0

0

Entering edit mode

Hey, it is 'better' to normalise the entire dataset together, but I think that it is okay to go back to use a different design formula such that you can conduct all comparisons that you need. By normalising all samples together, irrespective of the design formula, the data will always be normalised in the same way. If you remove samples, however, the size factors will change; thus, normalised values will also differ.

ADD REPLY • link 4.7 years ago by Kevin Blighe 88k

0

Entering edit mode

Ok thank you.

If you would know the formule for a vs b for genotype I or for a vs b for genotype II, it would be welcome. This way I can compare how big the difference is when comparing all samples together or a selection of samples only.

⁣Sent from TypeApp

ADD REPLY • link 4.7 years ago by wd • 0

0

Entering edit mode

Well, for the genotype comparison in just controls (or another group), use the 'grouping' variable in your design formula:

~ group

Then:

results(dds, contrast = c('group', 'Control_a', 'Control_b'))

ADD REPLY • link 4.7 years ago by Kevin Blighe 88k

0

Entering edit mode

Thank you, but I was referring to the ~ Treat + geno + Treat:geno design.

How can you compare the two treatments (a vsb) for one genotype using that design.

ADD REPLY • link 4.7 years ago by wd • 0