Question

DESeq2 analysis

1

Entering edit mode

3 months ago

rajdeepboral00 ▴ 60

Hello all, Overview of my aim - I have done RNA sequencing of 8mice , 4 control and 4 infected. I then divided the spinal cord of those 8 mice into cervical,thoracic and lumbar regions. So , now i have total of 24 samples. CC1,2,3,4 are the control cervical ans so on with CT,CL,IC,IT and IL. Problem- I want to check how the infection is differing region wise of the spinal cord. For that i used the DESeq R package. The codes are

samples <- c("CC1", "CC2", "CC3", "CC4", "CT1", "CT2", "CT3", "CT4", "CL1", "CL2", "CL3", "CL4",
             "IC1", "IC2", "IC3", "IC4", "IT1", "IT2", "IT3", "IT4", "IL1", "IL2", "IL3", "IL4")

region <- rep(c("Cervical", "Thoracic", "Lumbar"), each = 4, times = 2)

infection_status <- rep(c("Control", "Infected"), each = 12)

metadata <- data.frame(Sample = samples,
                       Region = factor(region, levels = c("Cervical", "Thoracic", "Lumbar")),
                       Infection_Status = factor(infection_status, levels = c("Control", "Infected")),
                       row.names = samples)

counts <- read.csv("counts.csv", row.names = 1)  

dds <- DESeqDataSetFromMatrix(countData = counts, 
                              colData = metadata, 
                              design = ~ Region + Infection_Status+Region:Infection_Status)  
dds <- DESeq(dds)

res <- results(dds)

Doubts- Should i include the interaction term in the design?The result without or with using the interaction term is same. How can that be? Am i doing this correctly?Next i want to check for the DEGs between CC and IC, CT and IT , CL and IL , IT and IC. So should i run the dds command taking the whole data or should i subset it first to my specific interest and then run the dds?

I know these are lots of questions, but i feel really puzzled and there are so many informations everywhere. Kindly answer my doubts if possible.

DESeq2 RNASeq interaction • 863 views

ADD COMMENT • link updated 3 months ago by Santosh Anand 5.8k • written 3 months ago by rajdeepboral00 ▴ 60

3

Entering edit mode

3 months ago

Santosh Anand 5.8k

Too many questions, indeed:)

Q1: Should i include the interaction term in the design?

That really depends if you expect an interaction between the two terms. In DESeq2, interaction terms in the design formula are useful when you want to test whether the effect of one variable depends on the level of another variable. You should use an interaction term when you suspect that the relationship between a factor and gene expression is different across levels of another factor.

Example: You have two conditions (e.g., control vs. treatment) and two genotypes (e.g., wild-type vs. mutant). You want to check whether the treatment effect is different in the wild-type compared to the mutant.

If you're only interested in whether condition A vs. B changes gene expression on average, an interaction term is unnecessary.

Interaction terms require more statistical power. If you have a small sample size, adding an interaction might lead to unreliable estimates or overfitting.

Since the presence of an interaction is not always known beforehand, there is a way in DESeq2 to verify whether the interaction term is necessary.

Specify the full and reduced models, then run DESeq2 using the Likelihood Ratio Test (LRT) instead of the default Wald test.

# Full model w/ interaction term
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ genotype + condition + genotype:condition) 

# reduced model (w/o interaction term)
dds <- DESeq(dds, test="LRT", reduced = ~ genotype + condition)

Now, get the results to check whether the interaction term significantly improves model fit.

res <- results(dds)

If the interaction term is significant, this means that the effect of treatment depends on genotype, and you should keep the intersction term and use the full model. If not significant, you can simplify your model and drop the interaction term

Interpret the LRT Results:

Significant p-value (e.g., p < 0.05) from the LRT: This indicates that the interaction term is statistically significant. This means the effect of treatment does depend on genotype. In this case, the full model is more appropriate. You should use the results from the full model to understand how treatment affects gene expression within each genotype. The main effects from the reduced model are less meaningful here.

Large p-value (e.g., >= 0.05) from the LRT: This suggests that the interaction term is not statistically significant. The effect of treatment is likely similar across genotypes. In this case, the reduced model is more appropriate. It's simpler and easier to interpret. You can look at the main effects of genotype and treatment directly.

Q2: The result without or with using the interaction term is same. How can that be? Am i doing this correctly?

By default, the results() in DESeq2 selects the last variable in the design formula and compares its last level against its first level. According to the "results" manual (?results):

"If results is run without specifying contrast or name, it will return the comparison of the last level of the last variable in the design formula over the first level of that variable. For a simple two-group comparison, this means it will return the log2 fold changes of the second group relative to the first (reference) group."

If you run results(dds) without specifying any arguments, it will automatically display the results for the last contrast (comparison) listed in resultsNames(dds), which includes all possible contrasts (comparisons). If unsure, you can always execute results(dds) and check the first line of the output, where the actual contrast being used will be displayed, similar to this:

log2 fold change (MLE): genotypeII.conditionB

Please refer to the examples from the DESeq2 manual (?DESeq) for proper usage and interpretation of all the contrasts. For convenience, I have included Example 2, which include an interaction term, verbatim below.

## Example 2: two conditions, two genotypes, with an interaction term

dds <- makeExampleDESeqDataSet(n=100,m=12)
dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))

design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds) 
resultsNames(dds)

# the condition effect for genotype I (the main effect)
results(dds, contrast=c("condition","B","A"))

# the condition effect for genotype II
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in genotype II compared to genotype I).
results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

# the interaction term, answering: is the condition effect *different* across genotypes?
results(dds, name="genotypeII.conditionB")

design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds) 
resultsNames(dds)

Q3: So should i run the dds command taking the whole data or should i subset it first to my specific interest and then run the dds?

The official recomendation is to run all of them at once with whole data. See If I have multiple groups, should I run all together or split into pairs of groups?

.."Typically, we recommend users to run samples from all groups together, and then use the contrast argument of the results function to extract comparisons of interest after fitting the model using DESeq."

However, if you have multiple groups with significant differences in inter-group variability, it is better to subset the data. This approach provides greater statistical power and yields more differentially expressed genes.

HTH!

ADD COMMENT • link 3 months ago by Santosh Anand 5.8k

1

Entering edit mode

Thanks alot for the reply...It made things much clearer to me.

ADD REPLY • link 3 months ago by rajdeepboral00 ▴ 60

score 2 · Accepted Answer · 2025-01-31

2

Entering edit mode

3 months ago

i.sudbery 21k

Should i include the interaction term in the design?

This depends on whether you are interested in the interaction terms or not. Including the interaction terms consumes statistical power, but it is the only way to ask certain questions. The question asked by the various interaction terms is "Is the effect of infection different between the different areas?" That is for example "Does infection have more of an effect in Cervical than Thoratic?". If you are only interested in the comparisons between infected and control in each region seperately, then there is no reason to include the interaction term.

The result without or with using the interaction term is same.

This is surprising, although I don't know which contast DESeq would default to where you are not specifying a contrast in your results call

So should i run the dds command taking the whole data or should i subset it first to my specific interest and then run the dds?

There are pros and cons to each approach. Both are valid, and I doubt the results will be very different.

ADD COMMENT • link 3 months ago by i.sudbery 21k

0

Entering edit mode

This is surprising, although I don't know which contast DESeq would default to where you are not specifying a contrast in your results call

According to the DESeq2 manual : "If results is run without specifying contrast or name, it will return the comparison of the last level of the last variable in the design formula over the first level of that variable. For a simple two-group comparison, this means it will return the log2 fold changes of the second group relative to the first (reference) group."