How to approach DESeq2 analysis on 2 plant lines infected with virus, with 2 time points (days post infection)
1
0
Entering edit mode
9 weeks ago

Hi there. I am currently trying to use DESeq2 analysis on some samples. I have completed a basic analysis, but now I have a much more complicated analysis I need to do. I am looking at a lot of videos and examples, but I still don't quite understand how to write the R-script/code for what it is I want to do. I will try and give a detailed layout of my study.

I am studying the effect of a tomato virus on 2 plant lines, Resistant and susceptible. For each plant line I have 2 groups, infected and control, and I have 2 time points, 15 days post infection and 35 days post infection (dpi), and this also counts for the control group (they were inoculated with an empty plasmid to account for biological reactions to plant damage as well, to keep as many factors constant).

Now, I want to see how the genes were differentially expressed:

  • for each time point of each plant line (15dpi vs control and so on)
  • between the time points within each plant line (15dpi vs 35 dpi)
  • between the two plant lines (15dpi resistant vs 15dpi susceptible and 35dpi resistant vs 35dpi susceptible)

Now I know that this means I am looking at different levels. But now my question is, do I use the contrast() or how do I approach this.

Here is my sample information: (3 biological replicates for control, 4 biological replicates for infected)

Resistant
Sample  Treatment   Time point  Plant Line
RVR1:15 Infected    15  Resistant
RVR2:15 Infected    15  Resistant
RVR3:15 Infected    15  Resistant
RVR4:15 Infected    15  Resistant
RVR1:35 Infected    35  Resistant
RVR2:35 Infected    35  Resistant
RVR3:35 Infected    35  Resistant
RVR4:35 Infected    35  Resistant
RCR1:15 Control 15  Resistant
RCR2:15 Control 15  Resistant
RCR3:15 Control 15  Resistant
RCR1:35 Control 35  Resistant
RCR2:35 Control 35  Resistant
RCR3:35 Control 35  Resistant

Susceptible
Sample  Treatment   Time point  Plant Line
SVR1:15 Infected    15  Susceptible
SVR2:15 Infected    15  Susceptible
SVR3:15 Infected    15  Susceptible
SVR4:15 Infected    15  Susceptible
SVR1:35 Infected    35  Susceptible
SVR2:35 Infected    35  Susceptible
SVR3:35 Infected    35  Susceptible
SVR4:35 Infected    35  Susceptible
SCR1:15 Control 15  Susceptible
SCR2:15 Control 15  Susceptible
SCR3:15 Control 15  Susceptible
SCR1:35 Control 35  Susceptible
SCR2:35 Control 35  Susceptible
SCR3:35 Control 35  Susceptible

So how will I write the multifactor design for these groups/levels? I would really appreciate assistance on this, I am so lost and confused. Thank you.

multifactor DESeq2 • 861 views
ADD COMMENT
0
Entering edit mode
9 weeks ago

To compare one subgroup of samples to another, make one column in colData with all the elements concatenated together.

Then you can contrast, say control_resistant_5 versus infected_resistant_15

ADD COMMENT
0
Entering edit mode

Thank you for the response. Would it look something like this?

Sample  Condition
1   resistant_infected_15
2   resistant_infected_15
3   resistant_infected_15
4   resistant_infected_15
5   resistant_conrol_15
6   resistant_conrol_15
7   resistant_conrol_15
8   resistant_infected_35
9   resistant_infected_35
10  resistant_infected_35
11  resistant_infected_35
12  resistant_control_35
13  resistant_control_35
14  resistant_control_35
15  susceptible_infected_15
16  susceptible_infected_15
17  susceptible_infected_15
18  susceptible_infected_15
19  susceptible_control_15
20  susceptible_control_15
21  susceptible_control_15
22  susceptible_infected_35
23  susceptible_infected_35
24  susceptible_infected_35
25  susceptible_infected_35
26  susceptible_control_35
27  susceptible_control_35
28  susceptible_control_35

Do I need to add batch information?

ADD REPLY
0
Entering edit mode

Yes, like that. If you have batch information, and it doesn't confound with your experimental groups, you can just add it as a column, and make your design ~ Batch + Condition. (Order doesn't matter when you specify the desired contrasts in the results call.)

There are other ways to compare one subgroup to another, using designs with interactions, but this way is pretty much equivalent, and way, way more readable.

ADD REPLY
0
Entering edit mode

Thank you for the feedback. I appreciate the assistance. One final question, do I have to label each replicate with a sub. number, i.e susceptible_control_35_A? Or will DESeq2 recognize the biological replicates?

ADD REPLY
0
Entering edit mode

Did you look at the DESeq vignette? Do they include replicate numbers in their colData? DESeq will take all the samples with the group name you specify, and compare them to all the samples with the second group name you specify. If you include replicate numbers in there, every sample will have its own group. You don't want that.

ADD REPLY
0
Entering edit mode

The titles are the same as the second table I sent through (susceptible_control_35). So I should leave them as they are. How do you specify which group goes first ad which second? I am sorry if that's a stupid question, but I am still trying to figure this all out

ADD REPLY
0
Entering edit mode

Did you look at the vignette?

ADD REPLY
0
Entering edit mode

I am doing it by myself, I am setting up the DESeq2. I have tried reading the manual for DESeq2 and looking at examples. The sample data and counts data are what I put together. I hope that makes sense.

ADD REPLY
0
Entering edit mode

This is what I have for resistant 15dp infected vs resistant 15dpi control

#Load libraries

library(DESeq2)
library(pheatmap)
library(dplyr)
library(RColorBrewer)
library(ggplot2)
library(ggrepel)
library(tidyr)
install.packages("ggrepel")
install.packages("tidyverse")

#Set the working directory
setwd('~/FeatureCounts/NCBI_data/Resistant/DESeq2_Oct')

#Load the count data 
count_data <- read.csv("Resistant_data_15_ncbi.txt",header=TRUE,row.names=1)
colnames(count_data)
head(count_data)

#Load the sample information
col_data <- read.csv("Sample_info_Resistant_15.txt",header=TRUE,row.names=1)
colnames(col_data)
head(col_data)

#drop NA's
count_data <- count_data %>%
  drop_na()

#Create a deseq object 
dds <-DESeqDataSetFromMatrix(countData = count_data,colData = col_data, design = ~ condition)

dds

# pre-filtering: removing rows with low gene counts
# keeping rows that have at least 10 reads total
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds

# set the factor level
dds$condition <- relevel(dds$condition, ref = "control")

# Step 3: Run DESeq 
dds <- DESeq(dds)
res <- results(dds)

res

#explore results 
summary(res)

#adjust
res0.05 <- results(dds, alpha = 0.05)
summary(res0.05)


#Change DESeq object to dataframe
res0.05 <- as.data.frame(res0.05)
class(res0.05)

head(res0.05)

What is really confusing is the fact that each time point has its own control. So if I wanted to compare resistant 15dpi infected with susceptible 15dpi infected, which control will I use, the resistant control or the susceptible control. I don't think it's possible then to compare the resistant with the susceptible.

ADD REPLY

Login before adding your answer.

Traffic: 2288 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6