EdgeR model matrix for 2 plant genotypes with 2 treatments and 2 timepoints
1
0
Entering edit mode
3.9 years ago
User 4014 ▴ 40

Hi folks,

I am new in field of RNA-Seq and now trying to call DEGs from 2 plant genotypes (A -heat resistant and B-heat unresistant). I have 2 treatments (control vs treatment with 3 replicates for each, except one control I have only 2 replicates) and 2 timepoints (6 days and 12 days). Basically I would like to:

  1. Compare within and across timepoints for each genotype, e.g. A_treatment_day6 - A_control_day6 and (A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6)

  2. Compare within and across timepoints across genotypes (e.g. (A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6)).

I am very confused whether it is better to build one matrix with genotypes, timepoints and treatment (~0+genotype+timepoint+treatment) and call DEGs from it, or it is better to define each treatment combination as a group (~0 + group; as in the section 3.3.1 of edgeR manual). Since the transcriptional responses can be quite different at 6 and 12 days, another thought I have is that perhaps it is better to work on each timepoint separately and treat each treatment combination as a group? But how can I avoid creating the (~0+genotype+timepoint+treatment) matrix for (2)?

May I share your thoughts what would be the best approach for this analysis? I read through edgeR manual a few times, but with my limited statistical knowledge I am not sure I understand it by heart. I greatly appreciate all suggestions and guidance. Thank you very much and I am looking forward to hearing from you.

RNA-Seq R • 1.2k views
ADD COMMENT
2
Entering edit mode
3.9 years ago

A_treatment_day6 - A_control_day6 for this simple case it would be best to concatenate all of your factor levels into one term, such as A_treatment_day6. For the example we'll call this new factor level genotype_treatment_timepoint. Your formula would be ~ genotype_treatment_timepoint.

(A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6) and permutations of this in your question are answered by an interaction term, so for this concatenate your genotype and treatment into one factor (such as A_treatment, A_control) that we will call genotype_treatment, and then your formula would be ~ genotype_treatment + timepoints + genotype_treatment:timepoints, which the desired contrast being the interaction term genotype_treatment:timepoints.

ADD COMMENT
0
Entering edit mode

Hi, Thank you very much for your help. It is very useful. I tried making a code for (A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6). Do you mind taking a look at it and let me know how to improve it please?

`rawCountTable <- read.table("counts.txt", header=TRUE, sep="\t", row.names=1)
sampleInfo4 <- read.table("sampleinfo4.csv", header=TRUE, sep=",", row.names=1)
genotype_treatment <- factor(sampleInfo4$genotype_treatment) 
genotype_treatment <- relevel(genotype_treatment, ref="A_control") 
timepoints <- factor(sampleInfo4$timepoints)
timepoints <- relevel(timepoints,ref="6")
Group <- factor(paste(sampleInfo4$genotype_treatment,sampleInfo4$timepoints,sep="."))
design <- model.matrix(~genotype_treatment * timepoints, data=y$samples)   # There are 4 coefficients here: [1] "(Intercept)", 
                [2] "genotype_treatmentA_treatment", [3] "timepoints9", [4] "genotype_treatmentA_treatment:timepoints9"
y <- DGEList(counts=rawCountTable[,1:11], group=Group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method='TMM')
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef=4)`

Still I have to figure out how to analyze (A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6), but I guess I have to use each treatment combination as a group.

Thank you very much and have a great day!

ADD REPLY
0
Entering edit mode

(A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6) would be another interaction term. Combine treatment and timepoint into one factor level (I'll call treatment_timepoint), and your formula would be ~ treatment_timepoint + genotype + treatment_timepoint:genotype. The coefficient of interest would again be your interaction term treatment_timepoint:genotype.

As for your code above I tend to use DESeq2 more, but as long as you selected the correct coefficient that code looks good otherwise.

ADD REPLY

Login before adding your answer.

Traffic: 2753 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6