Summary: To find which genes respond to 1 hr cold stress we would like to use the "differential contrast" suggested in page 47 of the limma user guide. However it is not trivial, as in our experiment time zero is considered both as "time zero none stress", and "time zero cold stress".**
Goal: We are using limma to analyze time course experiment with 3 time points, and one treatment vs. control. We would like to test this combination of time course and one cold stress as follows: Amb_0hr (no stress time zero), Amb_1hr, Amb_6hr, Cld_1hr (cold stress after 1 hour), Cld_6hr.
Experiment description: we started with 5 flasks of bacteria; extracted RNA for time zero from one of them; applied cold stress only on two of them; extracted RNA after 1hr from one stressed and one control flasks; extracted RNA after 6 hours form one stressed and one control flasks (everything was done in triplicates).
Issue": If we want to find which genes respond to 1 hr cold stress we would like to use the "differential contrast" suggested in page 47 of the user guide. While it is straightforward for 6hr cold stress (i.e., makeContrasts( Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), levels=design)
, it is not trivial for 1hr cold stress. The reason for that is that, in our experiment time zero is considered both as "time zero none stress", and "time zero cold stress".
Considering this, could you please suggest what is the best way to build differential contrast to 1hr hour cold stress?
Potential solutions:
A. For 1 hour cold stress:
I was thinking of using this contrast:
cont.wt <- makeContrasts( "Cld _1hr-Amb_0hr", "Amb_1hr-Amb_0hr", "Cld _6hr- Cld _1hr", "Amb_6hr-Amb_1hr", levels=design)
And then to subtract the list of over expressed genes found in Amb_1hr-Amb_0hr
from the list of genes found by Cld _1hr-Amb_0hr
(same is done for under expressed).
Is this a correct approach?
B. For 6 hour cold stress:
I have two options:
a. Using the same contrast used for 1hr cold stress above, and then to subtract the list of over expressed genes found by Amb_6hr-Amb_1hr
from the list of over-expressed genes found by Cld _6hr- Cld _1hr
.
OR
b. Using the differential contrast based on page 47 of the user guide:
cont.wt <- makeContrasts( Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), levels=design)
Which one is better for 6 hours?
What is the beat contrast to ask "Which genes respond at either 1 hour or 6 hours of cold stress?
Thank you
Detailed ANALYSIS:
1. Data taken from RNA-seq experiment.
2. Design matrix (two time points :1,6 hr; treatments: control, coldStress, triplicate for each treatment):
Amb_0hr Amb_1hr Amb_6hr Cld_1hr Cld_6hr
1 1 0 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 0 1 0 0 0
5 0 1 0 0 0
6 0 1 0 0 0
7 0 0 1 0 0
8 0 0 1 0 0
9 0 0 1 0 0
10 0 0 0 1 0
11 0 0 0 1 0
12 0 0 0 1 0
13 0 0 0 0 1
14 0 0 0 0 1
15 0 0 0 0 1
3.\ Commands used:
lev<-c("Amb_0hr","Amb_1hr","Amb_6hr","Cld_1hr","Cld_6hr")
all_groups_repeats<-c("Amb_0hr", "Amb_0hr", "Amb_0hr", "Amb_1hr", "Amb_1hr", "Amb_1hr", "Amb_6hr", "Amb_6hr", "Amb_6hr", "Cld_1hr", " Cld _1hr", " Cld _1hr", " Cld _6hr", " Cld _6hr", " Cld _6hr")
f <- factor(all_groups_repeats, levels=lev)
design <- model.matrix(~0+f)
colnames(design) <- lev
fit <- lmFit(eset, design)
v <- voom(xCurrent, design, plot=FALSE)#Voom fit
fit <- lmFit(v , design)
Contrast used:
cont.wt <- makeContrasts(
Dif1_hr =(Cld _1hr-Amb_0hr)-(Amb_1hr-Amb_0hr),
Dif_6hr=(Cld _6hr-Cld _1hr)-(Amb_6hr-Amb_1hr),
levels=design)
fit2 <- contrasts.fit(fit, cont.wt)
fit2 <- eBayes(fit2)
Final DE analysis
wt<-decideTests(fit2,p.value = 0.05, adjust.method = "BH", method = "global")
summary(wt)#Produce summary table
Thank you very much, Kevin! We have 3 biological repeats per samples so: 21 samples.
That was my initial thought. But then I saw the "differential contrast" suggested in page 47 of the limma user guide, and I thought it might have some advantage. I.e., is using: Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), better then what you suggested above?? The approach you have suggeted, sounds like the same thing, but I do not get the same number of differentially experssed genes in the final result.
Yes, you will not get the exact same results because the model comparisons are different. The way that I suggested just produces statistics for each pairwise comparison, whereas your model,
Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr)
, is attempting to do a further comparison within the model. I just feel that this is too complex for the low numbers of samples that you have, and will produce unreliable results.The way that I and others with whom I've worked have done these types of studies is through just the simple pairwise comparisons, looking over the results, and then piecing together a storyline based on these.
Hi Kevin,
I am after the exact same answer. Given that I would have enough samples, would the contrast mentioned above
(Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr))
be correct?This is an example of my own dataframe:
As you can see I have 6 subjects (more in my dataset), each one with two time points measurements. I am interested in knowing if my gene expression is statistically different between the two groups X and Y (in my case either control or treatment)
That formula seems okay for the purposes of gauging differences between Cold and Ambient with respect to time. Please take a look at Aaron's answer here, too: https://support.bioconductor.org/p/117545/#117557
Thank you Kevin, yes that post is indeed very similar to what I need to do. I have been fiddling with the code further and given that my data is slightly different I decided to post my concerns. Here, if you could please help: Limma/edgeR or other for metagenomic (non-RNA) data?
Very interesting question @harelarik, I'm facing the same problem right now. In case you had enough samples, would it be correct the makeContrasts( Dif_1hr=(Cld _1hr- Amb_0hr)-(Amb_1hr-Amb_0hr), levels=design) approach for checking the genes implicated in the 1 hour cold stress response?
Yes, that will be fine. However, if
Amb_0hr
is your reference level, then your contrast is the exact same as the following: