Question

Limma time-course: how to apply differential contrast if time zero is considered both as treatment and control?

3

Entering edit mode

7.0 years ago

harelarik ▴ 90

Summary: To find which genes respond to 1 hr cold stress we would like to use the "differential contrast" suggested in page 47 of the limma user guide. However it is not trivial, as in our experiment time zero is considered both as "time zero none stress", and "time zero cold stress".**

Goal: We are using limma to analyze time course experiment with 3 time points, and one treatment vs. control. We would like to test this combination of time course and one cold stress as follows: Amb_0hr (no stress time zero), Amb_1hr, Amb_6hr, Cld_1hr (cold stress after 1 hour), Cld_6hr.

Experiment description: we started with 5 flasks of bacteria; extracted RNA for time zero from one of them; applied cold stress only on two of them; extracted RNA after 1hr from one stressed and one control flasks; extracted RNA after 6 hours form one stressed and one control flasks (everything was done in triplicates).

Issue": If we want to find which genes respond to 1 hr cold stress we would like to use the "differential contrast" suggested in page 47 of the user guide. While it is straightforward for 6hr cold stress (i.e., makeContrasts( Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), levels=design), it is not trivial for 1hr cold stress. The reason for that is that, in our experiment time zero is considered both as "time zero none stress", and "time zero cold stress".

Considering this, could you please suggest what is the best way to build differential contrast to 1hr hour cold stress?

Potential solutions:

A. For 1 hour cold stress:

I was thinking of using this contrast:

cont.wt <- makeContrasts( "Cld _1hr-Amb_0hr", "Amb_1hr-Amb_0hr",  "Cld _6hr- Cld _1hr", "Amb_6hr-Amb_1hr", levels=design)

And then to subtract the list of over expressed genes found in Amb_1hr-Amb_0hr from the list of genes found by Cld _1hr-Amb_0hr (same is done for under expressed).

Is this a correct approach?

B. For 6 hour cold stress:

I have two options:

a. Using the same contrast used for 1hr cold stress above, and then to subtract the list of over expressed genes found by Amb_6hr-Amb_1hr from the list of over-expressed genes found by Cld _6hr- Cld _1hr.

OR

b. Using the differential contrast based on page 47 of the user guide:

cont.wt <- makeContrasts( Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), levels=design)

Which one is better for 6 hours?

What is the beat contrast to ask "Which genes respond at either 1 hour or 6 hours of cold stress?

Thank you

Detailed ANALYSIS:

1. Data taken from RNA-seq experiment.

2. Design matrix (two time points :1,6 hr; treatments: control, coldStress, triplicate for each treatment):

Amb_0hr Amb_1hr Amb_6hr Cld_1hr Cld_6hr
1        1       0       0      0      0
2        1       0       0      0      0
3        1       0       0      0      0
4        0       1       0      0      0
5        0       1       0      0      0
6        0       1       0      0      0
7        0       0       1      0      0
8        0       0       1      0      0
9        0       0       1      0      0
10       0       0       0      1      0
11       0       0       0      1      0
12       0       0       0      1      0
13       0       0       0      0      1
14       0       0       0      0      1
15       0       0       0      0      1

3.\ Commands used:

lev<-c("Amb_0hr","Amb_1hr","Amb_6hr","Cld_1hr","Cld_6hr")
all_groups_repeats<-c("Amb_0hr", "Amb_0hr", "Amb_0hr", "Amb_1hr", "Amb_1hr", "Amb_1hr", "Amb_6hr", "Amb_6hr", "Amb_6hr", "Cld_1hr",  " Cld _1hr",  " Cld _1hr",  " Cld _6hr",  " Cld _6hr",  " Cld _6hr")
f <- factor(all_groups_repeats, levels=lev)
design <- model.matrix(~0+f)
colnames(design) <- lev
fit <- lmFit(eset, design)
v <- voom(xCurrent, design, plot=FALSE)#Voom fit
fit <- lmFit(v , design)

Contrast used:

cont.wt <- makeContrasts(
 Dif1_hr =(Cld _1hr-Amb_0hr)-(Amb_1hr-Amb_0hr),
 Dif_6hr=(Cld _6hr-Cld _1hr)-(Amb_6hr-Amb_1hr),
 levels=design)
fit2 <- contrasts.fit(fit, cont.wt)
fit2 <- eBayes(fit2)

Final DE analysis

wt<-decideTests(fit2,p.value = 0.05, adjust.method = "BH",  method = "global")
summary(wt)#Produce summary table

time-course limma • 4.6k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 7.0 years ago by harelarik ▴ 90

Kevin Blighe · Answer 1 · 2017-12-01

1

Entering edit mode

7.0 years ago

Kevin Blighe 88k

Hi harelarik,

From my experience of using limma, it's best to keep the model formula as simple as possible. Well, nothing against limma, really, as an overly complex formula in any regression analysis can produce unreliable information. One of my main worries in your situation is the low sample number.

Edit (October 26, 2019): with a complex formula, it can also be difficult to interpret the meaning of the results; whereas, the biological meaning of results produced from a more basic formula is, well, easier to interpret.

Given the relatively complex set-up and the low sample number, I believe you will find it extremely difficult to arrive at just a single list of statistical values and be able to say: 'This is my solution for 6 hour cold stress', et cetera.

I would begin by critically thinking about what each comparison is actually showing. For example, Cld _6hr-Cld _1hr will show me genes that are statistically significantly differentially expressed between the 1 hour and 6 hour time-points in the cold-stress samples, whereas Amb_6hr-Amb_1hr will show me the same but at ambient / room temperature. If I then compare these two listings (even by eye), I will gain insights into genes that are activated or repressed when cold-stress is applied.

Then do the same for 1 and 0 hours, and 6 and 0 hours. You will end up with 6 different gene listings, each showing various things.

--------------------------------

By the way, in addition and considering that you have bacterial RNA-seq, you may be interested in taking a look at Rockhopper. It runs quickly and will return FASTA sequences that are statistically differentially expressed (and provide P and Q values). You can then infer functionality of these sequences via blastx.

Kevin

ADD COMMENT • link 5.1 years ago by Kevin Blighe 88k

0

Entering edit mode

Thank you very much, Kevin! We have 3 biological repeats per samples so: 21 samples.

Cld _6hr-Cld _1hr will show me genes that are statistically differentially expressed between the 1 hour and 6 hour time-points in the >cold-stress samples, whereas Amb_6hr-Amb_1hr will show me the same but at ambient / room temperature. If I then compare >these two listings (even by eye), I will gain insights into genes that are activated or repressed when cold-stress is applied.

That was my initial thought. But then I saw the "differential contrast" suggested in page 47 of the limma user guide, and I thought it might have some advantage. I.e., is using: Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), better then what you suggested above?? The approach you have suggeted, sounds like the same thing, but I do not get the same number of differentially experssed genes in the final result.

ADD REPLY • link 7.0 years ago by harelarik ▴ 90

0

Entering edit mode

Yes, you will not get the exact same results because the model comparisons are different. The way that I suggested just produces statistics for each pairwise comparison, whereas your model, Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), is attempting to do a further comparison within the model. I just feel that this is too complex for the low numbers of samples that you have, and will produce unreliable results.

The way that I and others with whom I've worked have done these types of studies is through just the simple pairwise comparisons, looking over the results, and then piecing together a storyline based on these.

ADD REPLY • link 7.0 years ago by Kevin Blighe 88k

0

Entering edit mode

Hi Kevin,

I am after the exact same answer. Given that I would have enough samples, would the contrast mentioned above (Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr)) be correct?

This is an example of my own dataframe:

df <- data.frame(timepoint = rep_len(0:1, length.out=16),
                 subjectID = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
                 group = c(rep("C", 8), rep("N", 8)))

As you can see I have 6 subjects (more in my dataset), each one with two time points measurements. I am interested in knowing if my gene expression is statistically different between the two groups X and Y (in my case either control or treatment)

ADD REPLY • link updated 5.1 years ago by Kevin Blighe 88k • written 5.1 years ago by gaio.transposon • 0

0

Entering edit mode

That formula seems okay for the purposes of gauging differences between Cold and Ambient with respect to time. Please take a look at Aaron's answer here, too: https://support.bioconductor.org/p/117545/#117557

ADD REPLY • link 5.1 years ago by Kevin Blighe 88k

0

Entering edit mode

Thank you Kevin, yes that post is indeed very similar to what I need to do. I have been fiddling with the code further and given that my data is slightly different I decided to post my concerns. Here, if you could please help: Limma/edgeR or other for metagenomic (non-RNA) data?

ADD REPLY • link 5.1 years ago by gaio.transposon • 0

0

Entering edit mode

Very interesting question @harelarik, I'm facing the same problem right now. In case you had enough samples, would it be correct the makeContrasts( Dif_1hr=(Cld _1hr- Amb_0hr)-(Amb_1hr-Amb_0hr), levels=design) approach for checking the genes implicated in the 1 hour cold stress response?

ADD REPLY • link 6.4 years ago by mfalco • 0

0

Entering edit mode

Yes, that will be fine. However, if Amb_0hr is your reference level, then your contrast is the exact same as the following:

makeContrasts(Dif_1hr = Cld _1hr - Amb_1hr, levels=design)

ADD REPLY • link 5.1 years ago by Kevin Blighe 88k