Hi all,
I am doing a timcourse analysis, and as I read my way through the BioC mailing list and biostars, I read that limma was not recommended for this task and that timecourse would be a better fit.
The data: I have two conditions, control and disease, and six time points and 3 replicates at each time point. So 36 samples all together.
The thing is, I started using limma, because my 2 biological questions are:
1) Which genes do change at each time point between control and sample? 2) Which genes change over time between control and sample?
In limma, with the contrasts this can be done nicely, but apparently when the data contains replicates (ie is not independent) it is better to use timecourse as limma assumes independence. ( I hope I understood this correctly). Is this correct?
I have analyzed the data now with the timecourse package and am able to do the profile plots, which show me one gene and how it changes over time and between conditions. This is good, however, I want to ask my two biological questions, and this is what I don't know how to do.
How do I get the genes that change per time point and the genes that change over time? What Im looking for is something like decideTests that reports a p-value and a fold change or a f-test p-value. The output object somehow does not provide this.
Any help with this would be highly appreciated.
Thanks!
Ninni
timecourse code:
library(gtools)
library(timecourse)
col_orders = c( paste0("C", c(4,6,8,14,20,24), rep("_1",6)),
paste0("C", c(4,6,8,14,20,24), rep("_2",6)),
paste0("C", c(4,6,8,14,20,24), rep("_3",6)),
paste0("N", c(4,6,8,14,20,24), rep("_1",6)),
paste0("N", c(4,6,8,14,20,24), rep("_2",6)),
paste0("N", c(4,6,8,14,20,24), rep("_3",6))
)
> col_orders
[1] "C4_1" "C6_1" "C8_1" "C14_1" "C20_1" "C24_1" "C4_2" "C6_2" "C8_2"
[10] "C14_2" "C20_2" "C24_2" "C4_3" "C6_3" "C8_3" "C14_3" "C20_3" "C24_3"
[19] "N4_1" "N6_1" "N8_1" "N14_1" "N20_1" "N24_1" "N4_2" "N6_2" "N8_2"
[28] "N14_2" "N20_2" "N24_2" "N4_3" "N6_3" "N8_3" "N14_3" "N20_3" "N24_3"
idx = match(col_orders, mixedsort(colnames(exprs)))
exprs_sorted = exprs[,idx]
condition_group = rep(c("C", "N"), each= 18)
> condition_group
[1] "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "N"
[20] "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"
time_group = rep(c(4,6,8,14,20,24), 6)
> time_group
[1] 4 6 8 14 20 24 4 6 8 14 20 24 4 6 8 14 20 24 4 6 8 14 20 24 4
[26] 6 8 14 20 24 4 6 8 14 20 24
rep_group = rep(rep(c("rep1","rep2","rep3"),each = 6),2)
> rep_group
[1] "rep1" "rep1" "rep1" "rep1" "rep1" "rep1" "rep2" "rep2" "rep2" "rep2"
[11] "rep2" "rep2" "rep3" "rep3" "rep3" "rep3" "rep3" "rep3" "rep1" "rep1"
[21] "rep1" "rep1" "rep1" "rep1" "rep2" "rep2" "rep2" "rep2" "rep2" "rep2"
[31] "rep3" "rep3" "rep3" "rep3" "rep3" "rep3"
size = matrix(3, nrow = 19155, ncol = 2)
> head(size)
[,1] [,2]
[1,] 3 3
[2,] 3 3
[3,] 3 3
[4,] 3 3
[5,] 3 3
[6,] 3 3
result_paired = mb.long(exprs_sorted, times = 6, method = "paired", reps = size, condition.grp = condition_group, time.grp = time_group, rep.grp = rep_group)
str(result_paired)
Formal class 'MArrayTC' [package "timecourse"] with 1 slots
..@ .Data:List of 12
.. ..$ : num [1:19155, 1:36] 8.65 6.21 6.83 7.78 14.76 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:19155] "NM_001106128" "NM_012306304" "NM_006786375" "NM_045107254" ...
.. .. .. ..$ : chr [1:36] "C14_1" "C20_1" "C24_1" "C4_1" ...
.. ..$ : num 0.02
.. ..$ : num 3.56
.. ..$ : num [1:6, 1:6] 0.047875 0.002305 -0.000541 0.0011 0.007919 ...
.. ..$ : num [1, 1:2] 3 64
.. ..$ : num [1:19155, 1:2] 3 3 3 3 3 3 3 3 3 3 ...
.. ..$ : chr [1:36] "C" "C" "C" "C" ...
.. ..$ : chr [1:36] "rep1" "rep1" "rep1" "rep1" ...
.. ..$ : num [1:36] 4 6 8 14 20 24 4 6 8 14 ...
.. ..$ : num [1:19155] 67.36 134.78 10.47 6.54 47.69 ...
.. ..$ : int [1:19155] 15889 4317 16312 16887 57 18306 5775 12806 5490 18608 ...
.. ..$ : chr "C"
gene_positions = result_paired$pos.HotellingT2 [1:100]
gnames = rownames(exprs_sorted)
gene_probes = gnames[gene_positions]
plotProfile(result_paired, type="b", ranking = 2, gnames = gnames)
I'm not sure what you mean by your replicates not being independent. Could you expound a bit on what you mean by this? In general, limma deals with timecourse experiments quite well.
Hi! I am referring to this post here: http://grokbase.com/p/r/bioconductor/117kp4c095/bioc-time-series-analysis-with-limma-package My data is from different mice, but each time point per replicate is taken from the same mouse. So, time point 1 replicate 1 is from the same mouse as time point 2 from replicate 1, just later. My understanding is that this data is dependent. That is why I used the time course package instead. Is this wrong? Could I just use limma?
You can still do that with limma. Have a look at Gordon's reply in the same thread.
Hi Oh I didnt see this! Thank you. So you say I should just use duplicateCorrelation() and then feed that to lmFit? How would the input look like?
I will try to do this, maybe you could let me know, if this is correct?
and then the contrasts for my two biological questions accordingly (not shown now)
Thanks so much! Ninni