Dear BioStar Users,
I still don't understand how limma works. When I read the manual, it seems cristal clear, but again and again, with my data, i get lost.
I have a microarray dataset (on agilent platform) from a cancer study. 76 Clinical samples have been hybridized, and are distributed in 3 groups : A = Healthy, B = pre-cancerous, C = Cancer
table(gr)
gr
A B C
17 34 25
I usually use SAM algorithm, because i understand what it does : It takes one gene after the other, compare the expression in two or n groups, then correct for multitesting (if I may summarize it like that).
With SAM approach, I get :
sam.gr = sam(eset,cl=gr,B=1000,rand=123,gene.names=names$SYMBOL)
printsam.gr,seq(0.5,6,0.5))
SAM Analysis for the Multi-Class Case with 3 Classes
Delta p0 False Called FDR
1 0.5 0.204 17079.988 28092 0.124061
2 1.0 0.204 9259.003 24936 0.075765
3 1.5 0.204 5101.050 22387 0.046494
4 2.0 0.204 2826.544 20110 0.028680
5 2.5 0.204 1591.944 18236 0.017813
6 3.0 0.204 905.447 16618 0.011118
7 3.5 0.204 521.075 15252 0.006971
8 4.0 0.204 303.668 14119 0.004389
9 4.5 0.204 179.219 13156 0.002780
10 5.0 0.204 107.174 12314 0.001776
11 5.5 0.204 64.225 11519 0.001138
12 6.0 0.204 38.668 10778 0.000732
So I'll take a delta = 3.5
:
ssam.gr = summarysam.gr,3.5)
Then I compute the fold changes that I want to look at :
FC.CvsA = apply(exprs(eset)[ ,which(gr == "C")],1,median,na.rm=T) -
apply(exprs(eset)[ ,which(gr == "A")],1,median,na.rm=T)
FC.BvsA = apply(exprs(eset)[ ,which(gr == "B")],1,median,na.rm=T) -
apply(exprs(eset)[ ,which(gr == "A")],1,median,na.rm=T)
FC.CvsB = apply(exprs(eset)[ ,which(gr == "C")],1,median,na.rm=T) -
apply(exprs(eset)[ ,which(gr == "B")],1,median,na.rm=T)
Then I mix both to get a list of differentially expressed genes :
i = ssam.gr@row.sig.genes
ind = ssam.gr@row.sig.genes[ which( abs(FC.CvsA[i]) > 1 | abs(FC.BvsA[i]) > 1 | abs(FC.CvsB[i]) > 1 ) ]
length(ind)
[1] 5877
Great, about 6000 genes differentially expressed.
If I try to do the same with limma, i would wrote (the mistake is probably here, but where and why ?) :
design = model.matrix(~gr)
colnames(design)
[1] "(Intercept)" "grB" "grC"
fit = lmFit(eset,design)
fit2 = eBayes(fit)
res = decideTests(fit2,p.value=0.001,lfc=log2(2))
ind = which( apply(res,1,function(x) {length(which(x != 0))>0}) == T)
length(ind)
[1] 31845
You're thinking I did not read the manual carefully, so I also tried :
design = model.matrix(~0+gr)
colnames(design) = c("A","B","C")
fit = lmFit(eset,design)
cont.matrix = makeContrasts(
CvsA = C-A,
BvsA = B-A,
CvsB = C-B,
levels=design)
fit2 = contrasts.fit(fit,cont.matrix)
fit2 = eBayes(fit2)
res = decideTests(fit2,p.value=0.001,lfc=log2(2))
ind = which( apply(res,1,function(x) {length(which(x != 0))>0}) == T)
length(ind)
[1] 5110
Great !! BUT WHY ? I'm sure it is obvious, but i don't understand the difference between the two approach [~gr] and [~0+gr and contrasts]
Thanks for your time,
Julien