Question

can anybody help me to check if the code is correct to use limma package to find the significantly expressed genes

1

Entering edit mode

10.4 years ago

sw.arker ▴ 70

I am rookie to using limma package. My data looks like this in excel: first column is the gene name, first raw is sample name, and under each sample name is the gene expression value (4 replicates per sample, I have sample wt and mu). I want to find out the significant expressed genes between mu vs wt.

and I write code like this:

sample=read.csv("sample.csv",header=T,row.names=1)
logsample=log2(sample)
design=model.matrix(~0+c(rep('wt',4),rep('mu',4)))
colnames(design)=c("wt","mu")
cm=makeContrasts(mu-wt,levels=design)
fit=lmFit(logsample,design)
fit2=contrasts.fit(fit,cm)
fit3=eBayes(fit2)
result=topTable(fit3,number=Inf,adjust="BH",sort.by="none")

I am not sure I did it correctly. Please help me to check, many thanks!! I log2 transformed the intensity. The data is already normalized by other software; I am not sure I should use makeContrasts or not.

Thanks!!

R • 3.2k views

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by sw.arker ▴ 70

2

Entering edit mode

What exactly does sample.csv contain?

BTW, if you don't remove the intercept from the design then you can skip looking at the contrast.

ADD REPLY • link 10.4 years ago by Devon Ryan 104k

0

Entering edit mode

Hi Devon, thanks for reply, the sample.csv looks like this:

probe      wt1          wt2          wt3          wt4          mu1          mu2          mu3          mu4
gene1      1136.704     4123.447     5138.65      5103.593     3919.762     3094.666     4378.834     4429.387
gene2      1253.719     2591.891     2611.626     2790.135     2205.512     1898.726     2374.038     2539.176
gene3      2227.929     1968.311     2303.308     1817.361     2323.718     1060.57      994.9664     1803.954
gene4      572.5932     1043.077     1479.159     1283.932     820.1443     1088.151     1139.485     1082.607
gene5      76.20247     42.47991     8.524587     24.1916      30.72517     11.20368     8.393809     42.14227
gene6      2672.205     5504.846     7236.014     7006.669     6271.228     5925.029     6590.307     6865.492
gene7      324.3387     276.1173     216.5971     287.3867     224.2262     249.4994     199.8653     301.3705
gene8      3554.573     3388.302     3009.981     3454.752     2837.759     2854.741     2503.966     2837.336
gene9      2102.255     1130.993     1789.667     1696.07      1172.711     1385.945     1913.929     1866.4
gene10     2967.336     4161.204     4074.815     4325.702     3897.27      3047.727     3801.668     3924.415
gene11     143.7498     109.3417     81.79682     106.7067     105.4525     90.31444     127.8393     122.719
gene12     2672.205     5504.846     7236.014     7006.669     6271.228     5925.029     6590.307     6865.492
gene13     572.5932     1043.077     1479.159     1283.932     820.1443     1088.151     1139.485     1082.607
gene14     143.7498     109.3417     81.79682     106.7067     105.4525     90.31444     127.8393     122.719
gene15     143.7498     109.3417     81.79682     106.7067     105.4525     90.31444     127.8393     122.719
gene16     455.3064     318.3769     235.8836     349.7169     301.8518     301.0153     242.6692     338.5451
gene17     2913.168     3702.693     3544.199     3899.656     3689.94      2747.714     3436.402     3536.472
gene18     143.7498     109.3417     81.79682     106.7067     105.4525     90.31444     127.8393     122.719
gene19     1761.665     5428.883     6678.637     6753.888     5176.198     4133.363     5760.567     6108.301
gene20     572.5932     1043.077     1479.159     1283.932     820.1443     1088.151     1139.485     1082.607

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by sw.arker ▴ 70

Ram · Accepted Answer · 2014-07-10

Your code looks good to me to find genes differentially expressed between the 4 WT and 4 mu, which I guess it is what you want.

As Devon Ryan suggested, you can leave the intercept and skip the contrasts. This should be equivalent to your code:

design=model.matrix(~1+c(rep('wt',4),rep('mu',4)))
colnames(design)=c("wt","mu")
fit=lmFit(logsample,design)
fit<- eBayes(fit)
results<- topTable(fit,number=Inf,adjust="BH",sort.by="none", coef= 2)

However, I don't see any harm in explicitly fitting contrasts, actually I find it clearer.

I think in general it's good to look at a couple of genes from the output of topTable() and make sure the model is doing what you expect by comparing to the values in the normalized matrix. E.g. in your case:

results$logFC[1]

Should be equal to (given rounding errors):

mean(logsample[1, 1:4]) - mean(logsample[1, 5:8])