Limma strange low p-values.
1
2
Entering edit mode
8.6 years ago

I'm analysing a microarray (single-channel) study with 2 experimental groups, each one with only 2 samples. I wanted to compare the two groups in order to find DE genes, so, I ran limma analysis on this dataset. The results were weird...lots of genes w/ very low p-values (lowest p-value: 6.144e-213). I made a volcano plot to see what was happening with the data...got this weird volcano plot: http://s33.postimg.org/mxnx39j67/weird_volcano.png

Do you guys know what kind of stuff would lead to that?

>print(annot)
     GEO_ID      group
1 GSM321605 uninfected
2 GSM321606 uninfected
3 GSM321607   infected
4 GSM321608   infected

>design <- model.matrix(~0 +annot[, "group"])
>head(design)
  annot[, "group"]infected annot[, "group"]uninfected
1                        0                          1
2                        0                          1
3                        1                          0
4                        1                          0

>colnames(design) <- c("infected", "uninfected")
>cm <- makeContrasts(InfvsUninf = infected-uninfected, levels=design)
>print(cm)
            Contrasts
Levels       InfvsUninf
  infected            1
  uninfected         -1

>fit <- lmFit(exprs, design)
>fit2 <- contrasts.fit(fit, cm)
>fit2 <- eBayes(fit2)
>results <- topTable(fit2, "InfvsUninf", number=Inf)
>head(results)
   ID        logFC   AveExpr     t        P.Value         adj.P.Val
IDO1        7.496098 8.077363  31.43580 6.144487e-213 8.017326e-209
TNFAIP6     6.945901 7.551685  29.12848 1.359307e-183 8.868116e-180
RSAD2       6.532472 8.066029  27.39471 6.405851e-163 2.786118e-159
IFI27       6.230206 7.727492  26.12712 1.461824e-148 4.768471e-145
IFITM1      6.128964 6.775055  25.70255 6.766299e-144 1.765733e-140
INHBA       5.915550 7.068516  24.80758 2.674626e-134 5.816421e-131
limma microarray R • 5.3k views
ADD COMMENT
1
Entering edit mode

The results do not match with the volcano plot, negative logFC in your results but postive in the Volcanoplot (for those with extreme p-values)

ADD REPLY
0
Entering edit mode

Sorry! I pasted the results of the reverse comparison (UninfvsInf). Thanks for noticing, I'll edit the post!

ADD REPLY
0
Entering edit mode

your t values should also be positive

ADD REPLY
0
Entering edit mode

Oh, of course! Thank you for noticing!

ADD REPLY
0
Entering edit mode

can you post a head of your results? also, how did you code your design matrix etc etc?

ADD REPLY
0
Entering edit mode

Done! Added the code to the original post.

ADD REPLY
0
Entering edit mode

Why is the design matrix (~0 +annot[, "group"]) I think the 0 indicates that you don't want to use an intercept term, but in this design you should really keep a common base between the samples.

ADD REPLY
0
Entering edit mode

It is correct. If limma uses contrasts then it automatically inserts an extra intercept. Its in the code.

ADD REPLY
0
Entering edit mode

Did u fix it? I meet the same problem. could u share some suggestions, please?

ADD REPLY
1
Entering edit mode

This looks like a comment rather than an answer. Please, add comments using the ADD COMMENT button under each post. There is an accepted answer to this questions that both explains the cause of the original problem and provides some solutions. Is this not working for you? If you have a different issue please ask a new question instead. Thank you.

ADD REPLY
0
Entering edit mode

You are right, I moved the post to a comment now.

ADD REPLY
7
Entering edit mode
8.6 years ago
ATRX ★ 1.1k

I think one of the reasons that you are getting weird p-values is because the filtering of the dataset is not proper or the standard error across all the probes in your dataset is large. Limma calculates moderated t-statistic which uses posterior variance rather than sample variance in t-test. Or, in the layman language it is is the ratio of the M-value to its standard error similar to T-statistic. However, in this case the standard errors are moderated across all genes/ probes in your exprs. You may find more about it here. I would recommend you to use all the probes or use varFilter to filter the low expressed probes/genes.

Code:

> dim(exprs)
[1] 54675     4
> design <- model.matrix(~ group + 0)
> colnames(design) <- levels(group)
> head(design)
  uninfected infected
1        1          0
2        1          0
3        0          1
4        0          1
> cont.matrix <- makeContrasts(InfvsUninf = infected-uninfected, levels=design)
> print(cont.matrix)
            Contrasts
Levels       InfvsUninf
  infected            1
  uninfected         -1
> fit1 <- lmFit(exprs, design)
> fit2 <- contrasts.fit(fit1, cont.matrix)
> fit2 <- eBayes(fit2)
Warning message:
Zero sample variances detected, have been offset 
> results <- topTable(fit2, "InfvsUninf", number=nrow(exprs.dat))
> head(results)
                logFC  AveExpr      t      P.Value    adj.P.Val        B
210029_at   7.496098 8.077363 53.69020 5.996081e-09 0.0001206758 9.926007
202411_at   6.230206 7.727492 52.40836 6.882552e-09 0.0001206758 9.869859
201601_x_at 5.956054 7.549371 52.02982 7.173197e-09 0.0001206758 9.852669
222838_at   5.748147 7.996823 49.34098 9.709206e-09 0.0001206758 9.721920
206025_s_at 6.945901 7.551685 48.24558 1.103574e-08 0.0001206758 9.663952
201108_s_at 5.564837 7.035012 45.45633 1.549903e-08 0.0001401571 9.502414

Now, filtering the low variable probes and hence, the value of moderate t-statistic (t column) changes. P-value is dependent upon t value and therefore, it become lower than the previous one.

> library(genefilter)
> exprs = varFilter(exprs)
> dim(exprs)
[1] 27337     4
> design <- model.matrix(~ group + 0)
> colnames(design) <- levels(group)
> head(design)
  uninfected infected
1        1          0
2        1          0
3        0          1
4        0          1
> cont.matrix <- makeContrasts(InfvsUninf = infected-uninfected, levels=design)
> print(cont.matrix)
            Contrasts
Levels       InfvsUninf
  infected            1
  uninfected         -1
> fit1 <- lmFit(exprs, design)
> fit2 <- contrasts.fit(fit1, cont.matrix)
> fit2 <- eBayes(fit2)
> results <- topTable(fit2, "InfvsUninf", number=nrow(exprs.dat))
> head(results)
                logFC  AveExpr      t      P.Value    adj.P.Val         B
210029_at   7.496098 8.077363 33.85290 3.055145e-50 8.351851e-46 103.28349
206025_s_at 6.945901 7.551685 31.34093 1.140235e-47 1.558530e-43  97.62137
213797_at   6.532472 8.066029 29.35202 1.661339e-45 1.513868e-41  92.82696
202411_at   6.230206 7.727492 28.24293 3.021780e-44 2.065160e-40  90.02318
204698_at   6.131048 7.734954 27.51300 2.145762e-43 1.173174e-39  88.12390
214022_s_at 6.128964 6.775055 27.39214 2.980586e-43 1.358005e-39  87.80514

In case you are interested in top ~ 5000 genes/probes only then use varFilter(exprs, var.cutoff = 0.9). Try playing with various different values of var.cutoff to get desired number of genes.

ADD COMMENT

Login before adding your answer.

Traffic: 1767 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6