Question

How to use FDR data?

0

8.3 years ago

sb • 0

What is FDR in microarray analysis? How do I use the FDR data from the class comparison in BRB-ArrayTools?

RNA-Seq BRBArray Tool Microarray analysis • 2.3k views

updated 8.3 years ago by dariober 15k • written 8.3 years ago by sb • 0

2

FDR means false discovery rate: the expected proportion of false positives among the results you call significant. Computing it is a way of controlling the risk of false positives in a multiple testing situation. See here for more explanations.

8.3 years ago by Jean-Karim Heriche 27k

score 2 · Answer 1 · 2016-07-15

This doesn't really answer the question, but I find this simple simulation quite instructive in showing why p-values need to be corrected by FDR.

Let's simulate a dataset of 10k rows (say genes) and 6 columns (arrays) and assign the first three columns (arrays) to condition A and the last three to condition B. The data are pure random noise from a standard normal distribution:

set.seed(1234)
dat<- matrix(rnorm(n= 60000), nrow= 10000)
colnames(dat)<- c('cond_A', 'cond_A', 'cond_A', 'cond_B', 'cond_B', 'cond_B')

dat[1:10,]
          cond_A      cond_A     cond_A     cond_B       cond_B      cond_B
 [1,] -1.2070657 -1.81689753 -1.6878627  2.4918186 -0.903147902  0.49060054
 [2,]  0.2774292  0.62716684 -0.9552011  0.0532215 -0.006098308  0.02499143
 [3,]  1.0844412  0.51809210 -0.6480572  0.4562491 -0.904131937  1.29905349
 [4,] -2.3456977  0.14092183  0.2610342  1.5770552 -0.060453158 -0.23457321
 [5,]  0.4291247  1.45727195 -1.2196940  0.6223530 -1.094187464 -0.45257621
 [6,]  0.5060559 -0.49359652 -1.5501888  1.1879753  0.352918538 -0.01112573
 [7,] -0.5747400 -2.12224406  0.7750572 -0.2801802  0.030408452  0.93259094
 [8,] -0.5466319 -0.13356660  1.7581137 -1.3515010 -1.403397835 -0.72902894
 [9,] -0.5644520 -0.42760035  1.4179980 -0.2894252  2.525432355  1.39921972
[10,] -0.8900378  0.08779481 -1.2691443 -1.1788329 -1.281886211 -0.05842774
...

Now, let's apply t.test to each row to see which genes differ between conditions A and B:

pvals<- apply(dat, 1, function(x) {t.test(x[1:3], x[4:6])$p.value})

Let's sort the p-values from small to large. Although the data are random, we get some pretty small p-values, which are obviously false positives:

sort(pvals)[1:10]
[1] 2.358038e-06 1.975210e-05 3.434436e-04 5.318831e-04 5.350339e-04 5.627247e-04 6.006373e-04 6.401473e-04 6.556564e-04 8.084033e-04
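
To get a feel for the scale of the problem, one could also count how many raw p-values fall below the usual 0.05 cutoff. The sketch below rebuilds the same simulated data; the names n_below and expected_below are mine, not part of the original answer. With 10k pure-noise tests, roughly 5% (about 500) would be expected below 0.05 by chance alone, although the Welch t-test with only 3 arrays per group tends to be somewhat conservative, so the observed count may be lower.

```r
# Rebuild the simulated noise-only dataset from above
set.seed(1234)
dat <- matrix(rnorm(n = 60000), nrow = 10000)

# One Welch t-test per row: columns 1-3 (condition A) vs columns 4-6 (condition B)
pvals <- apply(dat, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

# Number of "significant" genes at the naive 0.05 cutoff (all false positives here)
n_below <- sum(pvals < 0.05)

# Rough expectation if the p-values were exactly uniform under the null
expected_below <- 0.05 * length(pvals)

cat("observed below 0.05:", n_below, "\n")
cat("expected by chance: ", expected_below, "\n")
```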

FDR correction adjusts these raw p-values to account for the multiple testing. Now only 1 or maybe 2 genes would be worth a further look (and since the data are pure noise, even those are false positives):

sort(p.adjust(pvals, method= 'fdr'))[1:10]
 [1] 0.02358038 0.09876048 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.74591910
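
For anyone wondering what the adjustment does under the hood: in p.adjust, method 'fdr' is an alias for 'BH' (Benjamini-Hochberg). A minimal sketch of that calculation is below, assuming the same simulated data; bh_adjust is a hypothetical helper written for illustration, not a function from R or BRB-ArrayTools. Each p-value is multiplied by m / rank (m = number of tests, rank 1 = smallest p-value), then a running minimum from the largest p-value downwards keeps the adjusted values monotone, and everything is capped at 1.

```r
# Hypothetical helper: Benjamini-Hochberg adjustment written out by hand
bh_adjust <- function(p) {
  m  <- length(p)
  o  <- order(p, decreasing = TRUE)   # indices from largest to smallest p-value
  ro <- order(o)                      # permutation that restores the input order
  ranks <- m:1                        # rank of each p-value in that ordering
  adj <- pmin(1, cummin(p[o] * m / ranks))
  adj[ro]
}

# Same noise-only simulation as above
set.seed(1234)
dat <- matrix(rnorm(n = 60000), nrow = 10000)
pvals <- apply(dat, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

adj_manual  <- bh_adjust(pvals)
adj_builtin <- p.adjust(pvals, method = 'fdr')

# The hand-rolled version should agree with the built-in one
print(all.equal(adj_manual, adj_builtin))

# Typical use: keep the genes whose adjusted p-value is below the chosen FDR level
selected <- which(adj_builtin < 0.05)
```

In practice you would use the adjusted values as the filter: pick an FDR level you can live with (0.05 and 0.1 are common choices) and follow up only on the genes below it. In this simulation anything that passes is a false positive by construction.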