How to use FDR data?
8.3 years ago
sb • 0

What is FDR in microarray analysis? How do I use the FDR data from the class comparison in BRB-ArrayTools?

RNA-Seq BRBArray Tool Microarray analysis • 2.3k views

FDR stands for false discovery rate: the expected proportion of false positives among the results called significant. Controlling it is a way of limiting false positives in a multiple-testing situation. See here for more explanations.
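To make the adjustment less of a black box, here is a minimal sketch of the Benjamini-Hochberg procedure written out by hand and compared with R's built-in `p.adjust()`. The ten p-values are made up purely for illustration, and the variable names (`scaled`, `adj_sorted`, `bh_manual`) are my own; in `p.adjust()`, `'fdr'` is an alias for `'BH'`.

```r
# Benjamini-Hochberg adjustment written out by hand (illustrative p-values).
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)
m <- length(p)

o <- order(p)                        # indices that sort p in ascending order
scaled <- p[o] * m / seq_len(m)      # p_(i) * m / i for rank i

# Enforce monotonicity starting from the largest p-value, and cap at 1
adj_sorted <- pmin(1, rev(cummin(rev(scaled))))

bh_manual <- numeric(m)
bh_manual[o] <- adj_sorted           # put values back in the original order

# Compare with the built-in implementation
all.equal(bh_manual, p.adjust(p, method = "BH"))
```

Each adjusted value can be read as the smallest FDR level at which that test would still be called significant.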

8.3 years ago

This doesn't really answer the question, but I find this simple simulation quite instructive for showing why p-values need to be corrected for multiple testing, for example with an FDR adjustment.

Let's simulate a dataset of 10k rows (say genes) and 6 columns (arrays) and assign the first three columns (arrays) to condition A and the second three to condition B. Data is pure random noise from a normal distribution:

set.seed(1234)
dat<- matrix(rnorm(n= 60000), nrow= 10000)
colnames(dat)<- c('cond_A', 'cond_A', 'cond_A', 'cond_B', 'cond_B', 'cond_B')

dat[1:10,]
          cond_A      cond_A     cond_A     cond_B       cond_B      cond_B
 [1,] -1.2070657 -1.81689753 -1.6878627  2.4918186 -0.903147902  0.49060054
 [2,]  0.2774292  0.62716684 -0.9552011  0.0532215 -0.006098308  0.02499143
 [3,]  1.0844412  0.51809210 -0.6480572  0.4562491 -0.904131937  1.29905349
 [4,] -2.3456977  0.14092183  0.2610342  1.5770552 -0.060453158 -0.23457321
 [5,]  0.4291247  1.45727195 -1.2196940  0.6223530 -1.094187464 -0.45257621
 [6,]  0.5060559 -0.49359652 -1.5501888  1.1879753  0.352918538 -0.01112573
 [7,] -0.5747400 -2.12224406  0.7750572 -0.2801802  0.030408452  0.93259094
 [8,] -0.5466319 -0.13356660  1.7581137 -1.3515010 -1.403397835 -0.72902894
 [9,] -0.5644520 -0.42760035  1.4179980 -0.2894252  2.525432355  1.39921972
[10,] -0.8900378  0.08779481 -1.2691443 -1.1788329 -1.281886211 -0.05842774
...

Now, let's apply a t.test to each row to see which genes are different between condition A and B:

pvals<- apply(dat, 1, function(x) {t.test(x[1:3], x[4:6])$p.value})

Let's sort the p-values from small to large. Although the data is random, we get some pretty small p-values, which are necessarily false positives:

sort(pvals)[1:10]
[1] 2.358038e-06 1.975210e-05 3.434436e-04 5.318831e-04 5.350339e-04 5.627247e-04 6.006373e-04 6.401473e-04 6.556564e-04 8.084033e-04
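To get a feel for the scale of the problem, one can also count how many rows pass a conventional raw cutoff. This is a small sketch that repeats the simulation above so it runs on its own; the 0.05 threshold and the names `alpha` and `n_raw` are my choices, not part of the original example. With no true differences anywhere, every hit counted here is a false positive, and one would expect a count in the hundreds (up to about 5% of the 10,000 tests).

```r
# Re-create the simulated data and per-row t-test p-values
set.seed(1234)
dat <- matrix(rnorm(n = 60000), nrow = 10000)
pvals <- apply(dat, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

alpha <- 0.05                  # assumed raw significance cutoff
n_raw <- sum(pvals < alpha)    # rows called "significant" without correction

n_raw                          # number of false positives
n_raw / length(pvals)          # as a fraction of all tests
```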

FDR adjustment corrects these raw p-values by accounting for the multiple testing. After adjustment, only one gene falls below 0.05 and a second one below 0.10, so only 1 or maybe 2 genes would be worth a further look:

sort(p.adjust(pvals, method= 'fdr'))[1:10]
 [1] 0.02358038 0.09876048 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.74591910
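In practice, using FDR values usually means picking a cutoff and keeping the genes whose adjusted p-value falls below it. Here is a hedged sketch of that step on the same simulated data; the cutoff of 0.05 and the names `padj`, `fdr_cutoff` and `hits` are assumptions for illustration, and the appropriate threshold depends on how many false leads you can tolerate downstream.

```r
# Re-create the simulated data and per-row t-test p-values
set.seed(1234)
dat <- matrix(rnorm(n = 60000), nrow = 10000)
pvals <- apply(dat, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

# Adjust for multiple testing ('fdr' is Benjamini-Hochberg in p.adjust)
padj <- p.adjust(pvals, method = "fdr")

fdr_cutoff <- 0.05                   # assumed threshold; 0.10 is also common
hits <- which(padj < fdr_cutoff)     # row indices passing the FDR cutoff

length(hits)                         # genes kept after correction
sum(pvals < fdr_cutoff)              # compare: genes passing the raw cutoff

# Small table of the selected rows with raw and adjusted p-values
data.frame(row = hits, p = pvals[hits], fdr = padj[hits])
```

Read this way, an FDR cutoff of 0.05 means you accept that about 5% of the genes on your final list are expected to be false positives, rather than 5% of all genes tested.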
