What is FDR in microarray analysis? How do I use the FDR data from class comparison in BRB-ArrayTools?
This doesn't really answer the question, but I find this simple simulation quite instructive in showing why p-values need to be corrected for multiple testing, for example with an FDR adjustment.
Let's simulate a dataset of 10k rows (say genes) and 6 columns (arrays) and assign the first three columns (arrays) to condition A and the second three to condition B. Data is pure random noise from a normal distribution:
set.seed(1234)
dat<- matrix(rnorm(n= 60000), nrow= 10000)
colnames(dat)<- c('cond_A', 'cond_A', 'cond_A', 'cond_B', 'cond_B', 'cond_B')
dat[1:10,]
cond_A cond_A cond_A cond_B cond_B cond_B
[1,] -1.2070657 -1.81689753 -1.6878627 2.4918186 -0.903147902 0.49060054
[2,] 0.2774292 0.62716684 -0.9552011 0.0532215 -0.006098308 0.02499143
[3,] 1.0844412 0.51809210 -0.6480572 0.4562491 -0.904131937 1.29905349
[4,] -2.3456977 0.14092183 0.2610342 1.5770552 -0.060453158 -0.23457321
[5,] 0.4291247 1.45727195 -1.2196940 0.6223530 -1.094187464 -0.45257621
[6,] 0.5060559 -0.49359652 -1.5501888 1.1879753 0.352918538 -0.01112573
[7,] -0.5747400 -2.12224406 0.7750572 -0.2801802 0.030408452 0.93259094
[8,] -0.5466319 -0.13356660 1.7581137 -1.3515010 -1.403397835 -0.72902894
[9,] -0.5644520 -0.42760035 1.4179980 -0.2894252 2.525432355 1.39921972
[10,] -0.8900378 0.08779481 -1.2691443 -1.1788329 -1.281886211 -0.05842774
...
Now, let's apply a t-test (R's t.test, which is Welch's t-test by default) to each row to see which genes differ between conditions A and B:
pvals<- apply(dat, 1, function(x) {t.test(x[1:3], x[4:6])$p.value})
Let's sort the p-values from smallest to largest. Although the data is random, we get some pretty small p-values, which are obviously false positives:
sort(pvals)[1:10]
[1] 2.358038e-06 1.975210e-05 3.434436e-04 5.318831e-04 5.350339e-04 5.627247e-04 6.006373e-04 6.401473e-04 6.556564e-04 8.084033e-04
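To put a number on how many false positives a naive cutoff would give, here is a minimal sketch that counts the raw p-values below 0.05. It rebuilds the same simulated data so it can be run on its own; the name n_raw is only illustrative. Under pure noise, p-values are roughly uniform, so up to about 5% of the 10,000 tests (a few hundred genes) should pass the cutoff, although a Welch test with only three replicates per group tends to be somewhat conservative.

```r
# Rebuild the pure-noise dataset from above (same seed, same dimensions)
set.seed(1234)
dat <- matrix(rnorm(n = 60000), nrow = 10000)

# One Welch t-test per row: columns 1-3 (condition A) vs. 4-6 (condition B)
pvals <- apply(dat, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

# Number of genes that would be called "significant" at a raw cutoff of 0.05.
# Every one of them is a false positive, since the data contains no signal.
n_raw <- sum(pvals < 0.05)
n_raw

# The same count as a fraction of all tests
n_raw / length(pvals)
```

Every gene in that count is a false discovery, which is why a raw 0.05 threshold is not usable with thousands of tests.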
An FDR adjustment (method 'fdr' in p.adjust is the Benjamini-Hochberg procedure) corrects these raw p-values for multiple testing. After adjustment, only one or maybe two genes would be worth a further look:
sort(p.adjust(pvals, method= 'fdr'))[1:10]
[1] 0.02358038 0.09876048 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.72850709 0.74591910
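To see where these adjusted values come from, here is a minimal sketch of the Benjamini-Hochberg adjustment written by hand. The function name bh_adjust and the toy p-values are assumptions for illustration only. Each p-value is multiplied by the number of tests and divided by its rank, and the result is then forced to be monotone. For instance, the smallest p-value above, 2.358038e-06, times 10000 tests divided by rank 1 gives 0.02358038, the first adjusted value.

```r
# Hand-written Benjamini-Hochberg adjustment (illustrative helper, not part
# of the original post). Should agree with p.adjust(p, method = "fdr").
bh_adjust <- function(p) {
  m  <- length(p)
  o  <- order(p, decreasing = TRUE)  # indices from largest to smallest p
  ro <- order(o)                     # permutation that restores input order
  # p * m / rank, with ranks m, m-1, ..., 1 for the decreasing order;
  # cummin() enforces monotonicity, pmin() caps the result at 1
  adj <- pmin(1, cummin(p[o] * m / (m:1)))
  adj[ro]
}

# Toy p-values (arbitrary seed), just to compare with the built-in function
set.seed(1)
p_toy <- runif(20)
manual  <- bh_adjust(p_toy)
builtin <- p.adjust(p_toy, method = "fdr")
all.equal(manual, builtin)
```

In practice there is no reason to use anything other than p.adjust; the sketch is only meant to show that the adjustment is a simple rank-based rescaling.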
FDR stands for false discovery rate: the expected proportion of false positives among the results you declare significant. Computing it is a way of controlling the risk of false positives in a multiple-testing situation. See here for more explanations.
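As a rough illustration of how an FDR cutoff is used in practice, the sketch below simulates data where some genes really do differ, selects genes at an FDR of 0.1, and checks what fraction of the selected genes are false positives. All settings here (2000 genes, 200 truly changed, 10 replicates per group, a shift of 2, the seed, and the 0.1 cutoff) are assumptions chosen for the example and do not come from BRB-ArrayTools.

```r
set.seed(42)     # arbitrary seed
n_genes <- 2000  # total genes (assumed)
n_true  <- 200   # genes with a real difference (assumed)
n_rep   <- 10    # replicates per condition (assumed)

# Pure noise everywhere, then shift condition B for the first n_true genes
sim <- matrix(rnorm(n_genes * 2 * n_rep), nrow = n_genes)
is_true <- seq_len(n_genes) <= n_true
cols_b <- (n_rep + 1):(2 * n_rep)
sim[is_true, cols_b] <- sim[is_true, cols_b] + 2

# Per-gene Welch t-test, followed by the FDR (Benjamini-Hochberg) adjustment
p_raw <- apply(sim, 1, function(x) t.test(x[1:n_rep], x[cols_b])$p.value)
p_fdr <- p.adjust(p_raw, method = "fdr")

# Select genes at FDR < 0.1 and compute the realized false discovery proportion
selected   <- p_fdr < 0.1
n_selected <- sum(selected)
fdp <- if (n_selected > 0) sum(selected & !is_true) / n_selected else 0
c(selected = n_selected, false_discovery_proportion = fdp)
```

The point is the interpretation: choosing genes with an adjusted value below 0.1 means you accept that, on average, about 10% of the genes on your list may be false positives. The proportion in any single run can land above or below that.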