19 months ago
Dan
I am reading chapter 6 of the textbook at https://web.stanford.edu/class/bios221/book/06-chap.html, which covers the false discovery proportion and the p-value histogram. It uses the following code:
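library("DESeq2")
library("airway")
library("ggplot2")  # provides ggplot() and the geoms used below
data("airway")
# Fit the DESeq2 model and keep only genes with a non-missing p-value
aw = DESeqDataSet(se = airway, design = ~ cell + dex)
aw = DESeq(aw)
awde = as.data.frame(results(aw)) |> dplyr::filter(!is.na(pvalue))
# alpha: significance cutoff; binw: histogram bin width
alpha = binw = 0.025
# Estimated proportion of true null hypotheses
pi0 = 2 * mean(awde$pvalue > 0.5)
ggplot(awde, aes(x = pvalue)) +
  geom_histogram(binwidth = binw, boundary = 0) +
  geom_hline(yintercept = pi0 * binw * nrow(awde), col = "blue") +
  geom_vline(xintercept = alpha, col = "red")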
I do not understand how these two expressions are used in calculating the false discovery proportion:
pi0 = 2 * mean(awde$pvalue > 0.5)
pi0 * binw * nrow(awde)
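To make the question concrete, here is a minimal simulated sketch of what I think these two lines compute. The 80/20 mixture, the Beta distribution for the alternatives, and the names n_null, n_alt, null_per_bin and fdp_hat are my own assumptions, not taken from the airway data. If I read the chapter correctly, null p-values are uniform on [0, 1], so about half of them fall above 0.5, and almost no non-null p-values do. The last expression is the estimate I believe the chapter derives from pi0.

```r
# Simulated sketch under assumed settings (not the airway data)
set.seed(1)
n_null = 8000   # hypothetical number of true null hypotheses
n_alt  = 2000   # hypothetical number of true alternatives
# Null p-values are uniform; alternative p-values pile up near 0
p = c(runif(n_null), rbeta(n_alt, 0.2, 4))

alpha = binw = 0.025
# Nearly all p-values above 0.5 are nulls, and half of the uniform nulls
# land there, so doubling that fraction estimates the null proportion
pi0 = 2 * mean(p > 0.5)
# Expected number of null p-values in each histogram bin of width binw
# (this is the height of the blue horizontal line)
null_per_bin = pi0 * binw * length(p)
# Estimated false discovery proportion among p-values <= alpha:
# expected null fraction below alpha divided by observed fraction below alpha
fdp_hat = pi0 * alpha / mean(p <= alpha)
print(c(pi0 = pi0, null_per_bin = null_per_bin, fdp_hat = fdp_hat))
```

In this sketch pi0 should come out close to the simulated null fraction of 0.8, which would explain the factor of 2, but I would like confirmation that this is the intended reasoning.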
Thanks