Question

Genomic screen: Benjamini-Hochberg procedure

1

Entering edit mode

5.9 years ago

mr_floyd ▴ 10

Hi,

It is the first time I try to use forum , I am not sure if it is the correct place to ask but.......

I have NGS data of 12,500 unique sgRNA. I wanted to compare the abundance, of every sgRNA, between four control samples to four treatment samples. For doing so I have used R and got the p-value by doing multiple comparison t-test. I have received 12,500 p-values.

The command - p.adjust(p,method="BH") to was used to calculate the corrected p-value.

Now for my question: Seven sgRNA's, out of the pool of 12,500, are predicted to be enriched in my treatment sample.

If I assume these seven sgRNA's would be enriched, based on previews published studies, is it correct to rank the p-value of the seven sgRNA's higher in my BH analysis for corrected p-value?

Here is the code I would use:

ordered.p <- sort(p)
critical.vals <- NULL
bh.fdr.adjusted.p <- NULL
for (i in 1:length(ordered.p)) {
    temp <- (i/length(ordered.p))*0.05
    critical.vals <- c(critical.vals,temp)
    bh.fdr.adjusted.p <- c(bh.fdr.adjusted.p, min((ordered.p*length(ordered.p)/i),1))
}
final.out <- cbind(ordered.p,critical.vals,bh.fdr.adjusted.p)

gene • 1.0k views

ADD COMMENT • link updated 5.9 years ago by mbelmadani ★ 1.4k • written 5.9 years ago by mr_floyd ▴ 10

score 0 · Answer 1 · 2019-08-08

If I assume these seven sgRNA's would be enriched, based on previews published studies, is it correct to rank the p-value of the seven sgRNA's higher in my BH analysis for corrected p-value?

I don't think so. I wouldn't incorporate information from "previous studies" without doing a proper meta-analysis, otherwise that prior information is more or less just an anecdote. Maybe I'm misunderstanding something because that doesn't seem to be what this code is doing. Where is the re-ranking supposed to happen?

Also, are you sure you want to do (ordered.p*length(ordered.p)/i) and not (ordered.p[i]*length(ordered.p)/i) ? The code you have right now would pick the minimum value out of the entire vector * m / i where m is the number of tests, so in all iterations your min() would always pick the same p-value except divided by a different i since that's the only variable part here.

Let me know if I'm misunderstanding something.