I'm using PWM (position weight matrix) scores to determine whether a TF binds to DNA sequences across the genome. However, I also need to correct for multiple hypothesis testing. I was thinking of Benjamini-Hochberg FDR correction, but doesn't that assume independence of the test statistics (in this case, the PWM scores)? And aren't scores on overlapping sequences going to be correlated?
What is a PWM score? Pulse-width-modulation doesn't seem to make sense. What is a TD binding?
When all else fails, there is Bonferroni correction, which holds under any dependence between tests but is overconservative.
Oops, was typing on phone. TD is TF (transcription factor). A PWM (position weight matrix) score is a binding score computed by multiplying, across all positions of a candidate site, the probability that the TF's PWM assigns to the base observed at each position. Bonferroni is definitely too conservative, but I wanted to make sure I wasn't violating any assumptions of Benjamini-Hochberg either.
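To make the score definition above concrete, here is a minimal sketch in plain Python. The matrix values and the names `PWM`, `pwm_score`, and `scan` are made up for illustration and are not taken from any real TF or library.

```python
# Hypothetical 4-position PWM: one dict of base probabilities per position.
# The numbers are invented for illustration; each column sums to 1.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def pwm_score(pwm, site):
    """Multiply the PWM probability of the observed base at every position."""
    if len(site) != len(pwm):
        raise ValueError("site must have the same length as the PWM")
    score = 1.0
    for column, base in zip(pwm, site):
        score *= column[base]
    return score


def scan(pwm, sequence):
    """Score every window of PWM width along the sequence, one per start position."""
    width = len(pwm)
    return [
        pwm_score(pwm, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
    ]
```

Neighboring windows returned by `scan` share all but one base, which is where the correlation between scores on overlapping sequences comes from.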
So you're going to have a lot of scores that are indeed somewhat non-independent. Benjamini-Hochberg can work, but you don't really have any statistical hypothesis tests going on.
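For reference, the Benjamini-Hochberg step-up rule itself can be sketched as below. This assumes you already have one p-value per site from some null model, which raw PWM scores do not give you by themselves. The function name `benjamini_hochberg` is hypothetical. The FDR guarantee is usually stated for independent tests or tests with a certain kind of positive dependence, so treat the output with care for overlapping windows.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

For example, `benjamini_hochberg([0.01, 0.9, 0.03, 0.5])` rejects only the first hypothesis, because 0.03 is above its rank-2 cutoff of 0.025.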
Why not just collect a big list of scores and use the top few? It depends on how you intend to use the result. The top 5% of locations in a 4-gigabase reference is still 200 million positions.
I think what you need is a statistical control. You want to say the TF binds "well" to the location, but need to define "binding well" as compared to something. That would be the null distribution of your measure.
I would possibly take random fake TF sequences and collect their score lists to compare. Scores exceeding the 99th percentile of the random-sequence scores (that is, falling in their top 1%) could be considered real. This matters because you don't know at the outset whether the TF truly binds to dozens or millions of genomic locations.
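One way to read the suggestion above is sketched here: score many random sites, take the score at the 99th percentile as a cutoff, and keep genomic windows that exceed it. Everything in this sketch is an assumption for illustration: the made-up PWM, the uniform base frequencies of the random sites, and the names `null_scores`, `threshold_from_null`, and `call_sites`.

```python
import random

# Hypothetical 6-position PWM favouring the site "CACGTG" (made-up numbers).
CONSENSUS = "CACGTG"
PWM = [
    {base: (0.85 if base == best else 0.05) for base in "ACGT"}
    for best in CONSENSUS
]


def pwm_score(pwm, site):
    """Product of the PWM probabilities of the observed bases."""
    score = 1.0
    for column, base in zip(pwm, site):
        score *= column[base]
    return score


def null_scores(pwm, n_random=10000, seed=0):
    """Scores of random sites drawn with uniform base frequencies (an assumption)."""
    rng = random.Random(seed)
    width = len(pwm)
    return [
        pwm_score(pwm, "".join(rng.choice("ACGT") for _ in range(width)))
        for _ in range(n_random)
    ]


def threshold_from_null(scores, top_fraction=0.01):
    """Score at the (1 - top_fraction) quantile of the null scores."""
    ranked = sorted(scores)
    index = min(len(ranked) - 1, int((1 - top_fraction) * len(ranked)))
    return ranked[index]


def call_sites(pwm, sequence, threshold):
    """Start positions of windows scoring strictly above the threshold."""
    width = len(pwm)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if pwm_score(pwm, sequence[start:start + width]) > threshold
    ]
```

A typical use would be `cutoff = threshold_from_null(null_scores(PWM))` followed by `call_sites(PWM, genome_chunk, cutoff)`, where `genome_chunk` is whatever sequence string you are scanning. A more realistic null would probably match the base composition of the genome rather than use uniform frequencies.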
Thanks for the clarification.