Question

Bias in hypergeometric test

7.2 years ago

kakukeshi ▴ 80

Dear colleagues,

I calculated the significance of the overlap between drug-targeted genes and some disease genes using a hypergeometric test. The problem is that the drugs with lower p-values are the ones that target more genes in general. What does this mean? Is my result biased by the current knowledge about drug-target interactions? Is there any way to normalize for this?

P.S. I'm using STITCH for the drug-target information.

Thanks

drug gene hypergeometric test association • 1.7k views

updated 7.2 years ago by ivivek_ngs ★ 5.2k • written 7.2 years ago by kakukeshi ▴ 80

score 2 · Answer 1 · 2017-09-12

Drug-target gene associations cannot be established from a hypergeometric test alone, and an overlap found this way cannot be read as biologically meaningful by itself. Such interactions depend on many factors, each a variable attribute that serves as evidence for the interaction, as @Devon Ryan also states. There are tools built specifically for this, such as Open Targets or DGIdb. Take a look at them for your results; they provide combinatorial p-value and association score statistics. They also draw on evidence from various NGS datasets and use the ExAC database and the GWAS catalogue to identify druggable hits in the genome. This should be more insightful for your data.

score 1 · Answer 2 · 2017-09-12

You've chosen a test that works by checking the number of genes changed in a group. That a drug that changes more things is associated with more group enrichments is completely expected. Whether this is biologically correct is a different question and not one that can be answered with simple statistics (rather, you'd need to have a good understanding about how changes in protein level will affect activity elsewhere in the pathway that isn't evidenced by protein-level changes).
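
To make the point about target counts concrete, here is a minimal sketch in plain Python. All numbers are hypothetical (a background of 20,000 genes, 200 disease genes, and three made-up drugs), and the `hypergeom_sf` helper is written out by hand for illustration rather than taken from any library. Each drug hits disease genes at the same rate (5% of its targets, five times the 1% expected by chance), yet the p-value shrinks sharply as the target list grows. Printing the fold enrichment next to the p-value is one simple way to see effect size separately from list size; it is shown as an illustration, not as a complete correction for annotation bias.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper-tail hypergeometric probability P(X >= k).

    N: background genes, K: disease genes in the background,
    n: genes targeted by the drug, k: observed overlap.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

N = 20000  # hypothetical background size
K = 200    # hypothetical number of disease genes (1% of the background)

pvals = []
folds = []
for n_targets in (20, 100, 500):      # three hypothetical drugs
    k = n_targets // 20               # 5% of each drug's targets are disease genes
    expected = n_targets * K / N      # overlap expected by chance
    fold = k / expected               # fold enrichment, identical for all three drugs
    p = hypergeom_sf(k, N, K, n_targets)
    pvals.append(p)
    folds.append(fold)
    print(f"targets={n_targets:4d} overlap={k:3d} fold={fold:.1f} p={p:.3g}")
```

The fold enrichment is 5.0 for every drug, but the p-value drops by many orders of magnitude between the smallest and the largest target list. So a ranking by p-value alone will tend to favour well-annotated drugs with many known targets, which is the pattern described in the question.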