Question

Assessing significance of protein binding data inside defined genomic intervals

Asked 10.7 years ago by Sakti

Dear friends,

Once again I am back to consult your wisdom. Very recently I obtained a list of regions on mouse chromosome 7 that contact a specific nuclear body (sorry, I cannot give more details about it). Several proteins (e.g. cohesin) overlap these regions. However, I would like to know how significant these overlap ratios are compared with a randomly chosen set of regions that has the same length distribution as my nuclear body dataset.

Does anyone know of a tool that could perform this analysis? I found the R package coocur, but it analyzes co-occurrence of protein binding sites, which I think is a little different from what I'm trying to do.

Also, in case no such program exists, what would be the best way to proceed in terms of statistical tests? I was thinking of writing a script that randomly chooses regions with the same lengths as the regions in my nuclear body dataset, calculates the overlaps, and then compares those ratios with my nuclear body ratios. But I think bootstrapping may also be necessary, and I'm not sure which statistical test I should use in that case.
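The randomization approach described in the question can be sketched as a permutation test. This is a minimal sketch under assumptions that are not in the thread: all intervals lie on one chromosome with half-open [start, end) coordinates, shuffled regions are placed uniformly anywhere on that chromosome (ignoring assembly gaps, unmappable sequence, and overlaps among shuffled regions), and the test statistic is the number of regions that overlap at least one binding site (dividing by the number of regions gives the overlap ratio). All function names and the toy data are hypothetical. Because the null distribution comes from the shuffles themselves, the empirical p-value (the fraction of shuffles with at least as many overlapping regions as observed) is the test result, so no separate parametric test is needed on top of it.

```python
import bisect
import random


def merge_sites(sites):
    """Sort binding sites and merge overlapping or touching ones."""
    merged = []
    for start, end in sorted(sites):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_overlapping(regions, site_starts, site_ends):
    """Count regions that overlap at least one binding site.

    Intervals are half-open [start, end). Sites must be sorted and
    non-overlapping (see merge_sites), so site_ends is sorted as well.
    """
    n = 0
    for start, end in regions:
        # Index of the first site that ends after this region starts.
        i = bisect.bisect_right(site_ends, start)
        if i < len(site_starts) and site_starts[i] < end:
            n += 1
    return n


def shuffle_regions(regions, chrom_len, rng):
    """Move each region to a uniformly random start, keeping its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(chrom_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_test(regions, sites, chrom_len, n_perm=1000, seed=0):
    """One-sided empirical p-value: do more regions overlap sites than by chance?"""
    merged = merge_sites(sites)
    site_starts = [s for s, _ in merged]
    site_ends = [e for _, e in merged]
    rng = random.Random(seed)
    observed = count_overlapping(regions, site_starts, site_ends)
    null_counts = [
        count_overlapping(shuffle_regions(regions, chrom_len, rng),
                          site_starts, site_ends)
        for _ in range(n_perm)
    ]
    # Add-one correction: the observed data count as one permutation,
    # so the p-value can never be exactly zero.
    n_extreme = sum(c >= observed for c in null_counts)
    return observed, null_counts, (n_extreme + 1) / (n_perm + 1)


# Hypothetical toy data (not from the question): a 1 Mb chromosome,
# five query regions and four binding sites.
chrom_len = 1_000_000
regions = [(10_000, 20_000), (200_000, 215_000), (400_000, 405_000),
           (600_000, 630_000), (900_000, 910_000)]
sites = [(12_000, 12_500), (201_000, 201_400), (401_000, 401_300),
         (610_000, 610_200)]
obs, null, p = permutation_test(regions, sites, chrom_len, n_perm=1000, seed=1)
print(f"observed={obs}, mean under null={sum(null) / len(null):.2f}, p={p:.4f}")
```

With real data you would load the regions and binding sites from BED files and use the actual chromosome length. Restricting the shuffled positions to mappable, non-gap sequence usually gives a fairer null than placing them anywhere on the chromosome.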

I'd appreciate any insight you may provide.

Thanks!!

Sakti

Tags: genome, protein-binding, chip-seq, statistics

"Nuclear body" is a rather vague term. Can you be a bit more specific?

Transcription factor, enhancer, etc.?

Comment written 10.7 years ago by Khader Shameer

I think the answers to this question are what you are looking for: "Calculating fold-enrichment of ChIP-seq peaks intersecting with promoters (vs. genome average)"

Comment written 10.2 years ago by Fidel

Ram · Answer 1 · 2014-09-21

Drawing a random genomic position has two outcomes, overlap with a gene (success) or no overlap (failure), so it is a Bernoulli trial with success probability p = C/G, where C is the number of bases covered by genes and G is the total number of bases in the genome. Therefore the binomial distribution is suitable for calculating the probability of observing N or more successes in M trials, i.e. the upper tail P(X >= N) = 1 - F(N - 1), where F is the binomial cumulative distribution function. This doesn't depend on how your genomic locations were selected.
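The binomial calculation above can be written out directly: with p = C/G, the probability of N or more successes in M trials is the sum over k = N..M of (M choose k) p^k (1 - p)^(M - k). The Python sketch below evaluates that sum in log space with math.lgamma so that large M does not overflow; the function names and the example numbers are hypothetical. It assumes independent draws and 0 < p < 1, and applying it to the question means reducing each nuclear body region to a single position (for example its midpoint), a simplification the answer does not spell out. If SciPy is available, scipy.stats.binom.sf(N - 1, M, p) should give the same tail probability.

```python
import math


def binomial_upper_tail(n_success, n_trials, p):
    """P(X >= n_success) for X ~ Binomial(n_trials, p), summed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(max(n_success, 0), n_trials + 1):
        # log of (n_trials choose k) * p**k * (1 - p)**(n_trials - k)
        log_term = (math.lgamma(n_trials + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_trials - k + 1)
                    + k * log_p + (n_trials - k) * log_q)
        total += math.exp(log_term)
    # Guard against floating-point round-off pushing the sum above 1.
    return min(total, 1.0)


def overlap_p_value(n_overlapping, n_positions, covered_bases, genome_bases):
    """One-sided p-value for N overlaps among M random positions, with p = C / G."""
    p = covered_bases / genome_bases
    return binomial_upper_tail(n_overlapping, n_positions, p)


# Hypothetical numbers for illustration only: features cover 10% of the
# sequence (p = 0.1), and 30 of 100 positions fall inside a feature.
print(overlap_p_value(n_overlapping=30, n_positions=100,
                      covered_bases=10_000_000, genome_bases=100_000_000))
```

Summing the exact tail rather than using a normal approximation keeps the result valid when N is far out in the tail or M is small.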