Assessing significance of protein binding data inside defined genomic intervals
10.3 years ago
Sakti ▴ 530

Dear friends,

Once again I am back to consult your wisdom. Very recently I obtained a list of regions on mouse chromosome 7 that contact a specific nuclear body (sorry, I cannot give more details about it). Several proteins (e.g. cohesin) overlap these regions. However, I would like to know how significant these overlap ratios are compared to a randomly chosen region set with the same length characteristics as my original nuclear body dataset.

Does anyone know of a tool one could use to perform this analysis? I found the R package cooccur, but it analyzes co-occurrence of protein binding sites, which I think is a little different from what I'm trying to do.

Also, in case no such program exists, what would be the best way to proceed in terms of statistical tests? I was thinking of writing a script that places regions randomly with the same lengths as my nuclear body dataset, calculates their overlaps, and then compares those ratios with my nuclear body ratios. I think bootstrapping may also be necessary, but I'm not sure which statistical test I should use in that case.
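
The randomization idea described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not an established tool: intervals are half-open `(start, end)` tuples on a single chromosome, random placement is uniform along the chromosome (ignoring gaps and mappability), shuffled regions are allowed to overlap each other, and all function names, coordinates, and the chromosome length are hypothetical.

```python
import random

def overlaps_any(region, features):
    """True if the half-open region overlaps at least one feature."""
    start, end = region
    return any(f_start < end and start < f_end for f_start, f_end in features)

def overlap_fraction(regions, features):
    """Fraction of regions that overlap at least one feature."""
    return sum(overlaps_any(r, features) for r in regions) / len(regions)

def shuffle_regions(regions, chrom_length, rng):
    """Place each region at a uniform random position, keeping its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def permutation_p_value(regions, features, chrom_length, n_iter=1000, seed=0):
    """Empirical one-sided p-value for the observed overlap fraction."""
    rng = random.Random(seed)
    observed = overlap_fraction(regions, features)
    hits = sum(
        overlap_fraction(shuffle_regions(regions, chrom_length, rng), features)
        >= observed
        for _ in range(n_iter)
    )
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)

# Hypothetical toy data: three regions, each containing one binding site.
regions = [(100, 200), (300, 400), (500, 600)]
features = [(150, 160), (350, 360), (550, 560)]
observed, p_value = permutation_p_value(regions, features, chrom_length=1_000_000)
print(observed, p_value)
```

The empirical p-value is simply the share of random region sets whose overlap fraction is at least as large as the observed one, so no separate parametric test is needed for this comparison.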

I'd appreciate any insight you may provide.

Thanks!!

Sakti

genome protein-binding chip-seq statistics • 2.1k views

"Nuclear body" is a rather vague term; can you be a bit more specific?

transcription factor, enhancer etc.?


I think the answers to this question are what you are looking for: Calculating fold-enrichment of ChIP-seq peaks intersecting with promoters (vs. genome average)

10.2 years ago
Michael 55k

Drawing a random genomic position with two possible outcomes, overlap with a gene (success) or no overlap (failure), is a Bernoulli trial with success probability C/G, where C is the number of bases in genes and G is the total number of bases in the genome. The binomial distribution is therefore suitable for calculating the probability of observing N or more successes in M trials, i.e. the upper tail of the cumulative distribution function. This doesn't depend on how your genomic locations were selected.
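
As a rough illustration of the binomial calculation described above, here is a minimal sketch using only the Python standard library. The function name and all numbers (C, G, M, N) are hypothetical placeholders, and treating each region as a single random position is a simplification: a region with non-zero length has a higher chance of touching a feature than C/G.

```python
from math import comb

def binomial_upper_tail(n_success, n_trials, p):
    """P(X >= n_success) for X ~ Binomial(n_trials, p)."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_success, n_trials + 1)
    )

# Hypothetical numbers, for illustration only.
C = 30_000_000    # bases covered by the feature of interest
G = 145_000_000   # total bases considered
M = 50            # number of regions (trials)
N = 20            # regions overlapping the feature (successes)

p_value = binomial_upper_tail(N, M, C / G)
print(p_value)
```

A small value here would suggest that N overlaps out of M are more than expected by chance under this simple model.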
