Dear Biostars,
Once again I'm here to consult your knowledge. I have studied the binding sites of several proteins and mapped their positions along mouse chr3. After plotting the data, I noticed that several of these binding sites cluster in particular stretches of chr3.
I am trying to assess whether the overlaps between my protein's binding sites and those of other proteins are significant, in a way that accounts for the biased (non-uniform) positional distribution of my data.
I used the bedtools shuffle and intersect programs like this:
    bedtools shuffle -chrom -i mybindingsitespos.bed -g mm9.chr.sizes | \
        bedtools intersect -a otherproteinbindsitespos.bed -b - | wc -l
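A single run of this pipeline gives one overlap count under a uniform null. Judging significance requires repeating it many times and comparing the observed count with that null distribution. The Python sketch below mimics the loop under simplifying assumptions: all intervals lie on one chromosome as half-open (start, end) tuples, overlaps are counted pairwise (by default `bedtools intersect` prints one line per overlapping pair), and the function names are hypothetical, not part of bedtools.

```python
import random

def count_overlaps(a, b):
    """Count overlapping (a, b) pairs, i.e. one per line that
    `bedtools intersect -a A -b B` would print. Intervals are half-open."""
    return sum(1 for s1, e1 in a for s2, e2 in b if s1 < e2 and s2 < e1)

def shuffle_uniform(sites, chrom_len, rng):
    """Move each interval to a uniformly random start, keeping its length."""
    shuffled = []
    for start, end in sites:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def empirical_p(observed, null_counts):
    """One-sided permutation p-value, with +1 so it is never exactly zero."""
    n_extreme = sum(c >= observed for c in null_counts)
    return (1 + n_extreme) / (1 + len(null_counts))

def permutation_test(my_sites, other_sites, chrom_len, n_perm=1000, seed=0):
    """Repeat the shuffle | intersect | wc -l step n_perm times."""
    rng = random.Random(seed)
    observed = count_overlaps(other_sites, my_sites)
    null_counts = [
        count_overlaps(other_sites, shuffle_uniform(my_sites, chrom_len, rng))
        for _ in range(n_perm)
    ]
    return observed, empirical_p(observed, null_counts)
```

`chrom_len` would be the chr3 length from `mm9.chr.sizes`. The pairwise loop is O(n·m), which is fine for a few thousand sites on one chromosome but would need a sorted sweep for larger sets.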
However, shuffle places features uniformly at random along chr3 and ignores the background distribution of my data. How can I supply that distribution to these tools, or how could I implement the same analysis in R?
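`bedtools shuffle` can restrict placement to a set of regions with `-incl` (or avoid regions with `-excl`), but as far as I know it does not weight those regions by density. One way to encode a non-uniform background is a weighted (stratified) shuffle. Split chr3 into fixed-size bins and weight each bin by how many intervals of some background set start there. Then draw a bin in proportion to its weight and place the interval uniformly inside it. The Python sketch below assumes a hypothetical `background` list (for example, all binding sites pooled), a hypothetical 100 kb default bin size, and single-chromosome half-open intervals. Which background and bin size are appropriate is a modeling choice, and all function names are illustrative only.

```python
import random

def count_overlaps(a, b):
    """Pairwise overlap count for half-open (start, end) intervals."""
    return sum(1 for s1, e1 in a for s2, e2 in b if s1 < e2 and s2 < e1)

def bin_weights(background, chrom_len, bin_size, pseudocount=0):
    """Number of background intervals starting in each fixed-size bin.
    A pseudocount > 0 keeps every bin reachable in the null."""
    n_bins = (chrom_len + bin_size - 1) // bin_size
    weights = [pseudocount] * n_bins
    for start, _ in background:
        weights[start // bin_size] += 1
    return weights

def shuffle_weighted(sites, chrom_len, bin_size, weights, rng):
    """Pick a bin with probability proportional to its weight, then a uniform
    start inside that bin; interval lengths are preserved."""
    bins = rng.choices(range(len(weights)), weights=weights, k=len(sites))
    shuffled = []
    for (start, end), b in zip(sites, bins):
        length = end - start
        lo = b * bin_size
        new_start = rng.randrange(lo, min(lo + bin_size, chrom_len))
        # Pull intervals that would run off the chromosome end back inside it.
        new_start = min(new_start, chrom_len - length)
        shuffled.append((new_start, new_start + length))
    return shuffled

def weighted_permutation_test(my_sites, other_sites, background, chrom_len,
                              bin_size=100_000, pseudocount=0,
                              n_perm=1000, seed=0):
    """Permutation test whose null follows the background's positional density.
    Fails if every bin weight is zero (empty background and no pseudocount)."""
    rng = random.Random(seed)
    weights = bin_weights(background, chrom_len, bin_size, pseudocount)
    observed = count_overlaps(other_sites, my_sites)
    null_counts = [
        count_overlaps(other_sites,
                       shuffle_weighted(my_sites, chrom_len, bin_size, weights, rng))
        for _ in range(n_perm)
    ]
    p = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, p
```

Smaller bins follow the clustering more closely but leave less room for randomization. If the background is your own sites and the bins are very small, shuffled sites land close to their original positions and the test loses power. It may be worth checking how the p-value changes across a few bin sizes.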
I've searched all over the web without finding much. I'd appreciate any comments!