I have a set of TF binding coordinates and want to see whether they overlap significantly with an open chromatin annotation.
Example of TF coordinates:
chr1 19280 19298
chr1 245920 245938
chr2 97290 97308
chr9 752910 752938
...
Example of open chromatin coordinates (UCSC track):
chr2 33031543 33032779
chr3 2304169 2304825
chr5 330899 330940
...
I have checked the intersection with bedtools (open chromatin coordinates vs. TF coordinates ±100 bp), and now I want to check the intersection between random genomic coordinates and open chromatin.
The idea is to:
- Pick a random genomic position (on the same chromosome as the TF coordinate);
- Extend it by ±9 bp (binding site size);
- Extend it by ±100 bp;
- Repeat this simulation 1000 times (TF x 1000);
- Intersect with bedtools.
Any ideas on how I can run this simulation to pick random genomic positions from the same chromosome? I know a little bash and Perl, but I won't be able to write the script myself.
Is it possible to measure the length of every chromosome, then take the TF's chromosome and use its length to draw a random number that represents a genomic position?
Can someone help me with the simulation and the pipeline?
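The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: `random_sites` and `empirical_p` are hypothetical helper names, the chromosome lengths are assumed to be available as a dict, and the overlap count for each shuffled set is assumed to come from a separate bedtools intersect run.

```python
import random


def random_sites(tf_sites, chrom_sizes, flank=100, rng=random):
    """Draw one random interval per TF site on the same chromosome.

    tf_sites    -- list of (chrom, start, end) tuples, BED-style coordinates
    chrom_sizes -- dict mapping chromosome name to its length in bp
    flank       -- bp added on each side (clipped to chromosome bounds)
    """
    shuffled = []
    for chrom, start, end in tf_sites:
        width = end - start                 # keep the binding site size
        size = chrom_sizes[chrom]
        new_start = rng.randint(0, size - width)   # inclusive on both ends
        new_end = new_start + width
        shuffled.append((chrom,
                         max(0, new_start - flank),
                         min(size, new_end + flank)))
    return shuffled


def empirical_p(observed, simulated):
    """Share of simulations with at least as many overlaps as observed.

    The +1 in numerator and denominator avoids reporting p = 0.
    """
    hits = sum(1 for count in simulated if count >= observed)
    return (hits + 1) / (len(simulated) + 1)
```

Each call to `random_sites` gives one shuffled set; writing it out as a BED file and intersecting it with the open chromatin track, repeated 1000 times, yields the `simulated` counts. bedtools also has a `shuffle` subcommand whose `-chrom` option keeps each feature on its original chromosome, which may make a custom script unnecessary.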
Sometimes we need to keep the GC content or some other metric similar. Any idea how to update the script for that?
Very interesting paper - thanks for the link!
I love that the new bedtools works with subcommands, like samtools. I did not realize it until now; the manuals still list the old scripts. I had been meaning to suggest it but did not want to look like I don't appreciate the existing tools. Great link to the paper as well!
@Istvan - yeah, sorry, the docs are out of date. We're in the process of hiring someone to take over all of that, and we should have a new docs website up by fall.
Where can I download hg19.chrom.sizes, please?
If you have mysql installed, you can do:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" > hg19.chrom.sizes
Otherwise, you can use the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start) with Group = "All tables" and Table = chromInfo.
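Once the file is downloaded, reading it into a script is straightforward. A minimal sketch, assuming a whitespace-separated file with the chromosome name in the first column and its length in the second; `read_chrom_sizes` is a hypothetical helper name, and the header check is there because the mysql client may write a "chrom size" header line as the first row.

```python
def read_chrom_sizes(path):
    """Return a dict of chromosome name -> length from a chrom.sizes file."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and any header row whose second column
            # is not a number (e.g. "chrom size").
            if len(fields) < 2 or not fields[1].isdigit():
                continue
            sizes[fields[0]] = int(fields[1])
    return sizes
```

For example, `read_chrom_sizes("hg19.chrom.sizes")["chr1"]` would give the length of chr1, which is the upper bound for drawing a random position on that chromosome.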