I want to see the overall binding pattern of a TF (e.g. ARID3A) across the complete human genome (hg19). The task consists of 2 steps:
1- Take the human genome (hg19) FASTA and divide it into bins of 500 nucleotides. There will be two files, one containing the bin coordinates (as below) and the other the whole-genome FASTA sequence:
chrom Start End
chr1 1 500
chr1 500 1000
chr1 1000 1500
2- Use one or more tools to identify binding sites of the given TF in each of those bins, so the final result I want looks like:
chrom Start End ARID3A
chr1 1 500 binding
chr1 500 1000 no-binding
chr1 1000 1500 binding
If anybody has done something similar, kindly guide me: how can I divide the genome into bins of size 500, and which tools can I use to predict binding sites so that I get results at the bin level? Thank you.
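To make the desired output concrete, here is a minimal Python sketch of the two steps, under several assumptions: the chromosome size and the motif hits are toy values (not real hg19 lengths or ARID3A sites), the function names `make_bins` and `label_bins` are made up for illustration, and coordinates are 0-based and half-open (BED style) rather than the 1-based ranges shown above. The hits would come from whichever motif scanner is chosen; the sketch only shows how hits could be turned into a per-bin binding/no-binding table.

```python
from collections import defaultdict


def make_bins(chrom_sizes, bin_size=500):
    """Yield (chrom, start, end) bins in 0-based, half-open coordinates."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            # The last bin of a chromosome may be shorter than bin_size.
            yield chrom, start, min(start + bin_size, size)


def label_bins(bins, hits):
    """Label a bin 'binding' if any hit overlaps it, else 'no-binding'."""
    hits_by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        hits_by_chrom[chrom].append((start, end))

    labeled = []
    for chrom, start, end in bins:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            h_start < end and h_end > start
            for h_start, h_end in hits_by_chrom[chrom]
        )
        labeled.append((chrom, start, end, "binding" if overlaps else "no-binding"))
    return labeled


# Toy inputs (hypothetical values, for illustration only).
chrom_sizes = {"chr1": 1500}
hits = [("chr1", 120, 135), ("chr1", 1010, 1025)]

rows = label_bins(make_bins(chrom_sizes), hits)
print("chrom\tStart\tEnd\tARID3A")
for row in rows:
    print("\t".join(map(str, row)))
```

The overlap check here compares every bin against every hit on the same chromosome, which is fine for a toy example but too slow genome-wide; for real data a dedicated interval tool or a sorted sweep would be the more realistic choice.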
How are you going to handle motifs that span two of your segments?
A possible option could be to use a sliding window with a step of, let's say, 100 for segmenting the genome. In this case the sequences will be 1:500, 100:600, 200:700 and so on. I think that way I can overcome the issue you mentioned.
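A rough sketch of that sliding-window idea in Python, assuming 0-based half-open coordinates, a toy chromosome length of 1500, and a hypothetical 15 bp motif hit at 495-510 that straddles the 500 boundary. The function name `sliding_windows` is made up for illustration. With a window of 500 and a step of 100, any site no longer than window minus step (400 bp here) should fall entirely inside at least one window.

```python
def sliding_windows(chrom_size, window=500, step=100):
    """Return (start, end) windows, 0-based half-open, advancing by `step`."""
    last_start = max(chrom_size - window, 0)
    starts = list(range(0, last_start + 1, step))
    # Add a final window flush with the chromosome end if the step skipped it.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, min(s + window, chrom_size)) for s in starts]


# Toy chromosome length (hypothetical, not a real hg19 size).
windows = sliding_windows(1500)

# Hypothetical motif hit that crosses the boundary between bins 0-500 and 500-1000.
hit_start, hit_end = 495, 510

# Windows that contain the whole hit, not just part of it.
containing = [(s, e) for s, e in windows if s <= hit_start and hit_end <= e]

print(len(windows), "windows")
print("windows fully containing the hit:", containing)
```

The trade-off is that overlapping windows report the same site several times (four times in this example), so the per-window calls would need to be merged or de-duplicated before summarising binding across the genome.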