You could use BEDOPS bedops to map genes to peaks:
$ bedops --element-of 1 genes.bed peaks.bed > genes_overlapping_peaks.bed
Once you have those genes, use bedops --range to generate proximal promoter regions for them, say 1 kb upstream of the TSS:
$ awk '$6=="+"' genes_overlapping_peaks.bed | awk '{ print $1"\t"$2"\t"($2+1); }' | bedops --everything --range -1000:0 - > promoters.for.bed
$ awk '$6=="-"' genes_overlapping_peaks.bed | awk '{ print $1"\t"$3"\t"($3+1); }' | sort-bed - | bedops --everything --range 0:1000 - > promoters.rev.bed
$ bedops --everything promoters.for.bed promoters.rev.bed > promoters.bed
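Note that the commands above keep only the three coordinate columns, so the gene name and strand are lost. If you want to carry them through, here is a rough awk-only sketch on made-up toy data. The file names and genes are hypothetical, and I am assuming a six-column BED with the name in column 4 and the strand in column 6. The windows here stop just short of the TSS base, so they differ by one base from what --range produces above, and plain sort stands in for sort-bed:

```shell
# Toy input (hypothetical genes); columns: chrom, start, end, name, score, strand
printf 'chr1\t5000\t9000\tgeneA\t0\t+\n'    > toy_genes.bed
printf 'chr1\t20000\t26000\tgeneB\t0\t-\n' >> toy_genes.bed
printf 'chr2\t300\t4000\tgeneC\t0\t+\n'    >> toy_genes.bed

# Build a 1 kb window upstream of each TSS, keeping name, score and strand.
# "+" strand: TSS is the start, window is [start-1000, start), clamped at 0.
# "-" strand: TSS is the end, window is [end, end+1000).
awk 'BEGIN { OFS = "\t" }
     $6 == "+" { s = $2 - 1000; if (s < 0) s = 0; print $1, s, $2, $4, $5, $6 }
     $6 == "-" { print $1, $3, $3 + 1000, $4, $5, $6 }' toy_genes.bed \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > toy_promoters.bed

cat toy_promoters.bed
```

With the name in column 4, the promoter rows in the final output can be traced back to their genes.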
Separately, locate putative TF binding sites and their positions with a tool like FIMO and some TF database (JASPAR, TRANSFAC, UniPROBE, Taipale, etc.) at some desired statistical threshold. You could do a set operation on these results with your TF-specific ChIP regions.
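To get the FIMO hits into a BED file, something like the following sketch might work. It assumes the tab-delimited layout of recent FIMO versions (motif_id, motif_alt_id, sequence_name, start, stop, strand, score, p-value, q-value, matched_sequence), 1-based inclusive coordinates, and that you scanned whole chromosomes so that sequence_name is the chromosome name. Check the header of your own output first, since older versions use a different column order. The hits below are made up for illustration:

```shell
# Toy FIMO-style table (made-up hits), header line first
printf 'motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n' > toy_fimo.tsv
printf 'MA0079.3\tSP1\tchr1\t26101\t26110\t-\t15.0\t2.0e-06\t0.03\tGGGGCGGGGC\n'         >> toy_fimo.tsv
printf 'MA0139.1\tCTCF\tchr1\t4501\t4519\t+\t18.2\t1.1e-07\t0.01\tTGGCCACCAGGGGGCGCTA\n' >> toy_fimo.tsv
printf 'MA0139.1\tCTCF\tchr2\t51\t69\t+\t12.4\t8.0e-06\t0.04\tTGGCCACCAGGGGGCGCTA\n'     >> toy_fimo.tsv

# Convert to 0-based, half-open BED6: chrom, start-1, stop, TF name, score, strand.
# Falls back to motif_id when motif_alt_id is empty.
awk 'BEGIN { FS = OFS = "\t" }
     NR > 1 && $1 !~ /^#/ && NF >= 8 {
         id = ($2 != "" ? $2 : $1)
         print $3, $4 - 1, $5, id, $7, $6
     }' toy_fimo.tsv \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > toy_TFs.bed

cat toy_TFs.bed
```

Putting the TF name in column 4 matters, because that is the ID field that bedmap's --echo-map-id-uniq reports.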
Once you have promoter regions and TF binding sites, do a BEDOPS bedmap operation on these two sets:
$ bedmap --echo --echo-map-id-uniq promoters.bed TFs.bed > answer.bed
The file answer.bed will contain a list of promoters and the IDs or names of the transcription factors that bind to (that is, "target") those promoter regions.
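If you would rather have one promoter/TF pair per line, the bedmap output can be unpacked with awk. This is a sketch that assumes bedmap's default delimiters: a pipe between the echoed promoter and the ID list, and semicolons between IDs. The toy file below is made up to mimic that layout, and promoters with no overlapping site (empty after the pipe) are dropped:

```shell
# Toy file mimicking "bedmap --echo --echo-map-id-uniq" output (made-up rows)
printf 'chr1\t4000\t5001|CTCF;SP1\n'  > toy_answer.bed
printf 'chr1\t26000\t27001|\n'       >> toy_answer.bed
printf 'chr2\t0\t301|CTCF\n'         >> toy_answer.bed

# Split on "|", then on ";", and print one (promoter, TF) row per ID.
awk 'BEGIN { FS = "|"; OFS = "\t" }
     $2 != "" {
         n = split($2, ids, ";")
         for (i = 1; i <= n; i++) print $1, ids[i]
     }' toy_answer.bed > toy_pairs.tsv

cat toy_pairs.tsv
```

Alternatively, adding --skip-unmapped to the bedmap call should leave out promoters without any mapped TF in the first place.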
Thanks for the suggestion. Marty