If your BED files are sorted, you could use bedops to union all the TFBS to standard output, and pipe that result to bedmap to do the mapping of TFs to master regions.
(This technique assumes that the TFBS files are at least BED4; that is, the fourth column of each TFBS file contains the ID of the TF. If that is not the case, describe your format in more detail and I'll suggest a quick awk one-liner to fix the files up into the correct form.)
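As a rough sketch of that kind of fix-up, here is an awk function that cuts a BED4 file out of a hypothetical tab-delimited layout. It assumes chromosome, start and end are in columns 1 to 3 and the TF name sits in column 5. The function name to_bed4 and the column choice are illustrative only; change $5 to wherever your IDs actually live.

```shell
# Hypothetical input layout: tab-delimited, chrom/start/end in columns 1-3,
# TF name in column 5. Prints a BED4 line (chrom, start, end, TF name) to stdout.
to_bed4() {
  awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $5 }' "$1"
}
```

For example, to_bed4 raw_tfbs001.txt > tfbs001.bed, where raw_tfbs001.txt is a placeholder name.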
Here's the one-liner that unions and maps:
$ bedops --everything tfbs*.bed | bedmap --echo --echo-map-id-uniq --delim '\t' master.bed - > answer.bed
Piping through standard output avoids writing an intermediate file to disk, which can be expensive in time, so this should be fast. (The - argument tells bedmap to read the unioned TFBS from standard input.)
Assuming that the ID fields in each of tfbs001.bed through tfbs400.bed contain the desired TF names or other identifiers of choice, the file answer.bed contains the results you expect, except that it uses a semicolon as the ID delimiter instead of a comma. You could add --multidelim ',' to the bedmap statement if that is a requirement.
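To make the mapping step concrete, here is a deliberately naive awk illustration of roughly what that bedmap call computes: for each master region, the unique IDs (column 4) of the TFBS that overlap it by at least one base, joined with semicolons. This is not how bedmap works internally. It keeps every TFBS in memory and compares each one against every master region, whereas bedmap streams through sorted input. The function name naive_map is hypothetical, and the order of IDs here simply follows input order, which may differ from bedmap's output.

```shell
# Naive illustration only (NOT bedmap): for each master region, list the
# unique IDs (column 4) of TFBS regions overlapping it by at least 1 bp.
# Usage: naive_map master.bed tfbs001.bed [tfbs002.bed ...]
naive_map() {
  master=$1
  shift
  awk -v master="$master" '
    BEGIN { FS = OFS = "\t" }
    # TFBS files are read first: keep every interval and its ID in memory.
    FILENAME != master {
      n++; chr[n] = $1; beg[n] = $2 + 0; stop[n] = $3 + 0; id[n] = $4
      next
    }
    # Master file is read last: scan all stored TFBS for each master region.
    {
      ids = ""
      split("", seen)  # portable way to empty the array
      for (i = 1; i <= n; i++) {
        # Half-open BED intervals overlap if each starts before the other ends.
        if (chr[i] == $1 && beg[i] < $3 + 0 && $2 + 0 < stop[i] && !(id[i] in seen)) {
          seen[id[i]] = 1
          ids = (ids == "" ? id[i] : ids ";" id[i])
        }
      }
      # Master line, a tab, then the ID list (empty if nothing overlaps).
      print $0, ids
    }' "$@" "$master"
}
```

Something like naive_map master.bed tfbs*.bed is only practical for small test files; for real data, the bedops | bedmap pipeline is the way to go.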
If your BED files are not sorted, you could first prepare them with BEDOPS sort-bed, which is faster at sorting BED files than GNU sort.
$ for tfbs_fn in tfbs*.bed; do sort-bed "$tfbs_fn" > "sorted.$tfbs_fn"; done
$ sort-bed master.bed > sorted.master.bed
Then use the sorted files (sorted.tfbs*.bed and sorted.master.bed) in downstream BEDOPS operations. You only need to sort once.
Great method; I'll have to look more into what bedops offers. I'm just wondering why '+t' appears at the start of the last column, e.g. '+tCREB1;CST6;'. I have used sed to remove it for the moment.
Do your BED files come from Excel or Windows? Such files usually need to be cleaned up. You might take a look at the suggestions in my answer in this thread: bedmap output on one line
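One cleanup step that often matters for files saved on Windows is removing the carriage return (\r) at the end of each line; otherwise the \r becomes part of the last field on every line. A minimal sketch with tr follows. Whether this accounts for the '+t' you are seeing is an assumption to check, not a diagnosis, and the function name strip_cr is just for illustration.

```shell
# Remove carriage returns (\r) left by Windows line endings; writes to stdout.
strip_cr() {
  tr -d '\r' < "$1"
}
```

For example, strip_cr master.bed > master.unix.bed, then run sort-bed on the cleaned file as usual.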
I ran the one-liner, apparently with success; however, there are no clusters on chromosomes 10-22 (human), even though there are TFBS on those chromosomes in the bedops input. Is there any obvious reason why I am seeing this? -- IGNORE: I failed to also sort the master file --
Yeah, just run sort-bed on all of your BED files, including the master file, and you'll be fine.
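If you want to catch an unsorted input before running bedmap, a quick check with GNU sort's -c (check only) mode can help. This assumes sort-bed orders by chromosome name in byte order, then numeric start, then numeric end. Under that ordering, a file in "natural" order (chr1, chr2, ..., chr10) counts as unsorted, because chr10 sorts before chr2. The function name is_sorted_bed is hypothetical.

```shell
# Exit status 0 if the file is already ordered by chromosome (byte order),
# then numeric start, then numeric end; non-zero otherwise.
# -c only checks (nothing is written), -s limits the comparison to these keys.
is_sorted_bed() {
  LC_ALL=C sort -c -s -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}
```

For example: for f in master.bed tfbs*.bed; do is_sorted_bed "$f" || echo "needs sort-bed: $f"; done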