One fast option is to use BEDOPS:
$ bedops --element-of 1 A.bed B.bed C.bed ... > answer.bed
You can use lots of inputs efficiently. The input files just have to be sorted.
The above command compares B.bed, C.bed, etc. against A.bed, reporting all elements of A.bed that overlap an element of B.bed, C.bed, etc. by at least one base (the 1 after --element-of).
Let's say you want to go the other direction efficiently. You can use BEDOPS with UNIX pipes and redirect standard output from one command to the next:
$ bedops --everything B.bed C.bed ... | bedops --element-of 1 - A.bed > answer.bed
This does a multiset union of all the elements in B.bed, C.bed, etc. and passes these on standard input (the - argument) to an --element-of operation with A.bed.
The result file reports all elements of B.bed, C.bed, etc. that overlap A.bed.
The difference between these two directions is in which set of elements gets reported in the overlap. In the first case, elements of A.bed are reported. In the second case, elements of B.bed, C.bed, etc. are reported. Generally, this is not a symmetric operation.
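To make the asymmetry concrete, here is a small sketch that does not need BEDOPS at all. It uses a naive awk function, overlapping_elements (a hypothetical helper written for this illustration, not part of BEDOPS and not how BEDOPS works internally), to mimic "report the elements of the first file that overlap the second" on two made-up toy files. It assumes tab-delimited, half-open BED coordinates and is only meant for tiny inputs.

```shell
# Toy stand-in for "elements of the first file that overlap the second".
# Naive awk illustration of the semantics only; BEDOPS itself is far faster.
workdir=$(mktemp -d)

# Hypothetical inputs: one wide element in A.bed, two narrow ones in B.bed.
printf 'chr1\t100\t200\n' > "$workdir/A.bed"
printf 'chr1\t150\t160\nchr1\t190\t250\n' > "$workdir/B.bed"

# overlapping_elements REF OTHER: print each REF element that shares at least
# one base with any OTHER element.
overlapping_elements() {
  awk -F'\t' '
    # First file on the command line (OTHER): remember every element.
    NR == FNR { o_chr[NR] = $1; o_start[NR] = $2 + 0; o_end[NR] = $3 + 0; n = NR; next }
    # Second file (REF): print the element on its first overlap with OTHER.
    {
      for (i = 1; i <= n; i++) {
        if ($1 == o_chr[i] && ($2 + 0) < o_end[i] && o_start[i] < ($3 + 0)) { print; break }
      }
    }
  ' "$2" "$1"
}

a_hits=$(overlapping_elements "$workdir/A.bed" "$workdir/B.bed")
b_hits=$(overlapping_elements "$workdir/B.bed" "$workdir/A.bed")

printf 'Elements of A.bed overlapping B.bed:\n%s\n' "$a_hits"
printf 'Elements of B.bed overlapping A.bed:\n%s\n' "$b_hits"
```

The first query reports the single element of A.bed, while the reverse query reports both elements of B.bed, so swapping the roles of the files changes the answer.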
If you have a lot of files to sort, a quick bash one-liner can take care of this:
$ for fn in *.bed; do sort-bed "${fn}" > "${fn%.*}.sorted.bed"; done
Some use GNU sort to do sorting of BED files, but BEDOPS sort-bed is usually faster.
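For comparison, here is a sketch of the GNU sort route. It assumes the ordering wanted is lexicographic by chromosome name, then numeric by start and end coordinate, which is my understanding of what sort-bed produces; check the BEDOPS documentation if exact equivalence matters. The file in.bed is a made-up example created just for the demo.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical unsorted input.
printf 'chr2\t50\t60\nchr10\t5\t9\nchr1\t200\t300\nchr1\t100\t150\n' > "$workdir/in.bed"

# LC_ALL=C forces byte-wise comparison of chromosome names (chr10 sorts before chr2);
# columns 2 and 3 are compared numerically.
LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3n "$workdir/in.bed" > "$workdir/in.sorted.bed"

# sort -c only checks the order and exits non-zero if the file is out of order.
if LC_ALL=C sort -c -t "$tab" -k1,1 -k2,2n -k3,3n "$workdir/in.sorted.bed"; then
  sorted_ok=yes
else
  sorted_ok=no
fi

cat "$workdir/in.sorted.bed"
echo "sorted: $sorted_ok"
```

The same sort -c check is a cheap way to confirm that a file is already in this order before passing it to a tool that requires sorted input.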
Using parallel: