Assuming the inputs are disjoint, here is a modification of a general BEDOPS-based solution from Shane Neph for N input files (for you, three inputs, but this works for the general case of N inputs):
$ bedops --everything file1.bed file2.bed ... fileN.bed \
| bedmap --echo-map - \
| awk '(split($0, a, ";") == 3)' - \
| sed 's/\;/\n/g' - \
| sort-bed - \
| uniq - \
> answer.bed
The file answer.bed
contains a unique list of elements from N number of BED files that overlap between three elements by at least one or more bases.
Let's walk through this line by line.
- Take the multiset union of all elements from all N input files with
bedops --everything
. Note that this is not the same as a --merge
operation, which instead makes contiguous regions from all overlapping input regions. Pipe this unioned set to bedmap
.
- For each element from all files, use
bedmap --echo-map
to print any other elements from the multiset union that overlap that first element by one or more bases, using the default overlap criteria. The result will be a semi-colon delimited string that contains one or more overlapping elements. Note that this result will always include the original element, because it overlaps itself by 100%. Pipe this result to the awk
statement.
- The
awk
command splits the string on the semi-colon delimiter and prints out the string if there are exactly three overlapping elements. We do this filter step, because we are interested in any one element that overlaps two other elements by the stated overlap criterion. We pipe this string to sed
.
- The
sed
statement replaces all semi-colons with newline characters. This creates BED-formatted output that we pipe to sort-bed
.
- The
sort-bed
tool takes the overlapping elements and puts them in lexicographically-sorted order. We pipe this to uniq
to remove duplicate elements.
The file answer.bed
is a sorted BED file that contains all unique elements from file1.bed
through fileN.bed
, where there is mutual, minimally one-base overlap between three sets.
If you look at that awk statement, you'll see how to modify that test for your other two use cases, such as when you want elements that overlap no other elements in any other sets ("one"), or the case where you want elements that overlap with only one other set ("two").
You might also find this page useful, which describes this more general approach to this problem and why it is scalable with BEDOPS.
Hi, could you do this?