Question

Intersect multiple bed and keep all fields

1

Entering edit mode

10.0 years ago

user230613 ▴ 380

Hi there biostars,
I'm trying to intersect three bed files in order to get common features between the three of them, two of them, and also features specific per each bed. I've tried with multiIntersectBed (bedtools) and bedops also, but, the problem is that I want to keep all the fields of my input files for the overlapping. I mean, I'm looking for something like -wao in bedtools intersect but with multiple bed files.

Any suggestion will be appreciated.

bed bedtools bedops • 12k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by user230613 ▴ 380

0

Entering edit mode

Hi, could you do this?

ADD REPLY • link 3.7 years ago by Farzaneh • 0

Ram · Answer 1 · 2015-08-03

Assuming the inputs are disjoint, here is a modification of a general BEDOPS-based solution from Shane Neph for N input files (for you, three inputs, but this works for the general case of N inputs):

$ bedops --everything file1.bed file2.bed ... fileN.bed \
    | bedmap --echo-map - \
    | awk '(split($0, a, ";") == 3)' - \
    | sed 's/\;/\n/g' - \
    | sort-bed - \
    | uniq - \
    > answer.bed

The file answer.bed contains a unique list of elements from N number of BED files that overlap between three elements by at least one or more bases.

Let's walk through this line by line.

Take the multiset union of all elements from all N input files with bedops --everything. Note that this is not the same as a --merge operation, which instead makes contiguous regions from all overlapping input regions. Pipe this unioned set to bedmap.
For each element from all files, use bedmap --echo-map to print any other elements from the multiset union that overlap that first element by one or more bases, using the default overlap criteria. The result will be a semi-colon delimited string that contains one or more overlapping elements. Note that this result will always include the original element, because it overlaps itself by 100%. Pipe this result to the awk statement.
The awk command splits the string on the semi-colon delimiter and prints out the string if there are exactly three overlapping elements. We do this filter step, because we are interested in any one element that overlaps two other elements by the stated overlap criterion. We pipe this string to sed.
The sed statement replaces all semi-colons with newline characters. This creates BED-formatted output that we pipe to sort-bed.
The sort-bed tool takes the overlapping elements and puts them in lexicographically-sorted order. We pipe this to uniq to remove duplicate elements.

The file answer.bed is a sorted BED file that contains all unique elements from file1.bed through fileN.bed, where there is mutual, minimally one-base overlap between three sets.

If you look at that awk statement, you'll see how to modify that test for your other two use cases, such as when you want elements that overlap no other elements in any other sets ("one"), or the case where you want elements that overlap with only one other set ("two").

You might also find this page useful, which describes this more general approach to this problem and why it is scalable with BEDOPS.

Ram · Answer 2 · 2015-08-03

1

Entering edit mode

10.0 years ago

GouthamAtla 12k

I think by default, multiIntersectBed handles that. For e.g:

# a.bed
chr1  6   12
chr1  10  20
chr1  22  27
chr1  24  30

# b.bed
chr1  12  32
chr1  14  30

# c.bed
chr1  8   15
chr1  10  14
chr1  32  34

$ multiIntersectBed -i a.bed b.bed c.bed 
chr1    6    8    1    1    1    0    0
chr1    8    12    2    1,3    1    0    1
chr1    12    15    3    1,2,3    1    1    1
chr1    15    20    2    1,2    1    1    0
chr1    20    22    1    2    0    1    0
chr1    22    30    2    1,2    1    1    0
chr1    30    32    1    2    0    1    0
chr1    32    34    1    3    0    0    1

Here the 4th column says in how many files the region is present and 5th column says what are those files. Here the region chr1:12-15 present in three files, chr1:15-20 present in two files and chr1:30-32 andchr:32-34 present in any one of the files. This satisfies your statement "common features between the three of them, two of them, and also features specific per each bed."

Are you looking for something more?

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by GouthamAtla 12k

0

Entering edit mode

Yes, I've already tried this one. But, the thing is that I've more fields (columns) with another information in the input bed files that I'd like to keep after intersecting. Sorry if I didn't explain myself in a clear way. And thanks!

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by user230613 ▴ 380

0

Entering edit mode

Can you just show small example (3-5 lines) of how the data looks and what is the expected output?

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by GouthamAtla 12k

Ram · Answer 3 · 2015-08-03

Would piping output from intersect to another intersect give you what you want?

bedtools intersect -wao -a a.bed -b b.bed | bedtools intersect -wao -a - -b c.bed

chr1    10      20      a1      1       +       .       -1      -1      .       -1      .       0       chr1    0  100      c1      1       +       10
chr1    100     200     a2      2       -       chr1    90      101     b2      2       -       1       .       -1 -1       .       -1      .       0
chr1    100     200     a2      2       -       chr1    100     110     b3      3       +       10      .       -1 -1       .       -1      .       0

score 0 · Answer 4 · 2017-10-25

0

Entering edit mode

7.8 years ago

Ashley M. Conard • 0

This does not seem to work for me. It collapses some intervals together.

ADD COMMENT • link 7.8 years ago by Ashley M. Conard • 0

0

Entering edit mode

The above suggestions are fine in case you have edited your files in windows or in excel before using bedtools just run dos2unix and use bedtools to intersect then it will not collapse

ADD REPLY • link 7.2 years ago by prasoonagar ▴ 10