Question

How to find overlapping regions among three bed files [Solved]

1

9.5 years ago

Bioradical ▴ 60

I have a fairly simple question that I can't seem to figure out. So far I have been using bedtools to find overlaps between two bed files using intersectBed. Example:

Bed A
Bed B
Bed C

Now I have three generated bed files that I want to overlap to find peaks/regions common among all three.

ABC

The multiintersect option doesn't seem to have any documentation besides the information found in "Bedtools Compare Multiple Bed Files?", and the function itself doesn't seem to give me the information I'm looking for. Specifically, I want to feed in three bed files and find only the regions common to ALL three, not AB, AC, BC, and ABC in one large file, which seems to be the output shown in the linked example.

I believe that intersectBed -a A -b B C does something similar to the above, but perhaps I'm simply running it wrong and my attempts are errors on my part.

Can this be done using bedtools? If not, what other similar software is out there that can accomplish this?

I appreciate any help,

Carlos

bedtools overlaps • 7.0k views

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 9.5 years ago by Bioradical ▴ 60

2

First find the intersection of any two files, then intersect that result with the third file.

ADD REPLY • link 9.5 years ago by GouthamAtla 12k

0

Excellent. This achieved exactly what I wanted. Thank you!

ADD REPLY • link 9.5 years ago by Bioradical ▴ 60

2

9.5 years ago

Alex Reynolds 36k

Here's a more general approach with BEDOPS bedmap --count, which generalizes to N input files:

$ N=`ls *.bed | wc -l`
$ bedops --everything A.bed B.bed C.bed ... N.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -vN=${N} '$1==N' \
    | cut -f2- \
    > common.bed

By changing the test in the awk statement, this approach can be modified to return other subsets of the input's power set, e.g., all elements common to N-1 inputs, N-2 inputs, etc.
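As a rough illustration of changing the awk test, the sketch below runs only the filtering stage on a few made-up lines that stand in for the output of bedmap --count --echo --delim '\t' (count in column 1, BED element after it). The function name mock_counts, the output file names, and the coordinates are hypothetical, and the sketch assumes the count column reflects how many inputs cover each region.

```shell
# Made-up stand-in for bedmap output: column 1 is the overlap count,
# the remaining columns are the BED element. Not real data.
N=3
mock_counts() {
    printf '3\tchr1\t180\t200\n'
    printf '2\tchr1\t450\t500\n'
    printf '1\tchr2\t10\t50\n'
}

# Original test: keep elements whose count equals N.
mock_counts | awk -v N="${N}" '$1 == N' | cut -f2- > common_all.bed

# Relaxed test: keep elements whose count is at least N-1.
# Use '$1 == N - 1' instead to keep only those with a count of exactly N-1.
mock_counts | awk -v N="${N}" '$1 >= N - 1' | cut -f2- > common_n_minus_1.bed
```

Here common_all.bed keeps only the first mock element, while common_n_minus_1.bed keeps the first two.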

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by Alex Reynolds 36k

0

Dear Alex,

Does --count --echo identify the overlaps within the merged list? Can it do that? Also, could you please explain what "\" and "-" do? Finally, I tried the N= ls *bed | wc -l, but it gives me an N command not found error on the command line.

Thank you for your help.

Tunc

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by morovatunc ▴ 560

0

My advice is to break things down so you see how it works.

After:

$ N=`ls *.bed | wc -l`

Then run:

$ echo "${N}"

Likewise, run tee in between the two steps here, and after the bedmap statement:

$ bedops --everything A.bed B.bed C.bed ... N.bed | bedmap --count --echo --delim '\t' - | ... > common.bed

So:

$ bedops --everything A.bed B.bed C.bed ... N.bed | tee betweenSteps1and2.txt | bedmap --count --echo --delim '\t' - | ... > common.bed

And:

$ bedops --everything A.bed B.bed C.bed ... N.bed | bedmap --count --echo --delim '\t' - | tee betweenSteps2and3.txt | ... > common.bed

The \ character lets you break a pipeline across multiple lines, and the - character specifies standard input in place of a regular file. Using standard input and output streams is an important advantage of BEDOPS and Unix tools, so it is worth a few minutes to read about.
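To see these three pieces (\ continuation, tee, and - for standard input) in isolation, here is a small sketch that uses only standard Unix tools on a made-up three-line input. The file names step1.txt and chroms.txt are arbitrary choices for this illustration.

```shell
# Each trailing \ continues the same pipeline on the next line.
# tee writes a copy of the stream to step1.txt and passes it along unchanged.
# "sort -" reads from standard input instead of a named file.
printf 'chr2\t5\t9\nchr1\t10\t20\nchr1\t30\t40\n' \
    | tee step1.txt \
    | cut -f1 \
    | sort - \
    | uniq \
    > chroms.txt
```

Afterwards step1.txt holds the three input lines exactly as they entered the pipeline, and chroms.txt holds the sorted, de-duplicated chromosome names, so each stage can be inspected on its own.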

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by Alex Reynolds 36k

Ram · Accepted Answer · 2015-11-18

3

9.5 years ago

GouthamAtla 12k

Answer:

First find the intersection of any two files, then intersect that result with the third file.

intersectBed -a 1.bed -b 2.bed | intersectBed -a - -b 3.bed
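
A minimal worked sketch of the same chained intersection on toy data may help check the behaviour. The file names toyA.bed, toyB.bed, toyC.bed, and common.bed and all coordinates are made up for illustration, and the sketch uses bedtools intersect, the subcommand form of intersectBed. It assumes bedtools is on the PATH and skips the step otherwise.

```shell
# Toy BED3 files (tab-separated); the coordinates are invented.
printf 'chr1\t100\t200\nchr1\t400\t500\n' > toyA.bed
printf 'chr1\t150\t250\nchr1\t450\t550\n' > toyB.bed
printf 'chr1\t180\t300\n' > toyC.bed

# Intersect A with B, then intersect that result (read from stdin via "-")
# with C. Only regions present in all three files survive both steps.
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a toyA.bed -b toyB.bed \
        | bedtools intersect -a - -b toyC.bed \
        > common.bed
fi
```

With these inputs, the A/B step yields chr1:150-200 and chr1:450-500, and only the part of the first region that also overlaps C (chr1 180 200) should remain in common.bed.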

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by GouthamAtla 12k