How to find overlapping regions among three bed files [Solved]
9.0 years ago
Bioradical

I have a fairly simple question that I can't seem to figure out. So far I have been using bedtools intersectBed to find overlaps between two BED files. For example:

[Figure: example intervals in Bed A, Bed B and Bed C]

Now I have three BED files that I want to intersect to find the peaks/regions common to all three.

[Figure: the desired output, the regions shared by A, B and C (ABC)]

The multiintersect tool doesn't seem to have any documentation beyond the information in the thread "Bedtools Compare Multiple Bed Files?", and its output isn't what I'm looking for. Specifically, I want to feed in three BED files and get only the regions common to ALL three, not AB, AC, BC and ABC together in one large file, which seems to be the output shown in the linked example.

I believe intersectBed -a A -b B C does something similar to the above, but perhaps I'm simply running it wrong.

Can this be done using bedtools? If not, what other similar software is out there that can accomplish this?

I appreciate any help,

Carlos


First intersect any two of the files, then intersect that result with the third file.


Excellent. This achieved exactly what I wanted. Thank you!

9.0 years ago

Answer:

First intersect any two of the files, then intersect that result with the third file:

intersectBed -a 1.bed -b 2.bed | intersectBed -a - -b 3.bed
9.0 years ago

Here's a more general approach with BEDOPS bedmap --count, which generalizes to N input files:

$ N=`ls *.bed | wc -l`
$ bedops --everything A.bed B.bed C.bed ... N.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -vN=${N} '$1==N' \
    | cut -f2- \
    > common.bed

By changing the test in the awk statement, this approach can be modified to return other subsets of the input's power set, e.g., all elements common to N-1 inputs, N-2 inputs, etc.


Dear Alex,

Does --count --echo identify the overlaps within the merged list? Can it do that? Also, could you please explain what "\" and "-" do? Finally, I tried N= ls *bed | wc -l, but I get a "command not found" error for N on the command line.

Thank you for your help.

Tunc


My advice is to break things down so you see how it works.

After:

$ N=`ls *.bed | wc -l`

Then run:

$ echo "${N}"

Likewise, insert tee between the two steps here, and after the bedmap statement, to save the intermediate output to a file you can inspect:

$ bedops --everything A.bed B.bed C.bed ... N.bed | bedmap --count --echo --delim '\t' - | ... > common.bed

So:

$ bedops --everything A.bed B.bed C.bed ... N.bed | tee betweenSteps1and2.txt | bedmap --count --echo --delim '\t' - | ... > common.bed

And:

$ bedops --everything A.bed B.bed C.bed ... N.bed | bedmap --count --echo --delim '\t' - | tee betweenSteps2and3.txt |... > common.bed

The \ character lets you break a pipeline across multiple lines, and the - character specifies standard input in place of a regular file. Working with standard input and output streams is an important advantage of using BEDOPS and Unix tools, so it is worth spending a few minutes reading about it.
