Question

Bedtools: Compare Multiple BED Files to One BED File?

10.0 years ago

BIOTIN ▴ 50

I've been comparing 40 BED files against one BED file using the intersectBed -a -b command. Is there a command in Bedtools that can compare multiple BED files at once?

Say I have 40 BED files and one particular BED file. I want to identify the regions in the 40 BED files that overlap with the particular one, that is, a 40-to-1 comparison.

Is there a fast way to compare them without typing the commands one by one, like intersectBed -a -b, intersectBed -a -c, intersectBed -a -d, intersectBed -a -e...

sequencing alignment genome • 6.8k views

updated 2.8 years ago by Ram 44k • written 10.0 years ago by BIOTIN ▴ 50

Ram · Accepted Answer · 2014-11-15

One fast option is to use BEDOPS:

$ bedops --element-of 1 A.bed B.bed C.bed ... > answer.bed

You can use lots of inputs efficiently. The input files just have to be sorted.

The above command compares A.bed against B.bed, C.bed, etc., reporting every element of A.bed that overlaps at least one element of B.bed, C.bed, etc.

Let's say you want to go the other direction efficiently. You can use BEDOPS with UNIX pipes and redirect standard output from one command to the next:

$ bedops --everything B.bed C.bed ... | bedops --element-of 1 - A.bed > answer.bed

This does a multiset union of all the elements in B.bed, C.bed, etc. and passes these via standard input (the "-" argument) to an --element-of operation against A.bed.

The result file reports all elements of B.bed C.bed etc., which overlap A.bed.

The difference between these two directions is in which sets of elements get reported in the overlap. In the first case, elements of A.bed are reported. In the second case, elements of B.bed C.bed etc. are reported. Generally, this is not a symmetric operation.
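To make the asymmetry concrete, here is a toy sketch in plain shell and awk, not BEDOPS itself. The overlapping helper and the two tiny input files are made up for illustration. It naively reports every line of its first file that shares at least one base with some line of its second file, assuming three tab-separated columns and half-open BED coordinates.

```shell
# Hypothetical helper for illustration only: print each interval in file $1
# that shares at least one base with some interval in file $2.
# Naive O(n*m) scan; BEDOPS does this far more efficiently on sorted input.
overlapping() {
  awk -F'\t' '
    # First file read by awk ($2 of the function): remember its intervals.
    NR == FNR { chr[NR] = $1; start[NR] = $2 + 0; end[NR] = $3 + 0; n = NR; next }
    # Second file read by awk ($1 of the function): print lines that overlap.
    {
      for (i = 1; i <= n; i++)
        if ($1 == chr[i] && ($2 + 0) < end[i] && ($3 + 0) > start[i]) { print; break }
    }
  ' "$2" "$1"
}

# Made-up example data.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t500\t600\n' > "$tmp/A.bed"
printf 'chr1\t150\t160\nchr1\t170\t180\nchr1\t900\t950\n' > "$tmp/B.bed"

echo "Elements of A that overlap B:"
overlapping "$tmp/A.bed" "$tmp/B.bed"

echo "Elements of B that overlap A:"
overlapping "$tmp/B.bed" "$tmp/A.bed"
```

With this data the first direction reports one element of A.bed and the second reports two elements of B.bed, which is why the choice of direction matters.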

If you have a lot of files to sort, a quick bash one-liner can take care of this:

$ for fn in *.bed; do sort-bed "${fn}" > "${fn%.*}.sorted.bed"; done

Some use GNU sort to do sorting of BED files, but BEDOPS sort-bed is usually faster.
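For reference, the GNU sort route mentioned above might look like the sketch below. It assumes tab-delimited input with at least three columns and no header lines. As far as I understand, this should give the chromosome, start, end ordering that BEDOPS expects for simple input. The function name sort_bed_gnu and the demo data are hypothetical.

```shell
# Hypothetical helper: sort a BED file bytewise by chromosome, then
# numerically by start and end, using plain sort instead of sort-bed.
sort_bed_gnu() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n "$1"
}

# Tiny made-up input to show the resulting order.
demo=$(mktemp)
printf 'chr2\t5\t10\nchr1\t20\t30\nchr1\t3\t9\n' > "$demo"
sort_bed_gnu "$demo"
```

Setting LC_ALL=C keeps the chromosome comparison bytewise, so the result does not depend on the local collation settings.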

10.0 years ago

Jorge Amigo 14k

Recent bedtools versions (v2.21.0 and later) accept multiple files for -b, so a shell wildcard can supply all of them at once. Finding the regions where the 40 BED files overlap that one particular BED file is then as simple as:

bedtools intersect -a particular.bed -b all40*bed > all.40vs1.compared.bed

If you want a separate comparison for each of the 40 files against that particular one, then you'll definitely have to loop:

for file in all40*bed; do
  bedtools intersect -a particular.bed -b "$file" > "compared.$file"
done
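If one combined output that still shows which file each overlap came from would do instead of the loop, something along these lines may help. This is only a sketch: it assumes a bedtools release that supports multiple -b files together with the -wa, -wb and -filenames options, so check bedtools intersect -h on your installation. The function name intersect_labeled and the output file name are made up.

```shell
# Hypothetical wrapper: report each region of the reference file together with
# the overlapping record and the name of the -b file that record came from.
intersect_labeled() {
  # $1 = the single reference BED file; remaining arguments = files to compare
  ref=$1
  shift
  bedtools intersect -wa -wb -filenames -a "$ref" -b "$@"
}

# Example call (commented out so the snippet can be sourced without running bedtools):
# intersect_labeled particular.bed all40*bed > all.40vs1.labeled.bed
```

Each output line then carries the particular.bed region, the source file name, and the overlapping record, so the combined file can be split per source file afterwards if needed.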


Using parallel:

parallel 'bedtools intersect -a particular.bed -b {} > compared.{}' ::: all40*bed

updated 5.1 years ago by Ram 44k • written 10.0 years ago by GouthamAtla 12k