How extract the longest overlap?
1
0
Entering edit mode
5.7 years ago
star ▴ 350

I have two files that run intersect and find overlap between them using bedtools intersect as bellow: bedtools intersect -a A.bed -b B.bed -wao > intersect.bed . but i would like to extract the longest overlap from my data. Would you please let me know if there is any solution for that.

A.bed

chr   start  end
1     200   250
1     240   300
2     100   120
4     300   360
4     310   400

B.bed

chr   start   end 
1      180   220 
1      210   260  
1      213   348
4      305  352 
4      310   370
4      315  382
4      350  400

Th output of bedtools intersect:

chr start end  chr start end   overlaps.bd
 1  200    250  1  180  220        20
 1  200    250  1  210  260        40 
 1  200    250  1  213  348        37
 1  240    300  1  180  220         0
 1  240    300  1  210  260        20
 1  240    300  1  213  348        60
 4  300    360  4  305  352        47
 4  300    360  4  310  370        50
 4  300    360  4  315  382        45
 4  300    360  4  350  400        10
4  310     400  4  305  352        42
4  310     400  4  310  370        60
4  310     400  4  315  382        67 
4  310     400  4  350  400        50

Expected.bed

chr start  end    chr start end   overlaps.bp 
 1  200    250     1    210  260      40
 1  240   300      1    213  348     60
 4  300    360     4    310  370     50 
 4  310   400      4    315  382     67
bedtools deeptools overlap intersect bedops • 2.3k views
ADD COMMENT
2
Entering edit mode

with datamash, output:

$ datamash  -sH -g 1,2,3 -f max 7 < test.txt | cut -f 1-7 
chr start   end chr start   end overlaps.bd
1   200 250 1   210 260 40
1   240 300 1   213 348 60
4   300 360 4   310 370 50
4   310 400 4   315 382 67

input:

$ cat test.txt 
chr start   end chr start   end overlaps.bd
1   200 250 1   180 220 20
1   200 250 1   210 260 40  
1   200 250 1   213 348 37
1   240 300 1   180 220 0
1   240 300 1   210 260 20
1   240 300 1   213 348 60
4   300 360 4   305 352 47
4   300 360 4   310 370 50
4   300 360 4   315 382 45
4   300 360 4   350 400 10
4   310 400 4   305 352 42
4   310 400 4   310 370 60
4   310 400 4   315 382 67  
4   310 400 4   350 400 50
ADD REPLY
0
Entering edit mode

did you check bedtools sort that allows you to sort by region size (you would have to modify your original intersect command to only return the overlap instead of -wa and -wb)

ADD REPLY
0
Entering edit mode

yes I have checked it but it was not useful in my case. I want to extract the largest overlap while the same coordinate in file A have several overlaps with coordinates of B file.

ADD REPLY
4
Entering edit mode
5.7 years ago

test.bed

1   200 250 1   180 220 20
1   200 250 1   210 260 40
1   200 250 1   213 348 37
1   240 300 1   180 220 0
1   240 300 1   210 260 20
1   240 300 1   213 348 60
4   300 360 4   305 352 47
4   300 360 4   310 370 50
4   300 360 4   315 382 45
4   300 360 4   350 400 10
4   310 400 4   305 352 42
4   310 400 4   310 370 60
4   310 400 4   315 382 67
4   310 400 4   350 400 50

cat test.bed | sort -k1,1 -k2,2n -k7,7nr | groupBy -g 1,2,3,4 -c 7 -o first -full | cut -f1-7

1   200 250 1   210 260 40
1   240 300 1   213 348 60
4   300 360 4   310 370 50
4   310 400 4   315 382 67

Ps: groupBy is bedtools groupBy

ADD COMMENT
1
Entering edit mode

didn't know about groupby, that's an elegant solution (that doesn't even need the sort if the information from b are not needed)

 bedtools intersect -a test1.bed -b test2.bed  -wao | bedtools groupby -grp 1-3 -c 7 -o max
ADD REPLY
0
Entering edit mode

Thanks for your solution. what is the -o first -full and what is it difference with -o max -full?

ADD REPLY
1
Entering edit mode

-o max -full provides the first full line irrespective if its max or not. Sorting makes sure the 'first' is the actual full overlap.

ADD REPLY
0
Entering edit mode

Did you try geek_y suggestion and did it solve your problem? Please provide feedback by accepting and / or up-voting helpful / correct answers.

ADD REPLY

Login before adding your answer.

Traffic: 2692 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6