I have two files and I find the overlaps between them using bedtools intersect as below:
bedtools intersect -a A.bed -b B.bed -wao > intersect.bed
However, I would like to extract only the longest overlap for each interval in A. Is there any way to do that?
A.bed
chr start end
1 200 250
1 240 300
2 100 120
4 300 360
4 310 400
B.bed
chr start end
1 180 220
1 210 260
1 213 348
4 305 352
4 310 370
4 315 382
4 350 400
The output of bedtools intersect:
chr start end chr start end overlaps.bp
1 200 250 1 180 220 20
1 200 250 1 210 260 40
1 200 250 1 213 348 37
1 240 300 1 180 220 0
1 240 300 1 210 260 20
1 240 300 1 213 348 60
4 300 360 4 305 352 47
4 300 360 4 310 370 50
4 300 360 4 315 382 45
4 300 360 4 350 400 10
4 310 400 4 305 352 42
4 310 400 4 310 370 60
4 310 400 4 315 382 67
4 310 400 4 350 400 50
Expected.bed
chr start end chr start end overlaps.bp
1 200 250 1 210 260 40
1 240 300 1 213 348 60
4 300 360 4 310 370 50
4 310 400 4 315 382 67
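A minimal sketch with standard sort and awk, assuming intersect.bed is the tab-separated -wao output without a header line (the header rows shown above are only for readability) and that the overlap length is in column 7: sort by the A interval, then by overlap in descending order, and keep the first line seen for each A interval. The printf call only recreates the example output so the snippet runs without bedtools; the file names intersect.bed and longest.bed are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Recreate the example -wao output from the question (tab-separated, no header).
# In practice this file comes from:
#   bedtools intersect -a A.bed -b B.bed -wao > intersect.bed
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 200 250 1 180 220 20 \
  1 200 250 1 210 260 40 \
  1 200 250 1 213 348 37 \
  1 240 300 1 180 220 0 \
  1 240 300 1 210 260 20 \
  1 240 300 1 213 348 60 \
  4 300 360 4 305 352 47 \
  4 300 360 4 310 370 50 \
  4 300 360 4 315 382 45 \
  4 300 360 4 350 400 10 \
  4 310 400 4 305 352 42 \
  4 310 400 4 310 370 60 \
  4 310 400 4 315 382 67 \
  4 310 400 4 350 400 50 > intersect.bed

# Sort by the A interval (cols 1-3), then by overlap (col 7) in descending order,
# and keep only the first line, i.e. the longest overlap, for each A interval.
sort -t $'\t' -k1,1 -k2,2n -k3,3n -k7,7nr intersect.bed \
  | awk -F '\t' '!seen[$1 FS $2 FS $3]++' > longest.bed

cat longest.bed
```

On the example data this gives the four rows of Expected.bed. If two B intervals tie for the longest overlap, only one of them is kept.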
Did you check bedtools sort, which allows you to sort by region size? (You would have to modify your original intersect command to return only the overlap instead of using -wa and -wb.)
Yes, I have checked it, but it was not useful in my case. I want to extract the largest overlap when the same interval in file A overlaps several intervals in file B.
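To pick the largest overlap per A interval without sorting, and to keep the A intervals in the order bedtools reports them, a single awk pass can remember the best line for each A interval. This is a sketch under the same assumptions (tab-separated, headerless -wao output with the overlap in column 7); the small input below is a subset of the example output, and the file names are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Subset of the example -wao output: each A interval overlaps several B intervals.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  4 310 400 4 305 352 42 \
  4 310 400 4 310 370 60 \
  4 310 400 4 315 382 67 \
  4 310 400 4 350 400 50 \
  1 240 300 1 210 260 20 \
  1 240 300 1 213 348 60 > intersect.bed

# One pass: for each A interval (cols 1-3) keep the line with the largest
# overlap (col 7), then print the winners in the order the A intervals first appear.
awk -F '\t' '
  {
    key = $1 FS $2 FS $3
    if (!(key in best)) { order[++n] = key }
    if (!(key in best) || $7 + 0 > best[key]) { best[key] = $7 + 0; line[key] = $0 }
  }
  END { for (i = 1; i <= n; i++) print line[order[i]] }
' intersect.bed > longest.bed

cat longest.bed
```

Ties keep the first line encountered. With -wao, A intervals that overlap nothing are also reported (with a null B entry and an overlap of 0), so they pass through this filter too; add a $7 > 0 condition to the awk program if you want to drop them.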