I have two bed files, one containing non-overlapping windows of 10,000bp of the genome, and the other containing the capture regions for whole-exon sequencing. I want to know (for each 10,000bp window) which capture regions it overlaps. And in the case where a capture region overlaps two 10,000bp windows, just report the one with highest overlap.
I've been playing around with bedtools to do this via commands like,
bedtools closest -a a.bed -b b.bed
But this reports the capture region multiple times in the event it overlaps two 10,000bp windows. I've also tried using the -first and -last option, but they do not report the capture region with highest overlap. Here are the two example bed files and output:
a.bed
1 1 10
1 11 20
1 21 30
1 31 40
b.bed
1 4 9
1 11 14
1 15 24
1 21 31
Output (from the command above):
1 1 10 1 4 9
1 11 20 1 11 14
1 11 20 1 15 24
1 21 30 1 15 24
1 21 30 1 29 39
1 31 40 1 29 39
I think you should check
intersectBed
option instead ofclosestBed
. Also checkgroupBy
from BedTools to find the regions with highest overlap.Thanks for the tip. I tried the following command and it seems to work:
The output:
EDIT: oops, actually this does not seem to work, let me tinker a bit more