Dear all,
I am trying to find sites with 70% reciprocal overlap and collapse them into a single non-overlapping site list.
However, it seems some more tweaking is needed to collapse the sites into a single list after finding the reciprocal overlap calls.
I have used bedtools intersect in the following way:
Example sites: cnv.bed
chr1 353 6405
chr1 355 6389
chr1 501 6401
chr1 549 6447
chr1 1812 28093
chr1 3286 6382
chr1 3694 6428
chr1 3695 6413
chr1 3729 6677
chr1 4084 6380
/bedtools2/bin/intersectBed -a cnv.bed -b cnv.bed -f 0.7 -r -wa -wb | head
chr1 353 6405 chr1 353 6405
chr1 353 6405 chr1 355 6389
chr1 353 6405 chr1 501 6401
chr1 353 6405 chr1 549 6447
chr1 355 6389 chr1 353 6405
chr1 355 6389 chr1 355 6389
chr1 355 6389 chr1 501 6401
chr1 355 6389 chr1 549 6447
chr1 501 6401 chr1 353 6405
chr1 501 6401 chr1 355 6389
chr1 501 6401 chr1 501 6401
chr1 501 6401 chr1 549 6447
This lists the sites with 70% reciprocal overlap by comparing each site against every other site (and against itself).
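To make the criterion concrete, here is a minimal Python sketch of the check that I understand `-f 0.7 -r` to apply to each pair: the shared bases must cover at least 70% of both intervals. The function name `reciprocal_overlap` is made up for illustration, and it assumes BED-style half-open coordinates, so a site's length is end minus start.

```python
def reciprocal_overlap(a, b, min_frac=0.7):
    """Return True if a and b share at least min_frac of each interval's length.

    a and b are (chrom, start, end) tuples with half-open BED coordinates.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return False
    # Number of bases shared by the two intervals (<= 0 means no overlap).
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return False
    # The overlap must reach the threshold on BOTH intervals to be reciprocal.
    return (overlap >= min_frac * (end_a - start_a)
            and overlap >= min_frac * (end_b - start_b))


a = ("chr1", 353, 6405)
# 5856 bp shared, about 97% and 99% of the two sites
print(reciprocal_overlap(a, ("chr1", 549, 6447)))    # True
# 4593 bp shared is about 76% of the first site but only about 17% of the second
print(reciprocal_overlap(a, ("chr1", 1812, 28093)))  # False
```

This matches the output above, where chr1 353 6405 pairs with the first four sites but not with chr1 1812 28093.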
From here we need to collapse those overlapping sites into a single site list, i.e. remove the redundant regions and keep only one region that is representative of each group of overlapping regions. Could you suggest how to achieve this?
It looks like the code takes boogens.txt, which is basically one input file. The text after that says to sort first for fileA and fileB. I wonder what the input to the command is. Should the input be:
Sorry, didn't make that clear. boogens.txt would be your output from intersectBed, like you had as the second output file in your initial question, which has six columns: columns 1-3 are a peak from file 1 (possibly repeated over multiple lines, if multiple matches in file 2), and columns 4-6 are the matching peaks in file 2.
I used the output from intersectBed as input to the awk:
The output and input are similar. Did I use the code in the wrong way, or did I tweak it wrong?
My mistake, I made a syntax error and a logic error. $3==startB needs to be $3==endA. Sorry about that. It seemed to work for me, where I get three lines as output:
Is that the output you would expect?
Yes, it seems to have the desired output.
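For anyone landing here later, below is a minimal Python sketch of one way to do the collapsing directly from the BED intervals. It is not the awk approach discussed above, just a simple greedy alternative: walk the sites in sorted order and keep a site as a new representative only if it does not have 70% reciprocal overlap with a representative that was already kept. The names `reciprocal_overlap` and `collapse_sites` are made up for illustration, and half-open BED coordinates are assumed.

```python
def reciprocal_overlap(a, b, min_frac=0.7):
    """True if (chrom, start, end) tuples a and b share >= min_frac of both lengths."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def collapse_sites(sites, min_frac=0.7):
    """Greedily keep one representative per group of reciprocally overlapping sites."""
    representatives = []
    for site in sorted(sites):
        # Drop the site if an already kept representative covers it reciprocally.
        if not any(reciprocal_overlap(rep, site, min_frac) for rep in representatives):
            representatives.append(site)
    return representatives


# The ten example sites from cnv.bed in the question.
sites = [
    ("chr1", 353, 6405), ("chr1", 355, 6389), ("chr1", 501, 6401),
    ("chr1", 549, 6447), ("chr1", 1812, 28093), ("chr1", 3286, 6382),
    ("chr1", 3694, 6428), ("chr1", 3695, 6413), ("chr1", 3729, 6677),
    ("chr1", 4084, 6380),
]

for chrom, start, end in collapse_sites(sites):
    print(chrom, start, end, sep="\t")
# chr1    353     6405
# chr1    1812    28093
# chr1    3286    6382
```

Note that the representative is simply the first site of each group in sort order, and that reciprocal overlap is not transitive, so a different processing order or a different rule for picking the representative (for example the longest site, or the merged span) can give a different list.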