How to find Overlapping coordinates on their 3' or 5'?
1
0
Entering edit mode
7.7 years ago
A. Domingues ★ 2.7k

Given many genomic coordinates, how do I find those that have matching 5' or 3' ends, regardless of length. for example:

chr1    10  20  feat1   .   +

chr1    10  25  feat2   .   +

chr1    12  20  feat3   .   +

All the features overlap, but I am only interested in feat1 and feat2 which share the same start position, or in this case the 5'. Should I be interested in the 3' end, than, feat1 and feat3 (plus the coordinates).

I have looked at the options of bedtools but couldn't find out how to do this. Is there a tool out there that allows me to do this, if so which, or go I have to cobble something together? I guess awk is always an option.

bed bedtools intersection genomic coordinates • 1.3k views
ADD COMMENT
1
Entering edit mode
7.7 years ago

I am only interested in feat1 and feat2

create a pseudo-column [chrom/start] , join the file with itlself, remove the self/self

join -t $'\t' -1 1 -2 1 <(sed 's/\t/_/' input.txt | sort -t $'\t' -k1,1 )  <(sed 's/\t/_/' input.txt | sort -t $'\t' -k1,1 ) | awk -F '\t' '($2!=$6)' | tr "_" "\t"
ADD COMMENT
0
Entering edit mode

Nice idea, but the the join does not seem to be working:


join -t $'\t' -1 1 -2 1 <(sed 's/\t/_/' a.bed | sort -t $'\t' -k1,1 )  <(sed 's/\t/_/' a.bed | sort -t $'\t' -k1,1 ) 

chr1 10 20 feat1 . +

chr1 10 25 feat2 . +

chr1 12 20 feat3 . +

The full command returns nothing.

ADD REPLY
0
Entering edit mode

Works on my machine. what's your delimiter ?

ADD REPLY
0
Entering edit mode

Tabs but these were converted to spaces, so I double checked, and also with "," as a separator, and the problems is awk:

join -t $'\t' -1 1 -2 1 <(sed 's/\t/_/' a.bed | sort -t $'\t' -k1,1 )  <(sed 's/\t/_/' a.bed | sort -t $'\t' -k1,1 ) 
chr1_10 25  feat2   .   +   25  feat2   .   +
chr1_100    20  feat1   .   +   20  feat1   .   +
chr1_12 20  feat3   .   +   20  feat3   .   +

In this example, the second field is always similar to the sixth and thus awk -F '\t' '($2!=$6)' fails. Anyway, your idea gave me something to work with. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 1555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6