Given many genomic coordinates, how do I find those that have matching 5' or 3' ends, regardless of length? For example:
chr1 10 20 feat1 . +
chr1 10 25 feat2 . +
chr1 12 20 feat3 . +
All the features overlap, but I am only interested in feat1 and feat2, which share the same start position (in this case the 5' end). If I were interested in the 3' end instead, it would be feat1 and feat3 (plus their coordinates).
I have looked at the options of bedtools but couldn't find out how to do this. Is there a tool that allows me to do this, and if so which one, or do I have to cobble something together? I guess awk is always an option.
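As an alternative to awk, here is a minimal Python sketch of the grouping idea, assuming tab- or whitespace-separated BED6 input (chrom, start, end, name, score, strand). It keys every feature on (chromosome, strand, chosen end) and keeps only keys shared by two or more features. On the "-" strand the 5' end is the BED end column, so the sketch swaps start and end there; features with strand "." are treated as forward. The function name `shared_ends` and the `"5p"`/`"3p"` switch are hypothetical, not part of bedtools or any other tool.

```python
from collections import defaultdict

# The three features from the question, as tab-separated BED6 lines.
EXAMPLE = [
    "chr1\t10\t20\tfeat1\t.\t+",
    "chr1\t10\t25\tfeat2\t.\t+",
    "chr1\t12\t20\tfeat3\t.\t+",
]


def shared_ends(lines, end="5p"):
    """Group BED6 features whose 5' (end="5p") or 3' (end="3p") end coincides.

    Returns {(chrom, strand, position): [(chrom, start, stop, name, strand), ...]}
    containing only positions shared by at least two features.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        # skip blank lines, track/browser/comment lines and anything shorter than BED6
        if len(fields) < 6 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, stop, name, _score, strand = fields[:6]
        start, stop = int(start), int(stop)
        # on '-' the 5' end is the BED end column; '+' and '.' are treated as forward
        five, three = (stop, start) if strand == "-" else (start, stop)
        pos = five if end == "5p" else three
        # strand is part of the key, so features on opposite strands are never grouped
        groups[(chrom, strand, pos)].append((chrom, start, stop, name, strand))
    return {key: feats for key, feats in groups.items() if len(feats) > 1}


if __name__ == "__main__":
    for (chrom, strand, pos), feats in shared_ends(EXAMPLE, "5p").items():
        print(chrom, pos, strand, ",".join(f[3] for f in feats), sep="\t")
```

On the example this prints `chr1  10  +  feat1,feat2`; with `"3p"` it reports `feat1,feat3` at position 20. For a real file you would pass `open("features.bed")` (a placeholder name) instead of `EXAMPLE`. A dictionary keyed on the shared end avoids comparing every pair of features.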
Nice idea, but the join does not seem to be working:
The full command returns nothing.
Works on my machine. What's your delimiter?
Tabs, but these were converted to spaces, so I double-checked, also with "," as a separator. The problem is the awk step: in this example, the second field is always the same as the sixth, and thus
awk -F '\t' '($2!=$6)'
fails. Anyway, your idea gave me something to work with. Thanks.
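The failure above comes from a general property of self-joins: joining a file with itself on the shared coordinate also matches every feature with itself, and a coordinate comparison cannot remove those self-matches when the compared columns are equal for every row. The following is a minimal Python sketch of that idea, not the command from this thread. It emulates the self-join with a dictionary and filters on the feature name (column 4) instead of on coordinates. It assumes BED6 input with unique names; the helpers `end_key` and `shared_end_pairs` are hypothetical.

```python
from collections import defaultdict

EXAMPLE = [
    "chr1\t10\t20\tfeat1\t.\t+",
    "chr1\t10\t25\tfeat2\t.\t+",
    "chr1\t12\t20\tfeat3\t.\t+",
]


def end_key(fields, end):
    """(chrom, strand, position) of the chosen end; on '-' the 5' end is the BED end column."""
    chrom, start, stop, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    five, three = (stop, start) if strand == "-" else (start, stop)
    return (chrom, strand, five if end == "5p" else three)


def shared_end_pairs(lines, end="5p"):
    """Return (name, name) pairs of distinct BED6 features sharing the chosen end."""
    records = [line.split() for line in lines if line.strip()]
    index = defaultdict(list)
    for fields in records:
        index[end_key(fields, end)].append(fields)
    pairs = []
    for fields in records:
        # like a self-join on the shared end: this lookup also returns the record itself
        for other in index[end_key(fields, end)]:
            # compare names (4th column), not coordinates: drops self-matches (A,A)
            # and mirrored duplicates (B,A after A,B)
            if fields[3] < other[3]:
                pairs.append((fields[3], other[3]))
    return pairs


if __name__ == "__main__":
    print(shared_end_pairs(EXAMPLE, "5p"))
    print(shared_end_pairs(EXAMPLE, "3p"))
```

This prints `[('feat1', 'feat2')]` and then `[('feat1', 'feat3')]`. In an awk filter after `join`, the equivalent would be comparing the two name columns of the joined line, wherever they end up in that output.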