I have a file with sequences in the first column and the start coordinate of each sequence in the 3rd column. I want to extract all rows whose start coordinates lie within 5 nt of each other. Here is an example:
Sequence Column header_xxx Start coordinate Column header_yyy
ABC 500
XYZ 502
DEF 12050
PQR 12055
abc 400
mno 456
In the above example, the script I am trying to write should extract the rows for sequences ABC, XYZ, DEF and PQR, but not abc or mno.
Any solution, preferably in bash or Perl, would be very helpful.
Thanks and regards
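One possible approach, sketched in Python rather than bash/Perl and not necessarily the approach any answer used: sort the rows by start coordinate, then keep every row whose start is within 5 nt of its neighbour in the sorted order. After sorting, a row's closest other start is always adjacent to it, so comparing neighbours is enough. This assumes whitespace-separated columns, a single header line, and the start coordinate in the 3rd column. "Within 5 nt" is read as a difference of at most 5 (inclusive), because DEF (12050) and PQR (12055) differ by exactly 5 and should be kept. The function name `rows_with_close_starts` and its `start_col`/`max_dist` parameters are hypothetical.

```python
import sys
from typing import Iterable, List


def rows_with_close_starts(lines: Iterable[str], start_col: int = 2,
                           max_dist: int = 5) -> List[str]:
    """Return data rows whose start coordinate lies within max_dist nt
    of at least one other row's start coordinate.

    Assumes whitespace-separated columns, one header line, and the start
    coordinate in column `start_col` (0-based, so 2 = the 3rd column).
    """
    rows = []
    for i, line in enumerate(lines):
        fields = line.split()
        if i == 0 or not fields:
            continue  # skip the header line and any blank lines
        # Keep (start, original line index, row text) for each data row.
        rows.append((int(fields[start_col]), i, line.rstrip("\n")))

    # Sort by start; a row's nearest neighbour is then adjacent to it,
    # so checking consecutive pairs finds every qualifying row.
    rows.sort()
    keep = set()
    for j in range(1, len(rows)):
        if rows[j][0] - rows[j - 1][0] <= max_dist:
            keep.add(rows[j - 1][1])
            keep.add(rows[j][1])

    # Emit the kept rows in their original file order.
    return [text for _, i, text in sorted(rows, key=lambda r: r[1]) if i in keep]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is hypothetical): python close_starts.py input.txt
    with open(sys.argv[1]) as fh:
        for row in rows_with_close_starts(fh):
            print(row)
```

For a file that holds only the sequence and start columns, as in the example rows above, pass `start_col=1` instead. Sorting makes this O(n log n), so it stays fast even for large coordinate files, whereas comparing every pair of rows would be quadratic.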
Thanks a lot for the solution, Alex!