Shell script to reverse complement DNA, but with a twist (certain bases with a long name)
1
0
Entering edit mode
5.8 years ago
johan ▴ 120

I use this shell script (I'm not the author) to reverse complement DNA:

alias revcomp="echo 'import sys; print \"\".join([dict(zip(\"ACGTacgt\",\"TGCAtgca\"))[c] for c in sys.argv[1][::-1]])' | python -"

In the terminal it works perfect just to write revcomp ATGC which outputs GCAT. And to just complement, I changed it to

TGCAtgc\",\"TGCAtgc.

I'm working with some files, where for reasons I'm not in charge for, some bases written as a word. So let's say that one base is called (askJoeaboutthisone), how would I complement this sequence? I.e. so that AT(askJoeaboutthisone)GC returns CG(askJoeaboutthisone)TA.

In my script each letter is converted, but I wish that (askJoeaboutthisone) is treated as one letter.

Thanks!

shellscript reverse complement • 2.0k views
ADD COMMENT
0
Entering edit mode

What you're really doing is transliterating (and then reversing the sequence). The problem is, you have to iterate the string character by character.

Is it possible for your example to use mixed case? i.e. your long strings are all lower case, but your normal bases are all upper? It'll be tricky to delineate whats a string from within a string. Are the parenthesis actually part of your 'data' or is that just for the benefit of the post/explanation?

ADD REPLY
2
Entering edit mode
5.8 years ago
5heikki 11k
echo "AT(askJoeaboutthisone)GC" \
    | sed 's/(askJoeaboutthisone)/#/g' \
    | tr "[ATGCYRSWKMBDHVatgcyrswkmbdhv]" "[TACGRYSWMKVHDBtacgryswmkvhdb]" \
    | rev \
    | sed 's/#/(askJoeaboutthisone)/g'
GC(askJoeaboutthisone)AT

So, before the complementation (the tr line), we first replace (askJoeaboutthisone) with #, then after complementation and reversing (the rev line), we replace # with (askJoeaboutthisone). We do the temporary replacement so that the letters in the joestring don't get complemented by tr nor reversed by rev

Edit. You can save this as a script:

#!/bin/bash
if [ ! -z "$1" ]; then
    echo "$1" \
        | sed 's/(askJoeaboutthisone)/#/g' \
        | tr "[ATGCYRSWKMBDHVatgcyrswkmbdhv]" "[TACGRYSWMKVHDBtacgryswmkvhdb]" \
        | rev
        | 's/#/(askJoeaboutthisone)/g'
else
    echo ""
    echo "usage: rev_comp_seq_joe_twist DNASEQUENCE"
    echo ""
fi
ADD COMMENT

Login before adding your answer.

Traffic: 1291 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6