Protein sequence to Nucleotide sequence
3.1 years ago
mittu1602 ▴ 200

Hello All,

I have file1 with a protein sequence and a second file that maps each amino acid to a nucleotide codon. Is there a one-liner that looks up each single-letter amino acid code from file1 in file2, replaces the protein sequence with the corresponding nucleotide sequence, and saves the result as file3?

For example:

File1.fasta

MFLILLISLPTAFAVIGDLKCTTVSINDVDTGVPSIST.....................

File2.txt (tab file)

M	ATG
F	TTT
..	...

Expected output

file3.fasta

ATGTTTTGATACTTT....................

linux awk codon sed perl • 966 views
3.1 years ago

backtranseq http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/backtranseq.html

backtranseq reads a protein sequence and writes the nucleic acid sequence it is most likely to have come from.

3.1 years ago
5heikki 11k

This is not very efficient, but it does work (note that these files do not have headers):

cat file.aa
MFLILLI

cat file.map
M       AAA
F       TTT
L       CCC
I       GGG

for x in $(fold -w 1 file.aa); do awk -v x="$x" 'BEGIN{FS="\t"} $1==x {printf "%s", $2}' file.map; done; echo
AAATTTCCCGGGCCCCCCGGG
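
The loop above starts one awk process per residue. A single-pass alternative might look like the sketch below, which loads the map once and then walks each sequence line. The function name `back_translate`, the pass-through of `>` header lines, and the `NNN` placeholder for residues missing from the map are assumptions for illustration, not part of the original question; the map is assumed to be tab-separated.

```shell
# Hypothetical helper: back_translate MAP_FILE SEQ_FILE
# Assumes MAP_FILE is tab-separated (amino acid letter, codon).
back_translate() {
  awk -F'\t' '
    NR == FNR { codon[$1] = $2; next }   # first file: load the amino acid -> codon map
    /^>/      { print; next }            # second file: pass FASTA header lines through
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        aa = substr($0, i, 1)
        # Assumption: residues missing from the map are written as NNN
        out = out ((aa in codon) ? codon[aa] : "NNN")
      }
      print out
    }
  ' "$1" "$2"
}
```

With the example files from this answer it would be called as `back_translate file.map file.aa > file3.fasta`.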

But surely you are aware that it is impossible to recover the original nt sequence from an aa sequence? For example, six different codons encode leucine.
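
To get a feel for how ambiguous back-translation is, the sketch below multiplies the number of codons per residue under the standard genetic code. The function name `count_backtranslations` is hypothetical, the standard code is an assumption (other translation tables differ), and characters outside the 20 standard amino acids are ignored. awk uses floating-point arithmetic, so the count is only exact for short sequences.

```shell
# Hypothetical helper: count_backtranslations SEQ_FILE
# Prints, per sequence line, how many nucleotide sequences encode it
# under the standard genetic code (an assumption).
count_backtranslations() {
  awk '
    BEGIN {
      # Codons per amino acid in the standard genetic code (61 sense codons in total)
      split("A 4 C 2 D 2 E 2 F 2 G 4 H 2 I 3 K 2 L 6 M 1 N 2 P 4 Q 2 R 6 S 6 T 4 V 4 W 1 Y 2", t, " ")
      for (i = 1; i <= 40; i += 2) deg[t[i]] = t[i + 1]
    }
    /^>/ { next }                         # skip FASTA header lines
    {
      total = 1
      n = length($0)
      for (i = 1; i <= n; i++) {
        aa = substr($0, i, 1)
        if (aa in deg) total *= deg[aa]   # unknown characters are ignored
      }
      print total
    }
  ' "$1"
}
```

For the seven-residue example `MFLILLI` this gives 1 × 2 × 6 × 3 × 6 × 6 × 3 = 3888 candidate nucleotide sequences.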
