PAL2NAL conversion of a large multi-FASTA file produces a codon alignment in which some sequences are half the average length
17 months ago
rijan_dhakal ▴ 10

I aligned a large peptide multi-FASTA file (n = 4991 sequences). All sequences in the peptide alignment have the same length, and PAL2NAL ran without errors, except that some of the codon sequences come out at half length: if the average length is X, some sequences are X/2. This is choking IQ-TREE.
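
To pin down exactly which codon sequences are short, a small script can list every record whose length differs from the most common length in the alignment. This is a minimal sketch, not part of PAL2NAL or IQ-TREE: the function names `read_fasta` and `length_outliers` and the file name in the usage comment are hypothetical, and it assumes plain FASTA input with unique headers.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}.

    Assumes headers are unique; a repeated header overwrites the earlier record.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(chunks) for header, chunks in records.items()}


def length_outliers(records):
    """Return (modal_length, {header: length}) for records that deviate
    from the most common sequence length (gaps are counted)."""
    if not records:
        return None, {}
    lengths = {header: len(seq) for header, seq in records.items()}
    modal = Counter(lengths.values()).most_common(1)[0][0]
    odd = {header: n for header, n in lengths.items() if n != modal}
    return modal, odd


# Usage (hypothetical file name):
# with open("codon_alignment.fasta") as handle:
#     modal, odd = length_outliers(read_fasta(handle.read()))
# for header, n in sorted(odd.items()):
#     print(header, n, "expected", modal)
```

Running the same check on the peptide alignment gives a quick confirmation that it really is uniform, and the list of short headers can then be compared between the MUSCLE and MAFFT runs.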

I have tried both MUSCLE (Super5) and MAFFT. The problem is the same with either aligner: some codon sequences end up at half the average length. Only the actual values differ, i.e. the average length and the half length are not the same in the MUSCLE-based and MAFFT-based codon alignments. I have pulled out the sequences causing the issue and inspected them, and they appear to be in frame.

Example of peptide sequence not causing an issue:

RKVEAFLLFKEMGERGCQPNVHTYTVLIDSFCKERNLDDARKLFDDMFKKGLVPSVVTYNALIDGYCKEGMTEAALEILGMMESKKCNPNARTYNELICGFCKAK

Corresponding CDS:

AGGAAAGTGGAAGCTTTTCTACTTTTTAAAGAAATGGGTGAAAGAGGTTGTCAGCCTAATGTTCATACATACACTGTGCTTATTGATTCCTTCTGTAAGGAAAGGAATCTTGATGATGCCAGGAAATTGTTTGATGACATGTTTAAGAAAGGTTTGGTTCCCAGTGTGGTCACTTATAATGCTTTAATTGATGGGTATTGTAAAGAGGGAATGACTGAAGCTGCATTAGAAATTTTAGGTATGATGGAATCAAAGAAATGCAACCCTAATGCTCGGACCTACAATGAATTGATCTGTGGATTTTGTAAAGCTAAA

Example of peptide causing issue:

GLCKGGRLNDAWEIFQYLLAKGYQLNVHTYNAMVHGFCKEGLLDEAISLLYKMEENGCVPNSVTFNVVL

Corresponding CDS:

GGTTTGTGCAAAGGTGGTAGATTAAATGATGCGTGGGAGATTTTTCAGTATCTTTTAGCGAAAGGTTATCAACTAAATGTCCATACATATAATGCGATGGTTCATGGTTTTTGCAAAGAAGGTTTGCTTGATGAAGCAATCTCCCTGCTTTATAAAATGGAAGAGAATGGTTGTGTCCCTAATTCTGTAACTTTTAATGTAGTCCTT
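
The "in frame" impression can be checked mechanically by translating each CDS and comparing it with its peptide. The following is a sketch under stated assumptions: it uses the standard genetic code (NCBI translation table 1), ignores gap characters, tolerates trailing stop codons, and the helper names `translate` and `check_pair` are hypothetical rather than part of any existing tool.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a nucleotide string in frame 1; unknown codons become 'X'.

    Any trailing one or two bases that do not form a full codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def check_pair(peptide, cds):
    """Return a list of problems found for one peptide/CDS pair (empty if none)."""
    pep = peptide.replace("-", "").upper().rstrip("*")
    nuc = cds.replace("-", "").upper()
    problems = []
    if len(nuc) % 3 != 0:
        problems.append(f"CDS length {len(nuc)} is not a multiple of 3")
    translated = translate(nuc).rstrip("*")
    if len(translated) != len(pep):
        problems.append(
            f"CDS encodes {len(translated)} residues, peptide has {len(pep)}"
        )
    elif translated != pep:
        first = next(i for i, (a, b) in enumerate(zip(translated, pep)) if a != b)
        problems.append(f"first mismatch at residue {first + 1}")
    return problems


# Usage: pass the peptide and CDS strings for one sequence, e.g.
# print(check_pair("MG", "ATGGGTTAA"))
```

If the problem pairs pass this check, the raw sequences themselves are probably not at fault, and the next things to compare would be the sequence order and IDs in the two files given to PAL2NAL.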

Any idea what might be going on?

translation codon phylogenetics • 737 views