I aligned a large peptide multi-FASTA file (n = 4991 sequences). In the peptide alignment all sequences have the same length, and pal2nal ran without errors. However, some of the resulting codon sequences are only half the expected length: if the average length is X, some sequences are X/2. IQ-TREE fails on the resulting codon alignment.
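A quick length report on the codon alignment shows exactly which records are affected. This is a minimal sketch that assumes the pal2nal output was saved in FASTA format and that the first word of each header is a unique ID. `parse_fasta` and `length_outliers` are hypothetical helper names, and only the Python standard library is used. In a well-formed alignment every record has the same length, so the outlier list should be empty.

```python
from collections import Counter


def parse_fasta(lines):
    """Minimal FASTA reader: maps the first word of each header to its sequence.

    Assumes header IDs are unique; a repeated ID overwrites the earlier record.
    """
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def length_outliers(records):
    """Return (modal_length, {id: length}) for records whose length differs from the mode."""
    counts = Counter(len(seq) for seq in records.values())
    modal, _ = counts.most_common(1)[0]
    outliers = {name: len(seq) for name, seq in records.items() if len(seq) != modal}
    return modal, outliers
```

Usage, with a placeholder filename: `with open("codon_aln.fasta") as fh: modal, outliers = length_outliers(parse_fasta(fh))`. Comparing the IDs in `outliers` against the peptide alignment may show whether the short rows share something in common.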
I have tried both MUSCLE (super5) and MAFFT, and the problem is the same with either aligner: some codon sequences end up at half the average length. The only difference is that the average length (and therefore the half length) differs between the MUSCLE and MAFFT codon alignments. I have pulled out and examined the sequences causing the issue, and they appear to be in frame.
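The "in frame" check can also be run over every pair instead of a few hand-picked ones. This is a minimal sketch that assumes the standard genetic code (NCBI table 1) and unaligned CDS input. `translate` and `check_pair` are hypothetical helper names. For each peptide/CDS pair it checks two things: that the CDS length is a multiple of 3, and that the frame-1 translation reproduces the peptide. Alignment gaps in the peptide and one trailing stop codon in the CDS are ignored.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate in frame 1; ambiguous or unknown codons become 'X', a partial last codon is dropped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[pos:pos + 3], "X") for pos in range(0, usable, 3))


def check_pair(peptide, cds):
    """Return a list of problems for one peptide/CDS pair; an empty list means they agree."""
    problems = []
    peptide = peptide.replace("-", "").upper()  # drop alignment gaps from the peptide
    if len(cds) % 3 != 0:
        problems.append(f"CDS length {len(cds)} is not a multiple of 3")
    protein = translate(cds)
    # Tolerate a terminal stop codon in the CDS that is absent from the peptide.
    if protein.endswith("*") and not peptide.endswith("*"):
        protein = protein[:-1]
    if protein != peptide:
        problems.append("translation does not match the peptide")
    return problems
```

Any pair that returns a non-empty list may be worth checking against the half-length rows in the codon alignment.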
Example of a peptide sequence that does not cause the issue:
RKVEAFLLFKEMGERGCQPNVHTYTVLIDSFCKERNLDDARKLFDDMFKKGLVPSVVTYNALIDGYCKEGMTEAALEILGMMESKKCNPNARTYNELICGFCKAK
Corresponding CDS:
AGGAAAGTGGAAGCTTTTCTACTTTTTAAAGAAATGGGTGAAAGAGGTTGTCAGCCTAATGTTCATACATACACTGTGCTTATTGATTCCTTCTGTAAGGAAAGGAATCTTGATGATGCCAGGAAATTGTTTGATGACATGTTTAAGAAAGGTTTGGTTCCCAGTGTGGTCACTTATAATGCTTTAATTGATGGGTATTGTAAAGAGGGAATGACTGAAGCTGCATTAGAAATTTTAGGTATGATGGAATCAAAGAAATGCAACCCTAATGCTCGGACCTACAATGAATTGATCTGTGGATTTTGTAAAGCTAAA
Example of a peptide sequence that causes the issue:
GLCKGGRLNDAWEIFQYLLAKGYQLNVHTYNAMVHGFCKEGLLDEAISLLYKMEENGCVPNSVTFNVVL
Corresponding CDS:
GGTTTGTGCAAAGGTGGTAGATTAAATGATGCGTGGGAGATTTTTCAGTATCTTTTAGCGAAAGGTTATCAACTAAATGTCCATACATATAATGCGATGGTTCATGGTTTTTGCAAAGAAGGTTTGCTTGATGAAGCAATCTCCCTGCTTTATAAAATGGAAGAGAATGGTTGTGTCCCTAATTCTGTAACTTTTAATGTAGTCCTT
Any idea what might be going on?