Problem with running phyml
2.1 years ago
Jimpix ▴ 10

Hi! I need to get a final tree file in Newick format. I am using the following commands:

muscle -in example.fasta -out example.msa.fasta

example.fasta is a file with 10 sequences of different lengths.

Then:

muscle -in example.msa.fasta -out example.msa.fasta.phylip -refine -phyi

And the last command:

phyml -i example.msa.fasta.phylip -m JC69 -o tlr

I get an error:

Check sequence 'M55008.1' length (expected length: 2122, observed length: 2123) [OTU 1].

The error message is clear, but why do I get it? Must the input sequences all have the same length? How can I fix it? Kindly help.

muscle phyml • 1.1k views
2.1 years ago
Jesse ▴ 850

Do your sequence description lines have spaces in them (in other words, is there extra text after what's treated as the sequence ID)?

This works fine for me with sequence IDs only, but when there are spaces I see that same error. I'm not very familiar with the PHYLIP format but it seems very cranky about whitespace.

One example.fasta that works:

>seq1
AACAAGGAAAGAATCGAACTCCAAAACTGACAAGAGCTGTGACAGGGACTAGGTCCGGATCTTAGGACGCCTTAAACTGGTGTACTATGCTGTCGTCTCT
>seq2
AACAATGAAAGAATCGAACTCCAAAACTGACAAGAGCTGTGACAGGGACTAGGTCCGGATCTTAGGACGCCTTGAACTGGTGTACTATGCTGTCGTCTCT
>seq3
AACAAGGAAAGAATCGAACTCCAAAACTGACAAGAGTGACAGGGACTAGGTCCGGATCTTAGGACGCCTTAAACTGGTGTACTATGCTGTCGTCTCT
>seq4
AACAATGAACGAATCGAACTCCAAAGCTGACAAGAGCTGTGACAGGTACTAGGTGCGGATCTTAGCACGCCTTGAACTGGTGTACTATGCTGTCATCTCT

Which gives this example.msa.fasta.phylip:

4 100
seq4       AACAATGAAC GAATCGAACT CCAAAGCTGA CAAGAGCTGT GACAGGTACT
seq2       AACAATGAAA GAATCGAACT CCAAAACTGA CAAGAGCTGT GACAGGGACT
seq1       AACAAGGAAA GAATCGAACT CCAAAACTGA CAAGAGCTGT GACAGGGACT
seq3       AACAAGGAAA GAATCGAACT CCAAAACTGA CAAGA---GT GACAGGGACT

AGGTGCGGAT CTTAGCACGC CTTGAACTGG TGTACTATGC TGTCATCTCT
AGGTCCGGAT CTTAGGACGC CTTGAACTGG TGTACTATGC TGTCGTCTCT
AGGTCCGGAT CTTAGGACGC CTTAAACTGG TGTACTATGC TGTCGTCTCT
AGGTCCGGAT CTTAGGACGC CTTAAACTGG TGTACTATGC TGTCGTCTCT

This example.fasta does not work:

>seq1
AACAAGGAAAGAATCGAACTCCAAAACTGACAAGAGCTGTGACAGGGACTAGGTCCGGATCTTAGGACGCCTTAAACTGGTGTACTATGCTGTCGTCTCT
>seq2
AACAATGAAAGAATCGAACTCCAAAACTGACAAGAGCTGTGACAGGGACTAGGTCCGGATCTTAGGACGCCTTGAACTGGTGTACTATGCTGTCGTCTCT
>seq3
AACAAGGAAAGAATCGAACTCCAAAACTGACAAGAGTGACAGGGACTAGGTCCGGATCTTAGGACGCCTTAAACTGGTGTACTATGCTGTCGTCTCT
>seq4 with extra stuff
AACAATGAACGAATCGAACTCCAAAGCTGACAAGAGCTGTGACAGGTACTAGGTGCGGATCTTAGCACGCCTTGAACTGGTGTACTATGCTGTCATCTCT

The example.msa.fasta.phylip then is:

4 100
seq4 with  AACAATGAAC GAATCGAACT CCAAAGCTGA CAAGAGCTGT GACAGGTACT
seq2       AACAATGAAA GAATCGAACT CCAAAACTGA CAAGAGCTGT GACAGGGACT
seq1       AACAAGGAAA GAATCGAACT CCAAAACTGA CAAGAGCTGT GACAGGGACT
seq3       AACAAGGAAA GAATCGAACT CCAAAACTGA CAAGA---GT GACAGGGACT

AGGTGCGGAT CTTAGCACGC CTTGAACTGG TGTACTATGC TGTCATCTCT
AGGTCCGGAT CTTAGGACGC CTTGAACTGG TGTACTATGC TGTCGTCTCT
AGGTCCGGAT CTTAGGACGC CTTAAACTGG TGTACTATGC TGTCGTCTCT
AGGTCCGGAT CTTAGGACGC CTTAAACTGG TGTACTATGC TGTCGTCTCT

phyml gives me a similar error:

Check sequence 'seq4' length (expected length: 100, observed length: 104) [OTU 1].
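The 104 in that message can be accounted for with a bit of arithmetic. Here is a rough sketch of it in Python, under two assumptions of mine that I have not checked against either tool's source: that the name field in the .phylip file is the header cut or padded to 10 characters (which is what the output above looks like), and that phyml ends the name at the first whitespace and counts whatever else is left in that field as sequence. The function names are made up for illustration, and the M55008.1 description text is a placeholder, since the real header isn't shown.

```python
def phylip_name_field(header, width=10):
    """Cut or pad a FASTA header to a fixed-width name field.

    Assumption: width 10, matching the 'seq4 with ' seen in the .phylip output.
    """
    return header[:width].ljust(width)


def apparent_length(name_field, aligned_length):
    """Return (name, length) as a whitespace-splitting parser might see them.

    Assumption: the name ends at the first whitespace, and any other
    non-blank characters left in the name field are counted as sequence.
    """
    parts = name_field.split()
    leftover = sum(len(p) for p in parts[1:])
    return parts[0], aligned_length + leftover


field = phylip_name_field("seq4 with extra stuff")
print(repr(field))                  # 'seq4 with '
print(apparent_length(field, 100))  # ('seq4', 104)

# Hypothetical description text after the real ID from the question.
field = phylip_name_field("M55008.1 some description")
print(apparent_length(field, 2122))  # ('M55008.1', 2123)
```

Under those assumptions the leftover `with` adds 4 to the expected 100, and a single leftover character after an 8-character ID adds 1 to 2122.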

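I think this fits what you're seeing: in the original input you have M55008.1 (8 characters), a space, and then more text. In the .phylip file the name gets truncated to ten characters total, so just one character of the extra text survives after the space. That one stray character confuses phyml, which would explain why the observed length (2123) is one more than the expected length (2122).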
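If that is the cause, one way around it is to strip everything after the sequence ID from the FASTA headers before running muscle. A minimal sketch in Python, assuming a plain FASTA file where header lines start with `>`. The function names and the output file name `example.ids.fasta` are mine, not part of muscle or phyml.

```python
def strip_fasta_descriptions(lines):
    """Keep only the first whitespace-delimited token of each FASTA header."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            line = ">" + (tokens[0] if tokens else "")
        cleaned.append(line)
    return cleaned


def clean_fasta_file(src, dst):
    """Write a copy of src to dst with descriptions removed from the headers."""
    with open(src) as fin:
        cleaned = strip_fasta_descriptions(fin)
    with open(dst, "w") as fout:
        fout.write("\n".join(cleaned) + "\n")


# Small in-memory demonstration using a header from the failing example.
demo = [">seq3\n", "AACAAGG\n", ">seq4 with extra stuff\n", "AACAATG\n"]
print(strip_fasta_descriptions(demo))
# ['>seq3', 'AACAAGG', '>seq4', 'AACAATG']
```

To use it on the real data, call `clean_fasta_file("example.fasta", "example.ids.fasta")` and then run the same muscle and phyml commands starting from `example.ids.fasta`. Note that this only removes the description text. IDs longer than the 10-character name field seen above would presumably still be cut, so it is worth checking that the shortened names stay unique.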

Thanks very much! It works like it should.
