Issue with the FastTree package

Posted 7.0 years ago by virus_n00b

I need to build a phylogenetic tree for a group of sequences. I am using MUSCLE for the multiple sequence alignment, and its output file starts with the following header line:

MUSCLE (3.8) multiple sequence alignment

To build the tree, I run FastTree with the following command:

FastTree PATH_TO_ALIGNMENT_FILE > PATH_TO_TREE_FILE

which fails with this error:

FastTree Version 2.1.10 SSE3
Alignment: ../muscle/b1300dc46e02615c56cd762b141547c0.muscle
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Error parsing header line:MUSCLE (3.8) multiple sequence alignment

Is there a workaround for this error?

Input file: https://drive.google.com/open?id=1-13zaYJeL0XWNMHHKUZG62b4bCh5qZ4I

File generated by MUSCLE: https://drive.google.com/open?id=13Id1UqBtaLGT7HdK8_tb-X5DuFPchy-y
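FastTree reads alignments in FASTA or interleaved PHYLIP format, so a first line such as `MUSCLE (3.8) multiple sequence alignment`, which appears to be the header MUSCLE writes at the top of its Clustal-style output, is rejected. A quick way to see what a file actually contains before handing it to FastTree is to look at its first non-blank line. The function below is a minimal sketch of that check; `detect_aln_format` is a hypothetical helper name, and the patterns are simple heuristics, not FastTree's own parser.

```shell
# Guess an alignment file's format from its first non-blank line.
# Heuristics only (an assumption, not how FastTree parses input):
#   starts with '>'               -> fasta
#   starts with CLUSTAL or MUSCLE -> clustal
#   two integers (taxa, length)   -> phylip
detect_aln_format() {
    first=$(awk 'NF { print; exit }' "$1")
    case "$first" in
        '>'*) echo fasta ;;
        CLUSTAL*|'MUSCLE ('*) echo clustal ;;
        *)
            # A PHYLIP header holds the number of sequences and the alignment length
            if printf '%s\n' "$first" | grep -Eq '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$'; then
                echo phylip
            else
                echo unknown
            fi
            ;;
    esac
}
```

For example, `detect_aln_format b1300dc46e02615c56cd762b141547c0.muscle` would print `clustal` for a file that begins with the header quoted above.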

Tags: fasttree, muscle, sequence



What is the format of the MUSCLE alignment file? Is it FASTA?

Posted 7.0 years ago by Sej Modha


Yes, it is FASTA. I ran MUSCLE with the following command:

muscle -in b1300dc46e02615c56cd762b141547c0.fasta -out b1300dc46e02615c56cd762b141547c0.muscle

I have also added links to the input file and to the file generated by MUSCLE.

Posted 7.0 years ago by virus_n00b


The input file you provided is in FASTA format (https://en.wikipedia.org/wiki/FASTA_format), but its sequences contain only Ns, and the output file is not in FASTA format. Are you sure you want to run FastTree on data that contain only Ns?
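To check the sequence content of a FASTA file without scrolling through it, one can count how often each character occurs in the sequence lines. The sketch below assumes a plain FASTA file (header lines start with `>`) and uses only `awk` and `sort`; `residue_counts` is a hypothetical helper name, and lowercase letters are folded to uppercase.

```shell
# Count each character in the sequence lines of a FASTA file.
# Header lines (starting with '>') are skipped; gap characters such as '-'
# are counted like any other character.
residue_counts() {
    awk '!/^>/ {
             line = toupper($0)
             for (i = 1; i <= length(line); i++) count[substr(line, i, 1)]++
         }
         END { for (c in count) print c, count[c] }' "$1" | sort
}
```

Run it as `residue_counts b1300dc46e02615c56cd762b141547c0.fasta`; each output line is a character followed by its count. Separately, the FastTree log in the question reports `Amino acid distances: BLOSUM45`, meaning the alignment was treated as protein. N and M are also IUPAC nucleotide ambiguity codes, so if these are nucleotide sequences, FastTree's `-nt` option is the way to request a nucleotide model.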

Posted 7.0 years ago by Sej Modha


The sequences are a mix of Ms and Ns, with very few Ms. If the output file is not in FASTA format, the issue must be on the MUSCLE side. I am sure my input file is correct, because I used the same file to generate the alignment and the phylogenetic tree with the EBI web service (https://www.ebi.ac.uk/Tools/msa/muscle/). However, that service caps the number of sequences and the file size, so I need to run these tools on my local system.

Posted 7.0 years ago by virus_n00b


I cannot reproduce the error with your data: the output file MUSCLE generates for me is in FASTA format. It may be worth generating the output in ClustalW format and converting it with an online tool such as EMBOSS Seqret (https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/).
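If the web form is not an option (for example because of the same size limits), a Clustal-format alignment can also be converted to FASTA locally. The function below is a minimal `awk` sketch, not a replacement for EMBOSS Seqret: it assumes the first line is the `CLUSTAL`/`MUSCLE` header, that sequence names contain no spaces, and that conservation lines (`*`, `:`, `.`) start with whitespace, as in standard Clustal blocks. `clustal_to_fasta` is a hypothetical helper name.

```shell
# Convert a Clustal-format alignment to FASTA (minimal sketch).
# Assumptions: line 1 is the header, names have no spaces,
# conservation lines begin with a space or tab.
clustal_to_fasta() {
    awk '
        NR == 1 { next }                    # skip the CLUSTAL/MUSCLE header
        NF == 0 || /^[ \t]/ { next }        # skip blank and conservation lines
        {
            name = $1
            if (!(name in seq)) order[++n] = name   # remember first-seen order
            seq[name] = seq[name] $2                # append this block segment
        }
        END {
            for (i = 1; i <= n; i++) {
                print ">" order[i]
                print seq[order[i]]
            }
        }' "$1"
}
```

Usage would look like `clustal_to_fasta b1300dc46e02615c56cd762b141547c0.muscle > b1300dc46e02615c56cd762b141547c0.afa`, after which FastTree can be run on the `.afa` file. Each sequence is written on a single line, which FASTA readers generally accept.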

Posted 7.0 years ago by Sej Modha


Since you cannot reproduce the error, I assume something is wrong on my side. Could you try the following on your end?

First, align the input file with MUSCLE:

muscle -in b1300dc46e02615c56cd762b141547c0.fasta -out b1300dc46e02615c56cd762b141547c0.muscle -clwstrict -maxiters 2

Then run FastTree on the generated b1300dc46e02615c56cd762b141547c0.muscle file:

FastTree b1300dc46e02615c56cd762b141547c0.muscle > b1300dc46e02615c56cd762b141547c0.tree
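One detail worth comparing: these steps pass `-clwstrict` to MUSCLE, while the command quoted earlier in the thread does not. `-clwstrict` asks MUSCLE to write ClustalW format instead of its default FASTA, and FastTree does not read Clustal files, which would be consistent with the parse error on the header line. Below is a minimal sketch of the same pipeline without that flag (keeping `-maxiters 2` from the thread). `align_and_tree`, `MUSCLE_BIN` and `FASTTREE_BIN` are hypothetical names, and the `.afa` extension is only a convention.

```shell
# Align with MUSCLE using its default FASTA output (no -clw/-clwstrict),
# then build a tree with FastTree.
# MUSCLE_BIN / FASTTREE_BIN are illustrative overrides, e.g. for a full path.
align_and_tree() {
    in_fasta=$1   # input FASTA file
    prefix=$2     # output prefix: writes $prefix.afa and $prefix.tree
    "${MUSCLE_BIN:-muscle}" -in "$in_fasta" -out "$prefix.afa" -maxiters 2 || return 1
    "${FASTTREE_BIN:-FastTree}" "$prefix.afa" > "$prefix.tree" || return 1
}
```

For example, `align_and_tree b1300dc46e02615c56cd762b141547c0.fasta b1300dc46e02615c56cd762b141547c0` would write the alignment to `b1300dc46e02615c56cd762b141547c0.afa` and the tree to `b1300dc46e02615c56cd762b141547c0.tree`.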

Posted 7.0 years ago by virus_n00b