7.0 years ago
virus_n00b
I need to build a phylogenetic tree for a group of sequences. I am using MUSCLE to perform the sequence alignment. MUSCLE writes an alignment file that starts with the following header:
MUSCLE (3.8) multiple sequence alignment
To build the tree, I am using FastTree. I run the following command:
FastTree PATH_TO_ALIGNMENT_FILE > PATH_TO_TREE_FILE
which results in the following error:
FastTree Version 2.1.10 SSE3
Alignment: ../muscle/b1300dc46e02615c56cd762b141547c0.muscle
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Error parsing header line:MUSCLE (3.8) multiple sequence alignment
Is there a workaround for this error?
Input file : https://drive.google.com/open?id=1-13zaYJeL0XWNMHHKUZG62b4bCh5qZ4I
File generated by MUSCLE : https://drive.google.com/open?id=13Id1UqBtaLGT7HdK8_tb-X5DuFPchy-y
What is the format of the MUSCLE alignment file? Is it fasta?
Yes. It is fasta. I have passed the following command to MUSCLE
I have also added the input file and the alignment file generated by MUSCLE.
The input file you provided is in FASTA format (https://en.wikipedia.org/wiki/FASTA_format) but contains only Ns in the sequences; the output file is not in FASTA format. Are you sure you want to run FastTree on data that contains only Ns?
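A quick way to check what the sequences actually contain is to count residues across the whole FASTA file. The snippet below is only a minimal sketch: the function name `residue_counts` is made up for illustration, and it assumes plain FASTA text where header lines start with `>`.

```python
from collections import Counter


def residue_counts(fasta_text):
    """Count every residue character in FASTA text, ignoring header lines.

    Hypothetical helper for illustration; assumes headers start with '>'.
    """
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and '>' header lines; everything else is sequence.
        if not line or line.startswith(">"):
            continue
        # Upper-case so that 'n' and 'N' are counted together.
        counts.update(line.upper())
    return counts
```

Reading the input file into a string and passing it to `residue_counts` should show at a glance whether anything other than N is present.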
The sequences are a collection of M's and N's, with very few M's. If the output file is not in FASTA format, then the issue must be on the MUSCLE side. I am sure that my input file is correct, because the same file produces an alignment and a phylogenetic tree through the web service (https://www.ebi.ac.uk/Tools/msa/muscle/). However, the web service caps the number of sequences and the file size, so I need to run these tools on my local system.
I am unable to replicate the error at my end with your data: the output file generated for me is in FASTA format. It may be worth generating the output file in ClustalW format and converting it to FASTA with an online tool such as https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/
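If an online converter is not an option (for example because of file size limits), the conversion can also be done locally. As far as I know, FastTree reads FASTA or interleaved PHYLIP alignments, so a ClustalW-style file needs converting first. Below is a minimal sketch of such a converter in plain Python; `clustal_to_fasta` is a hypothetical helper, and it assumes the usual ClustalW layout: one header line, blocks of `name sequence` rows, and conservation rows that begin with whitespace.

```python
def clustal_to_fasta(text, width=60):
    """Convert ClustalW-style interleaved alignment text to FASTA text.

    Hypothetical helper for illustration. Assumes the first non-blank line
    is the header (e.g. "CLUSTAL W ..." or "MUSCLE (3.8) ...").
    """
    lines = iter(text.splitlines())
    # Consume lines up to and including the header line.
    for line in lines:
        if line.strip():
            break
    seqs = {}  # name -> list of chunks; dicts keep insertion order
    for line in lines:
        # Blank lines separate blocks; lines starting with whitespace are
        # conservation rows (*, :, .) and carry no sequence data.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[0] is the sequence name, fields[1] the aligned chunk;
        # an optional trailing residue count is ignored.
        seqs.setdefault(fields[0], []).append(fields[1])
    out = []
    for name, chunks in seqs.items():
        seq = "".join(chunks)
        out.append(">" + name)
        # Wrap the sequence at `width` characters per line.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

To use it, read the MUSCLE output into a string, write the returned text to a new file, and point FastTree at that file instead. The sketch does not validate the alignment (for example, that all sequences end up the same length), so it is worth checking the result before building the tree.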
Since you are unable to replicate the error, I assume there is something wrong on my side. Can you try the following at your end...
Run MUSCLE on the input file using
muscle -in b1300dc46e02615c56cd762b141547c0.fasta -out b1300dc46e02615c56cd762b141547c0.muscle -clwstrict -maxiters 2
Then run FastTree on the generated b1300dc46e02615c56cd762b141547c0.muscle file.
FastTree b1300dc46e02615c56cd762b141547c0.muscle > b1300dc46e02615c56cd762b141547c0.tree