I want to use a guide tree that I obtained from the NCBI Common Taxonomy Tree with my MSA in clustalw, to produce an alternative arrangement of the MSA based on the NCBI guide tree.
The guide tree I got from NCBI looks like this:
(
'synthetic construct':4,
'unclassified sequences':4,
'Paramecium bursaria Chlorella virus 1':4,
(
(
(
'Picomonas judraskeda':4,
'Palpitomonas bilix':4,
'Metromonas simplex':4
)'unclassified eukaryotes':4,
(
'Rhizomastix libera':4,
(
'Stygamoeba regulata':4,
..
..
..
I tried using it, but I get the error "ERROR: tree" with no other information about what may be wrong.
The command I used is:
clustalw2 -INFILE=names.fasta -USETREE=ncbi_guide_tree.phy -OUTFILE=unique_euka.fasta -OUTPUT=FASTA
Does anyone have any suggestions as to what may be wrong?
Yes, I checked the names and they are the same: the guide tree contains the same names as the input sequence file. I used the sequences' taxonomy IDs to get the guide tree from NCBI.
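One way to double-check the names programmatically is sketched below. It assumes the tree uses single-quoted labels as in the NCBI output above, that no name contains an apostrophe, and that the "name" of a FASTA record is its full header line after `>`. A quoted label counts as a leaf when it follows `(` or `,`; labels that follow `)` (such as `'unclassified eukaryotes'`) are internal node labels and are skipped. The function names are hypothetical helpers, not part of clustalw.

```python
import re

def newick_leaf_names(newick: str) -> set[str]:
    """Collect single-quoted labels that are leaves (preceded by '(' or ',')."""
    leaves = set()
    for m in re.finditer(r"'([^']*)'", newick):
        # Last non-whitespace character before the opening quote decides
        # whether this label is a leaf or an internal-node label.
        before = newick[:m.start()].rstrip()
        if not before or before[-1] in "(,":
            leaves.add(m.group(1))
    return leaves

def fasta_names(fasta: str) -> set[str]:
    """Full header text after '>' for every record (assumed naming rule)."""
    return {line[1:].strip() for line in fasta.splitlines() if line.startswith(">")}

def name_mismatches(newick: str, fasta: str) -> tuple[set[str], set[str]]:
    """Return (names only in the tree, names only in the FASTA file)."""
    tree, seqs = newick_leaf_names(newick), fasta_names(fasta)
    return tree - seqs, seqs - tree

# Example with a cut-down tree and FASTA file:
tree = "('Picomonas judraskeda':4,('Palpitomonas bilix':4)'unclassified eukaryotes':4);"
fasta = ">Picomonas judraskeda\nACGT\n>Palpitomonas bilix\nACGT\n"
print(name_mismatches(tree, fasta))  # two empty sets when the names agree
```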
OK. One thing I noticed in your tree file: the sequence names contain spaces, which clustalw2 doesn't like. http://www.ebi.ac.uk/Tools/msa/clustalw2/help/faq.html#11
When reading the input, clustalw2 will likely interpret "unclassified eukaryotes" and "unclassified sequences" as the same name, "unclassified", i.e. a duplicate entry. If your sequence names really do match exactly in both files, then I'd recommend changing the name of each sequence (in both files) so that it contains no spaces or other whitespace, e.g. "Picomonas judraskeda" > "Picomonas_judraskeda".
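A minimal sketch of that renaming, applied identically to the tree and the FASTA file so the names stay in sync. It assumes single-quoted Newick labels without apostrophes, and it assumes (not verified against clustalw2) that the quotes can be dropped once the spaces are gone. It raises an error if a label still contains Newick punctuation, because such a name would break the tree once unquoted. `safe_name`, `sanitize_newick` and `sanitize_fasta` are hypothetical helper names.

```python
import re

# Characters with special meaning in Newick; an unquoted name must not contain them.
NEWICK_SPECIAL = set("()[]:;,'")

def safe_name(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())

def sanitize_newick(newick: str) -> str:
    """Rewrite every single-quoted label as an unquoted, underscore-joined name."""
    def repl(m: re.Match) -> str:
        new = safe_name(m.group(1))
        if any(ch in NEWICK_SPECIAL for ch in new):
            raise ValueError(f"label {m.group(1)!r} still contains Newick punctuation")
        return new
    return re.sub(r"'([^']*)'", repl, newick)

def sanitize_fasta(fasta: str) -> str:
    """Apply the same renaming to every FASTA header line; sequence lines are untouched."""
    out = []
    for line in fasta.splitlines():
        out.append(">" + safe_name(line[1:]) if line.startswith(">") else line)
    return "\n".join(out) + "\n"

print(sanitize_newick("('Picomonas judraskeda':4,'Palpitomonas bilix':4)'unclassified eukaryotes':4;"))
# (Picomonas_judraskeda:4,Palpitomonas_bilix:4)unclassified_eukaryotes:4;
```

You would then write the two sanitized strings to new files and point `-INFILE` and `-USETREE` at those.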
Incidentally, there's also a practical limit of 30 characters on sequence names that you may want to consider: http://www.ebi.ac.uk/Tools/msa/clustalw2/help/faq.html#18. The clustalw output may truncate your sequence names, so that, for example, "Paramecium bursaria Chlorella virus 1" becomes just "Paramecium bursaria Chlorella", which may or may not have an impact on your output.
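To see which names would be affected, a small check like the following could help. It takes the 30-character figure from the FAQ above and assumes truncation simply keeps the first 30 characters (the "Paramecium bursaria Chlorella virus 1" example is consistent with that, apart from a trailing space). It also lists prefixes shared by more than one name, since names that differ only after character 30 would become indistinguishable. The helper names are hypothetical.

```python
from collections import Counter

MAX_LEN = 30  # practical name-length limit mentioned in the ClustalW2 FAQ

def long_names(names: list[str], limit: int = MAX_LEN) -> dict[str, str]:
    """Map each over-long name to its first `limit` characters (not stripped)."""
    return {n: n[:limit] for n in names if len(n) > limit}

def prefix_collisions(names: list[str], limit: int = MAX_LEN) -> list[str]:
    """Truncated prefixes that more than one name would share."""
    counts = Counter(n[:limit] for n in names)
    return sorted(p for p, c in counts.items() if c > 1)

names = ["Paramecium bursaria Chlorella virus 1", "Picomonas judraskeda"]
print(long_names(names))         # only the Paramecium name exceeds 30 characters
print(prefix_collisions(names))  # [] here: no two names share a 30-character prefix
```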