Dear community, I am trying to build a tree from my representative OTU sequences for use with UniFrac.
align_seqs.py -i otus_min3.fa -m muscle -o muscle
The first word of each fasta header in the input file starts with "OTU" and is preserved in the alignment.
head muscle/otus_min3_aligned.fasta
>OTU82968084443698 GTGTCAGCAGCCGCGGTAATACGA----AGGATCCA-
This alignment is then used as input for
make_phylogeny.py -i otus_min3_aligned.fasta -t muscle -o tree
which produces a .tre file that I want to use with
beta_diversity.py -i ../OTU_TABLE_min3.biom -o beta -m weighted_unifrac -t OTU.tre
The biom file contains the same read names as the alignment input ("OTUnnnnnnnn"). I am now getting this error:
ValueError: No valid samples/environments found. Check whether tree tips match otus/taxa present in samples/environments
Indeed, I found that the tree tips have been renamed to "seq_nnnnn".
head OTU.tre
(('seq_2856':0.633333,('seq_2936':0.446667,'seq_5108':0.446667):0.186667):0.62124,((('seq_13435':0.953333,.....
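For anyone who wants to confirm this kind of mismatch before running beta_diversity.py, here is a minimal sketch in plain Python. It assumes the tree is a Newick string and that the OTU IDs are the first word of each fasta header. The function names fasta_ids and newick_tips are hypothetical helpers, not part of QIIME, and the toy fasta and tree below are made up for illustration, not my real data.

```python
import re


def fasta_ids(lines):
    """Return the first word of every FASTA header line (without '>')."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def newick_tips(newick):
    """Return tip labels from a Newick string.

    A tip label directly follows '(' or ','. Labels that follow ')'
    are internal node names and are skipped. Quoted and bare labels
    are both handled; escaped quotes inside labels are not.
    """
    pattern = r"[(,]\s*(?:'([^']+)'|([^\s'(),:;]+))"
    return {quoted or bare for quoted, bare in re.findall(pattern, newick)}


# Toy data (hypothetical), mimicking the situation described above.
toy_fasta = [
    ">OTU1 some description",
    "ACGT--A",
    ">OTU2",
    "ACG-TTA",
    ">OTU3",
    "AC--GTA",
]
toy_tree = "(('seq_1':0.1,'seq_2':0.2):0.3,'seq_3':0.4);"

otu_ids = fasta_ids(toy_fasta)
tips = newick_tips(toy_tree)
shared = otu_ids & tips

print(len(shared), "of", len(tips), "tree tips match an OTU ID")
```

With real files, the header lines and the tree string would be read from the aligned fasta and the .tre file. If no tips are shared, the tip names and the OTU IDs do not match, which is what the ValueError above is complaining about.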
However, the make_phylogeny.py documentation says: "The tips of the tree are the first word from the input sequences from the fasta file, e.g.: ‘>101 PC.481_71 RC:1..220’ is represented in the tree as ‘101’."
This wasn't the case here. Where did I mess up?