Apologies for updating an old topic, but this page is one of the first results when searching Google for "MAFFT distout". I'd like to share some information on the distout feature that comes directly from Dr. Katoh, author of MAFFT:
The --distout flag is just for my personal use and there is no
document. I think it's better and safer for you to use your code to
assess the similarity or distance of sequences from a given alignment.
However, I think I can explain this option. Please check if it'll be
useful for your purpose or not:
The --distout option outputs distances that are used for building a
guide tree. Note that the distances are computed before building an
MSA, not computed from the MSA. The distances are converted from
pairwise scores with this equation,
Distance(i,j) = 1 - Score(i,j) / min( Score(i,i), Score(j,j) )
as explained in our old paper, Katoh et al 2002. Score() can be
computed by various ways (without MSA). In Katoh2002, Score(i,j) was
the number of 6mers that are shared by sequences i and j. If the
--localpair flag is set, then Score(i,j) is the pairwise alignment
score between sequences i and j. The pairwise alignment is not always
the same as the resulting MSA even if the number of the input
sequences is two.
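For anyone who wants to see the formula above in action, here is a minimal Python sketch of the score-to-distance conversion. It is not MAFFT's code. The function names are mine, and I am assuming that "shared 6mers" means the multiset overlap of plain 6-mers; MAFFT's own k-mer counting may well differ in detail, so don't expect these numbers to match --distout output.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmer_score(a, b, k=6):
    """Score(i,j): number of k-mers shared by a and b.

    Assumption: 'shared' is taken as the multiset intersection,
    i.e. each k-mer contributes min(count in a, count in b).
    """
    return sum((kmer_counts(a, k) & kmer_counts(b, k)).values())


def distance(a, b, k=6):
    """Distance(i,j) = 1 - Score(i,j) / min(Score(i,i), Score(j,j))."""
    denom = min(shared_kmer_score(a, a, k), shared_kmer_score(b, b, k))
    if denom == 0:
        raise ValueError("a sequence is shorter than k, so Score(i,i) is 0")
    return 1.0 - shared_kmer_score(a, b, k) / denom


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTACGGGG", "TTTTTTTTTT"]
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            print(i, j, round(distance(seqs[i], seqs[j]), 3))
```

With this definition, identical sequences get distance 0 and sequences with no 6-mer in common get distance 1. For --localpair you would swap shared_kmer_score for a pairwise alignment score and keep the same distance() formula.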
Thanks. Worked for me. Now I just have to patch the Biopython wrapper to let me use this option.
+1 for taking the time to read the source!
Hi, I ran a MAFFT job to align a reference sequence to an existing alignment and received this error: "Loading 'hat2n' (aligned sequences - new sequences) ... 115628 != 11562 hat2 is wrong." Has anyone encountered a similar error? I am not sure what it means. Thanks