I'm trying to automate (and improve) a computational 'procedure' in my new lab (of traditional biologists) and I'm having trouble replicating a step.
At one point in an analysis they use the GUI version of MUSCLE to align a set of sequences and then generate a UPGMA tree. Then they save the pairwise distance matrix for downstream analysis. I've already automated the downstream and upstream processes but I'm having trouble with this step.
The MUSCLE command line doesn't have an option for returning the pairwise distances (only the final tree). Is there a way to get those distances out? Or is there a way to extract the distances from the tree (a quick Google search didn't reveal anything)? Or is there a way to calculate those distances myself (the MUSCLE manual doesn't say much about how it arrives at the distances)? Or any other suggestions? I really don't want to resort to some silly 'automated GUI interaction' just to get this output data.
Thanks,
Will
Looks like you beat me to it! Thanks for posting your solution!
None of those would work for this use-case ... I needed the matrix of distances between sequences not the 'alignment matrix'. None of the output methods include distances except the -tree. And that requires some conversion to get the values out.
Yes, I understand that there is no distance matrix output for MUSCLE so my response was to suggest two ways to compute a distance matrix. Thanks for your Python solution!
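For anyone landing here later, the "conversion" from tree to distances mentioned above could look roughly like the sketch below. It is not the Python solution referred to in this thread, just a minimal illustration that assumes the tree is saved in Newick format with branch lengths and that sequence names contain no spaces, quotes, or the characters `,():`. The function name `leaf_distances` is made up for this example. It sums branch lengths along the path between every pair of leaves. Note that for a UPGMA tree these path lengths reflect averaged cluster distances, so they will generally not reproduce the original pairwise distance matrix exactly.

```python
from itertools import combinations


def leaf_distances(newick):
    """Return {(leaf_a, leaf_b): path length} for a Newick tree string.

    Hypothetical helper: assumes simple unquoted labels and sums branch
    lengths along the path between each pair of leaves.
    """
    text = "".join(newick.split()).rstrip(";")  # drop whitespace and ';'
    parent, length, leaves = {}, {}, {}
    pos = 0
    next_id = 0

    def read_node(parent_id):
        # Recursive descent; very deep trees may hit Python's recursion limit.
        nonlocal pos, next_id
        node = next_id
        next_id += 1
        parent[node] = parent_id
        is_leaf = text[pos] != "("
        if not is_leaf:
            pos += 1                      # consume "("
            read_node(node)
            while text[pos] == ",":
                pos += 1                  # consume ","
                read_node(node)
            pos += 1                      # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]           # leaf name (or internal label)
        if is_leaf:
            leaves[label] = node
        length[node] = 0.0                # default when no length is given
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length[node] = float(text[start:pos])

    read_node(None)

    def root_path(node):
        # Map each ancestor (including the node itself) to its distance
        # from the starting leaf, ordered from leaf to root.
        dist, total = {}, 0.0
        while node is not None:
            dist[node] = total
            total += length[node]
            node = parent[node]
        return dist

    paths = {name: root_path(node) for name, node in leaves.items()}
    result = {}
    for a, b in combinations(sorted(paths), 2):
        # First shared ancestor on the way up is the lowest common ancestor.
        lca = next(n for n in paths[a] if n in paths[b])
        result[(a, b)] = paths[a][lca] + paths[b][lca]
    return result


if __name__ == "__main__":
    # Toy tree, not real MUSCLE output.
    for pair, dist in leaf_distances("((A:0.1,B:0.1):0.2,C:0.3);").items():
        print(pair, round(dist, 3))
```

On the toy tree, A and B come out 0.2 apart and C is 0.6 from each of them. For real work a tree library (Biopython's Bio.Phylo, for example) is probably a safer choice than a hand-rolled parser, since it handles quoted labels and other Newick details this sketch ignores.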