Edit: A possible answer has been posted on SO. I need to verify it myself but it looks correct at the moment. The answer is here at: https://stackoverflow.com/a/46670077/5609349
How do I generate all possible Newick Tree permutations for a set of species given an outgroup?
For those who don't know what Newick tree format is, a good description is available at: https://en.wikipedia.org/wiki/Newick_format
I want to create all possible Newick Tree permutations for a set of species given an outgroup. The number of possible leaf nodes is either 4, 5, or 6.
Both "Soft" and "hard" polytomies are allowed. See: https://en.wikipedia.org/wiki/Polytomy#Soft_polytomies_vs._hard_polytomies https://biology.stackexchange.com/questions/23667/evidence-discussions-of-hard-polytomy
Shown below is the ideal output, with "E" set as the outgroup
Ideal Output:
((("A","B","C"),("D"),("E"));
((("A","B","D"),("C"),("E"));
((("A","C","D"),("B"),("E"));
((("B","C","D"),("A"),("E"));
((("A","B")("C","D"),("E"));
((("A","C")("B","D"),("E"));
((("B","C")("A","D"),("E"));
(("A","B","C","D"),("E"));
(((("A","B"),"C"),"D"),("E"));
However, any possible solutions that I've come with using itertools, specifically itertools.permutations, have come across the problem of equivalent output. The last idea I came up with involved the equivalent output that is shown below and I think a solution would definitely involve set.
Equivalent output:
((("C","B","A"),("D"),("E"));
((("C","A","B"),("D"),("E"));
((("A","C","B"),("D"),("E"));
Here is the start of my idea for a solution. However, I'm not really sure what to about this problem besides itertools and set for now.
import itertools
def Newick_Permutation_Generator(list_of_species, name_of_outgroup)
permutations_list =(list(itertools.permutations(["A","B","C","D","E"])))
for given_permutation in permutations_list:
process(given_permutation)
Newick_Permutation_Generator(["A","B","C","D","E"], "E")
Hello TheRealMrDoe!
It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/46626414
This is typically not recommended as it runs the risk of annoying people in both communities.
Yes, that is correct. However, I didn't receive many replies on my post on stackoverflow so I decided to post here. Also, I expected that more people would here would be familiar with Newick trees.
Before you get too involved you need to make sure what you're planning is practical, as you get a lot of bifurcating trees from surprisingly few leaf nodes (and if it's bifurcating + others then it's even worse). If your plan is to do some compute for goodness of fit etc on all of these, then you might need to think carefully about it! With just 10 leaves, you'll be testing 34 million bifurcating trees...
That's true. I don't plan to do goodness of fit on all of these. I'm just looking to generate permutations of Newick trees.
For how many leaf nodes then, outside of this example dataset?
4 and 6 nodes outside of this example dataset.
Are polytomies allowed?
No, polytomies are not allowed.
The "ideal output" in your question includes polytomies. Take a look at this answer: https://stackoverflow.com/questions/17242117/what-python-code-generates-all-possible-groupings-trees-for-binary-operators.
I think a.zielezinski is also getting at the same problem that I was pushing towards.. in your idealised output you're allowing polytomies. It sounds like you're just interested in generating all possible trees as strings based purely on randomness, so the distinction between hard and soft polytomies doesn't matter if there is no actual data to base this on?
If we say that all polytomies are not allowed, you'll solve a few of your equivalencies but generate new ones.
e.g.
Should resolve in to 3 distinct trees per polytomy (if my quick mental sums are right). This will give you a lot more trees potentially, but you'll also have to deal with the equivalency of something like:
If you look at the link provided, based on your current data, you couldn't say if you had a hard polytomy or not, since these are just permutations, not based on actual phylogenetic information (unless I've missed something).
I see. Thank you for the clarification. Polytomies will be allowed in my output then. Do you have any suggestions for methods for generating the trees?
Does your code output anything at the moment? How close is it to what you wanted?
Nevermind, it looks like you got a fantastic answer over on SO.
For anyone else who wants to see the answer, check it out here at: https://stackoverflow.com/a/46670077/5609349
Note that, for some reason, this solution does not seem to generate all possible results when polytomies are allowed. I found more topologies with this approach: https://stackoverflow.com/a/46707857/1878788]
That's right. Let me clarify. "Soft" polytomies are allowed but "hard" polytomies are not allowed.
I'm still a little confused on "soft" and "hard polytomies. Are both types present in the "ideal output" of my question? I think I only had "soft" polytomies in my "ideal output."
See: https://en.wikipedia.org/wiki/Polytomy#Soft_polytomies_vs._hard_polytomies https://biology.stackexchange.com/questions/23667/evidence-discussions-of-hard-polytomy
Also, I looked at this answer: https://stackoverflow.com/questions/17242117/what-python-code-generates-all-possible-groupings-trees-for-binary-operators
It's only useful for numbers and strings, not lists.