I have several thousand sequences, which have been aligned and used to generate a phylogenetic tree in Newick format. I want to pick a smaller subset of X sequences such that the retained sequences are still distributed uniformly across the phylogenetic space.
My current idea is to cut the tree at increasing distances from the root until the number of branches crossing the cut is close to X. Then, for each of these branches, I would keep one tip sequence chosen at random and delete the rest.
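The idea above can be sketched roughly as follows. This is only a minimal, unoptimized sketch using the standard library, and it assumes a plain Newick string with branch lengths and no quoted labels or comments. The names `parse_newick`, `clades_at`, and `subsample` are made up for illustration and do not come from any library.

```python
import random


class Node:
    def __init__(self):
        self.name = ""
        self.length = 0.0
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string (labels and branch lengths only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # skip "("
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label, _, length = text[start:pos].partition(":")
        node.name = label.strip()
        node.length = float(length) if length else 0.0
        return node

    return parse_node()


def tips(node):
    """Return the names of all tips below (or at) this node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in tips(child)]


def clades_at(root, cutoff):
    """Cut the tree at `cutoff` from the root; one tip list per lineage."""
    groups = []
    stack = [(root, 0.0)]
    while stack:
        node, parent_depth = stack.pop()
        depth = parent_depth + node.length
        # A lineage ends here if it reaches the cut or is already a tip.
        if depth >= cutoff or not node.children:
            groups.append(tips(node))
        else:
            stack.extend((child, depth) for child in node.children)
    return groups


def node_depths(root):
    """Sorted distinct root-to-node distances, used as candidate cutoffs."""
    depths = set()
    stack = [(root, 0.0)]
    while stack:
        node, parent_depth = stack.pop()
        depth = parent_depth + node.length
        depths.add(depth)
        stack.extend((child, depth) for child in node.children)
    return sorted(depths)


def subsample(root, target, seed=None):
    """Pick one random tip per lineage, at the cut closest to `target` lineages."""
    rng = random.Random(seed)
    best = None
    for cutoff in node_depths(root):
        groups = clades_at(root, cutoff)
        if best is None or abs(len(groups) - target) < abs(len(best) - target):
            best = groups
    return [rng.choice(group) for group in best]


tree = parse_newick("((A:1,B:1):1,(C:1,D:1):1,E:3);")
print(subsample(tree, target=3, seed=0))
```

On the toy tree, a target of 3 should give one of A/B, one of C/D, and E. Trying every node depth as a cutoff is quadratic in the tree size, and the recursive helpers could hit Python's recursion limit on a very unbalanced tree, so this would probably need tuning for several thousand tips.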
I am working with Python, but any pointers would be helpful. Thanks!
Did you find a solution? Would be very interested to hear how you achieved this!