5.9 years ago
bioguy
Any ideas on how to extract phylogenetic tree distances (or dissimilarities) for a very large set of NCBI taxonomy IDs, say 50-100K? Ideally I'd be able to generate or download a file of the form:
Taxa1,Taxa2,Distance
Taxa1,Taxa3,Distance
......
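A file in this row format can be written one pair at a time, so no matrix ever has to sit in memory; only the output file grows. Below is a minimal sketch under that assumption. `write_pairwise` is a hypothetical helper, and the distance function is a placeholder you would swap for a real taxonomic or phylogenetic distance.

```python
import csv
import io
from itertools import combinations
from typing import Callable, Iterable, TextIO


def write_pairwise(taxa: Iterable[int],
                   distance: Callable[[int, int], float],
                   out: TextIO) -> int:
    """Stream one 'Taxa1,Taxa2,Distance' row per unordered pair.

    Rows are written as soon as they are computed, so memory use stays
    flat regardless of how many pairs there are. Returns the row count.
    """
    writer = csv.writer(out)
    writer.writerow(["Taxa1", "Taxa2", "Distance"])
    n_rows = 0
    for a, b in combinations(sorted(taxa), 2):  # each unordered pair once, no self-pairs
        writer.writerow([a, b, distance(a, b)])
        n_rows += 1
    return n_rows


# Toy demo: placeholder distance (absolute ID difference), purely illustrative.
buf = io.StringIO()
rows = write_pairwise([30, 10, 20], lambda a, b: abs(a - b), buf)
print(buf.getvalue())
```

For a real run you would pass an open file handle instead of `io.StringIO`, and possibly gzip the output.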
This was the closest thing I could find (https://www.biostars.org/p/312148/), but it seems to require doing it in R, and I'm pretty sure R can't handle matrices of the scale I have in mind.
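For a sense of scale, the back-of-the-envelope arithmetic below counts the unordered pairs and the memory an all-vs-all table would need. It assumes 4-byte (float32) distance values; a "dense" matrix stores all n×n cells, while a "condensed" layout stores each unordered pair once. The memory cost, not the language, is the main obstacle.

```python
def pairwise_storage(n_taxa: int, bytes_per_value: int = 4) -> dict:
    """Rough size of an all-vs-all distance table for n_taxa taxa."""
    n_pairs = n_taxa * (n_taxa - 1) // 2  # unordered pairs, excluding self-pairs
    return {
        "pairs": n_pairs,
        "dense_matrix_gb": n_taxa * n_taxa * bytes_per_value / 1e9,  # full n x n
        "condensed_gb": n_pairs * bytes_per_value / 1e9,             # upper triangle only
    }


for n in (50_000, 100_000):
    print(n, pairwise_storage(n))
```

At 100K taxa that is roughly 5 billion pairs: about 40 GB for a dense float32 matrix and about 20 GB even in condensed form. A plain-text CSV at ~20 bytes per row would land on the order of 100 GB, so streaming to disk, or computing distances on demand, is probably the practical route.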
I think you can use the ETE3 toolkit to build a tree representation of NCBI taxids (I believe you can give it a higher-level taxon, e.g. primates, and get all the taxa below it) and then calculate all-vs-all inter-tip distances. I have some code that can do this last step, but no idea how it'll scale.
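One way to get inter-tip distances without materialising a tree object is to work from each taxid's root-to-tip lineage: the shared prefix of two lineages ends at their lowest common ancestor, and the distance is the number of edges up from one tip to that ancestor plus the edges down to the other. This is a minimal sketch under a few assumptions. The lineages below are made-up toy IDs; in practice they might come from ETE3's `NCBITaxa().get_lineage(taxid)`, which returns a root-first list of taxids. Also, NCBI taxonomy has no branch lengths, so this is a topological (edge-count) proxy rather than a true phylogenetic distance.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def lineage_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of edges between two tips, given their root-first lineages.

    The length of the common prefix locates the lowest common ancestor;
    the distance is the steps from each tip back up to it.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


# Hypothetical toy lineages (root first); these IDs are not real taxids.
lineages: Dict[int, List[int]] = {
    101: [1, 2, 10, 101],
    102: [1, 2, 10, 102],
    201: [1, 2, 20, 201],
    301: [1, 3, 301],
}

for t1, t2 in combinations(sorted(lineages), 2):
    print(f"{t1},{t2},{lineage_distance(lineages[t1], lineages[t2])}")
```

Each pair costs only O(lineage depth), but at ~5 billion pairs pure Python will still be slow; the sketch shows the logic, and a vectorised or compiled version (or computing distances only for the pairs you actually need) would be worth considering.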
OK, cool, this helps a lot, thank you. Giving it a shot now; we'll see how it works out...