4.7 years ago
chronotope
Hey everyone! I have a tree built from a 2k multiple sequence alignment that I would like to collapse at a certain sequence similarity level, e.g. 95%. The problem is that the collapsing function works on the calculated tree scale (branch lengths) rather than on the initial sequence similarity, although there is obviously a relationship between the two. I can't figure out how to translate the scale of the tree into a sequence similarity percentage. In other words, I am trying to find out at which branch length value I should collapse my tree nodes so that the collapsing corresponds to, say, 95% pairwise sequence similarity.
Thanks! A
How was the tree generated (what algorithm)? The algorithm used (e.g. GTRGAMMA, nj etc.) tells you how it is treating the similarity between the tree and the input alignment (loosely speaking).
Actually it is the simplest MAFFT FFT-NS-i for large datasets.
I tried to figure out what to do by looking at the paper, but unfortunately I am not getting anywhere.
To my knowledge MAFFT isn't capable of making trees. It is most likely what they used for the initial alignment. You need to find out how they created the tree specifically.
If this is something to do with a paper you need to tell us what paper - we can't guess this, and you haven't provided enough info yet for your question to be answerable.
Oh I am so sorry, rookie mistake: of course, I meant that MAFFT was the tool used to align the sequences. The tree was calculated with IQ-TREE using the auto setting, and the model SYM+I+G4 was chosen. I am going to read up on it now, but I thought I'd quickly post this first.
Cheers and sorry again for the misleading information, I have been a bit tired... : )
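For a rough feel of the relationship asked about above: branch lengths in a maximum-likelihood tree are in expected substitutions per site, so they can be related to pairwise identity through a substitution model. The sketch below uses the Jukes-Cantor (JC69) correction purely as an illustrative assumption. It is not the SYM+I+G4 model that IQ-TREE selected, which also accounts for unequal exchange rates, invariant sites and rate heterogeneity, so the numbers are only a first approximation. The function names are made up for this example.

```python
import math


def jc69_distance_from_identity(identity):
    """Expected substitutions per site between two sequences with the
    given pairwise identity, under the JC69 model (an assumption here)."""
    p = 1.0 - identity  # observed proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is only defined for a p-distance in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_identity_from_distance(d):
    """Inverse mapping: expected pairwise identity for a JC69 distance d."""
    return 1.0 - 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Tip-to-tip path length that corresponds to 95% identity under JC69
d95 = jc69_distance_from_identity(0.95)
print(round(d95, 4))
```

Note that this value is the total path length between two tips (the patristic distance), not the length of a single branch, so a collapsing threshold has to be chosen with that in mind. A more reliable check is to compute pairwise identities directly from the alignment and compare them with the tip-to-tip distances in the actual tree.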