6.3 years ago
michau
Hi, is there any probabilistic measure of how dissimilar a position is from the rest of the alignment, i.e. a method to assess which residues are responsible for the phylogenetic and functional differences?
I have an alignment of ATP synthases, and I noticed that Mycobacterium is highly divergent from other Bacteria, and even from other Actinobacteria. I need some measure to statistically discriminate which residues are responsible for this divergence.
Any thoughts?
Thanks in advance
I'm not sure if I exactly understand what you're after, but here goes:
You might be interested in calculating the Shannon entropy per column of your sequence alignment. High-entropy positions will be your more divergent ones; see for example: https://gist.github.com/jrjhealey/130d4efc6260dd76821edc8a41d45b6a.
You may need to take this further and do a dN/dS analysis or similar, since it probably won't be enough to just determine which sites are variable. You will need to demonstrate that the changes at those sites are under meaningful selection (i.e. that they are non-synonymous).
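To make the entropy idea concrete, here is a minimal sketch of per-column Shannon entropy in plain Python. It is not the code from the linked gist; the function name `column_entropies`, the choice to ignore gaps, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropies(sequences, ignore_gaps=True):
    """Return the Shannon entropy (in bits) of each alignment column.

    `sequences` is a list of equal-length aligned strings (hypothetical input
    format; in practice you would read these from your alignment file).
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*sequences):
        # Assumption: gap characters ('-') are skipped rather than counted
        # as a 21st residue state.
        residues = [r for r in column if not (ignore_gaps and r == "-")]
        counts = Counter(residues)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)  # all-gap column: no information
            continue
        # H = sum(p * log2(1/p)) over the residues observed in this column
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment, invented for illustration only
toy_alignment = ["MKVL", "MKIL", "MRVL", "MK-L"]
print(column_entropies(toy_alignment))
```

A fully conserved column scores 0 and the score rises as residues become more evenly mixed. Note that this only measures variability across the whole alignment; by itself it does not say which group of sequences a variable site separates.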
Firstly, thank you for the answer.
I was thinking of using dN/dS as a next step, but as far as I know it only allows pairwise comparisons. I was looking for something more like measuring inside-clade vs. outside-clade (site-specific) variation. Column-wide, like:
Or am I thinking bullshit? (I started my bioinformatics adventure recently → I'm still green as a lime and trying to learn.) Nevertheless, I will go with dN/dS, as it will answer my question.
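One possible way to sketch the inside-clade vs. outside-clade idea is to compare the residue distribution of each column in the two groups, for example with the Jensen-Shannon divergence. This is a swapped-in illustration, not a method proposed in this thread: the function names, the gap handling, and the toy sequences are assumptions, and the score is descriptive only (it is not a significance test and does not correct for phylogeny).

```python
import math
from collections import Counter


def _freqs(residues):
    """Residue frequencies for one column, ignoring gaps (assumption)."""
    counts = Counter(r for r in residues if r != "-")
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two residue distributions."""
    # Mixture distribution over every residue seen in either group
    m = {r: 0.5 * (p.get(r, 0.0) + q.get(r, 0.0)) for r in set(p) | set(q)}

    def kl(a):
        return sum(pa * math.log2(pa / m[r]) for r, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def clade_divergence(ingroup, outgroup):
    """Per-column divergence between two sets of aligned sequences.

    Returns 0 where both groups have the same residue distribution and 1
    where they share no residues; NaN if one group is all gaps in a column.
    """
    scores = []
    for col_in, col_out in zip(zip(*ingroup), zip(*outgroup)):
        p, q = _freqs(col_in), _freqs(col_out)
        scores.append(js_divergence(p, q) if p and q else float("nan"))
    return scores


# Toy data, invented for illustration only
ingroup = ["MKVL", "MKVL"]            # e.g. the divergent clade
outgroup = ["MRVL", "MRIL", "MRVL"]   # e.g. everything else
print(clade_divergence(ingroup, outgroup))
```

In this toy case the second column (K in one group, R in the other) gets the maximum score, while a column that merely varies within the outgroup scores much lower. Deciding which sequences belong to the in-group is left to the user.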
I don’t know enough about the stats to say whether a MANOVA approach would work.
The only limitation that strikes me with that approach, however, is that objectively clustering ‘clades’ is a very difficult problem. It’s often much more obvious to a person than to a computer.
I would think there are approaches that work in a non-pairwise fashion. A quick bit of googling brought this up, which sounds like it might fit the bill:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887424/