4.1 years ago
kbaitsi
I have used R to calculate a similarity matrix for 11 proteins (histones) from a FASTA file. I now need to turn the similarity matrix into a distance matrix so that I can use it with hclust. I have tried sim2dist, and also dist with all of its methods (euclidean, maximum, manhattan, canberra, binary, minkowski). I have excluded the binary method, but I am not sure which of the remaining options is the best way to calculate the distance. Any thoughts?
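One point worth keeping in mind: R's `dist()` computes distances between the *rows* of its input, so `dist(S, method = "euclidean")` treats each protein's row of similarities as a feature vector and compares those profiles, rather than converting each similarity into a distance directly. The Python sketch below (the 3x3 matrix is made up purely for illustration) contrasts the two on the same input.

```python
import math

# Hypothetical 3x3 similarity matrix (values in [0, 1], 1 on the diagonal).
S = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

def one_minus(S):
    """Direct conversion of each entry: d_ij = 1 - s_ij."""
    return [[1.0 - s for s in row] for row in S]

def euclidean_between_rows(S):
    """Analogous to R's dist(S, method = "euclidean"): every row of S is
    treated as a feature vector and whole rows are compared."""
    n = len(S)
    return [[math.sqrt(sum((S[i][k] - S[j][k]) ** 2 for k in range(n)))
             for j in range(n)]
            for i in range(n)]

D_direct = one_minus(S)
D_rows = euclidean_between_rows(S)
```

Both results are valid distance matrices, but they answer different questions: `D_direct` measures how dissimilar two proteins are to each other, while `D_rows` measures how differently two proteins relate to *all* the others (here `D_rows[0][2]` is about 1.28, larger than any value `1 - s` can produce).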
There are a few common and generic ways of turning a similarity into a distance, such as:
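As an illustration, three widely used generic conversions can be sketched in Python as follows. These are not necessarily the exact ones the answer listed; the function names are hypothetical, and the first two assume similarities already scaled to at most 1.

```python
import math

def sim_to_dist_linear(s):
    """d = 1 - s; assumes s is scaled to [0, 1] with s = 1 for identity."""
    return 1.0 - s

def sim_to_dist_sqrt(s):
    """d = sqrt(2 * (1 - s)); for cosine- or correlation-type similarities
    this equals the Euclidean distance between the unit-length vectors."""
    return math.sqrt(2.0 * (1.0 - s))

def sim_to_dist_max(S):
    """d_ij = max(S) - s_ij; works for unbounded scores, but the diagonal
    is only zero for the entries that attain the maximum."""
    m = max(max(row) for row in S)
    return [[m - s for s in row] for row in S]
```

The `max(S) - s` variant shows why the conversion matters: with `[[5, 2], [2, 4]]` the second diagonal entry becomes 1, so a protein would no longer be at distance zero from itself.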
Thank you for your answer; sim2dist does what you wrote in the first bullet. I was just wondering whether there is a preferred approach for protein sequences, or whether it doesn't matter.
What matters most is the choice of the original similarity measure. It has to capture the notion of proximity that is relevant to the question you are trying to address. When converting, make sure that the distributional properties that matter for the clustering are preserved.
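One concrete case: any strictly increasing transform of `1 - s` preserves the *order* of the pairwise distances. Single linkage (and likewise complete linkage) depends only on that order, so its merge sequence stays the same and only the merge heights change. Average linkage averages the distance values themselves, so its tree can change. The Python sketch below checks this for single linkage on a made-up 4x4 similarity matrix; `single_linkage_merges` is a hypothetical helper, not part of R or any library.

```python
import math
from itertools import combinations

def single_linkage_merges(D):
    """Return the pairs whose edges join clusters under single linkage.

    Single linkage follows Kruskal's algorithm on the complete graph:
    edges are scanned by increasing distance, and two clusters merge the
    first time an edge connects them (tracked with a union-find).
    """
    n = len(D)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for i, j in sorted(combinations(range(n), 2), key=lambda e: D[e[0]][e[1]]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges.append((i, j))
    return merges

# Hypothetical similarities with distinct off-diagonal values (no ties).
S = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
D_linear = [[1.0 - s for s in row] for row in S]
D_sqrt = [[math.sqrt(2.0 * (1.0 - s)) for s in row] for row in S]
merges_linear = single_linkage_merges(D_linear)
merges_sqrt = single_linkage_merges(D_sqrt)
```

Both conversions give the merge order `[(0, 1), (2, 3), (1, 2)]`. A non-monotone step, such as applying `dist()` to the rows of the similarity matrix, carries no such guarantee.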
I have used the pairwise alignment function with a BLOSUM substitution matrix. Thanks a lot for your time and your answer.
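Raw pairwise alignment scores computed with a BLOSUM matrix are not bounded to [0, 1] and grow with sequence length, so a `1 - s` conversion only makes sense after some normalization. One option, shown here as an assumption rather than a standard, is to divide each score by the geometric mean of the two self-alignment scores. The scores below are invented toy numbers, not real histone alignments, and `normalize_scores` is a hypothetical helper.

```python
import math

def normalize_scores(raw):
    """s_ij = raw_ij / sqrt(raw_ii * raw_jj).

    Assumes all self-scores raw_ii are positive. The result is 1 on the
    diagonal, but off-diagonal values are not guaranteed to lie in [0, 1]
    (alignment scores can be negative).
    """
    n = len(raw)
    return [[raw[i][j] / math.sqrt(raw[i][i] * raw[j][j]) for j in range(n)]
            for i in range(n)]

# Invented toy alignment scores for three sequences.
raw = [
    [500.0, 420.0, 150.0],
    [420.0, 480.0, 160.0],
    [150.0, 160.0, 510.0],
]
S = normalize_scores(raw)
D = [[1.0 - s for s in row] for row in S]  # distance matrix usable for clustering
```

Other normalizations, for example dividing by the smaller of the two self-scores, are equally plausible; which one fits depends on how differences in sequence length should be treated.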