Clusters of max SNP distance
7.8 years ago

Hi,

I want to find all clusters with a maximum SNP distance of, say, 12 SNPs among 500 samples. I have a matrix of pairwise SNP distances but need an algorithm to cluster them, something like hierarchical clustering that terminates at a maximum distance of 12, and I'm not sure how to do this in, e.g., R. Any ideas?

Thanks

R SNP • 2.6k views

Could you please give an example of the data and of the output you want to get? Also, if you can explain the reason for the question, we might be able to find a solution faster.


Sure: a matrix of SNP distances between 4 samples, e.g.

0
500  0
34   4   0
19   20  3   0

So I can obviously reconstruct the phylogeny using, e.g., ML, which would show me that there is a reconstructed SNP distance of <12 between samples 3 and 2, and also between samples 4 and 3. Essentially, I want to define all clusters where the maximum distance between any cluster member and its nearest neighbour is 12. I could do this by simply looking at the ML tree, but this becomes tedious with massive data sets. The reason for doing this is to look for evidence of transmission.
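
The nearest-neighbour criterion described here matches what single-linkage hierarchical clustering does when the tree is cut at the threshold. Below is a minimal sketch in base R, assuming the 4-sample matrix above is filled in symmetrically; the sample names s1 to s4 and the variable max_snp are made up for illustration. Note that cutting at h = 12 keeps merges at a distance of exactly 12, so the cut height would need adjusting if a strict <12 is wanted.

```r
# SNP distance matrix from the example above, written out as a full symmetric matrix.
# The sample names s1..s4 are hypothetical labels.
m <- matrix(c(  0, 500, 34, 19,
              500,   0,  4, 20,
               34,   4,  0,  3,
               19,  20,  3,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

max_snp <- 12  # threshold taken from the question

# Single linkage joins two clusters as soon as any pair of members is close enough,
# i.e. each member only needs one neighbour within the threshold.
hc <- hclust(as.dist(m), method = "single")

# Cut the tree at the threshold: merges at height <= max_snp are kept.
clusters <- cutree(hc, h = max_snp)
print(clusters)
# s1 s2 s3 s4
#  1  2  2  2
```

For the real data, the same calls should work on the 500 x 500 matrix, and table(clusters) would then give the size of each cluster.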


If I get it right, you want to cluster in binary space, where any distance <12 is considered equally "close" (let's use 0 for that) and any distance >=12 is "far" (we can assign 1 to such cases). Then you transform your matrix to

0
1 0
1 0 0
1 1 0 0

and you want to cluster it then? If so, you can use dist(x, method="binary") in R as the distance measure (which is Jaccard), and then pass the resulting distance object to a clustering algorithm like hclust. Otherwise, you can start with binary clustering using coclusterBinary from the blockcluster package: https://cran.r-project.org/web/packages/blockcluster/vignettes/blockcluster_tutorial.pdf
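
A rough sketch of the thresholding route in base R, assuming the 4-sample matrix from the earlier reply is filled in symmetrically; the names m, b and s1 to s4 are made up for illustration. It only shows how the calls fit together. Whether the row-wise Jaccard distance yields the transmission clusters the question is after would need checking on real data.

```r
# SNP distance matrix from the example in the thread, as a full symmetric matrix.
# The sample names s1..s4 are hypothetical labels.
m <- matrix(c(  0, 500, 34, 19,
              500,   0,  4, 20,
               34,   4,  0,  3,
               19,  20,  3,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

# Threshold the distances: 0 = "close" (< 12 SNPs), 1 = "far" (>= 12 SNPs).
b <- (m >= 12) * 1
print(b[lower.tri(b, diag = TRUE)])  # lower triangle, column by column

# Jaccard-type distance between the binary rows, then hierarchical clustering.
d  <- dist(b, method = "binary")
hc <- hclust(d)  # default linkage is "complete"; other methods can be passed via method=
```

From here, cutree(hc, k = ...) or cutree(hc, h = ...) extracts cluster labels. The choice of k or h is left open because the reply does not specify one.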
