How to cluster bacterial genomes based on Hamming distance?
7.9 years ago
jeccy.J ▴ 60

Can anyone suggest how to cluster a set of bacterial genomes based on their Hamming or SNP distance?

clustering genome • 2.3k views

More detail is really needed. What exactly is your problem? How to calculate the Hamming or SNP distance between two genomes? Which clustering algorithm to use once you've calculated the Hamming distances? Must it be Hamming or SNP distance, or are you in fact looking for distance metrics better suited to the problem you are trying to solve? How closely related are the genomes you want to cluster?
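
To make the first of these questions concrete, here is a minimal Python sketch of an all-vs-all Hamming distance matrix. It assumes the genomes have already been aligned to the same length (for example, a core-genome or SNP alignment); the function names and the toy sequences are made up for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(seqs):
    """Return (names, matrix) with matrix[i][j] = Hamming distance."""
    names = list(seqs)
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = hamming(seqs[names[i]], seqs[names[j]])
        matrix[i][j] = matrix[j][i] = d  # the matrix is symmetric
    return names, matrix


# Toy alignment (invented data, not real genomes)
toy = {"g1": "ACGTACGT", "g2": "ACGTACGA", "g3": "TCGTTCGA"}
names, dist = pairwise_hamming(toy)
for name, row in zip(names, dist):
    print(name, row)
```

This counts every mismatching column, including gaps and ambiguous bases such as N; whether those should be skipped depends on how the alignment was produced.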


Just get a matrix of distances (MxN) and use simple Ward clustering, or you could even try MDS. Both can be done in R. For example, Ward clustering with Manhattan distance:

library(pvclust)
pvclust(data = t(mydata), method.hclust = "ward.D", method.dist = "manhattan", nboot = 10000)

Additionally, you will get a p-value for each clade, based on how often that cluster is recovered across the bootstrap replicates.
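
If you already have a square distance matrix and just want groups, a simpler alternative can be sketched in plain Python. Note that this is not Ward clustering and gives no bootstrap p-values: it is single-linkage clustering at a fixed distance cutoff, where any two genomes within `threshold` of each other end up in the same cluster. The function name, the threshold, and the toy matrix are assumptions for illustration.

```python
def single_linkage_clusters(names, dist, threshold):
    """Group items whose pairwise distance is <= threshold (transitively)."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return sorted(clusters.values())


# Toy symmetric distance matrix (invented values)
names = ["g1", "g2", "g3", "g4"]
dist = [
    [0, 1, 3, 10],
    [1, 0, 2, 11],
    [3, 2, 0, 9],
    [10, 11, 9, 0],
]
print(single_linkage_clusters(names, dist, threshold=2))
```

Because linkage is transitive, g1 and g3 land in the same cluster at threshold 2 even though they are 3 apart; that chaining behaviour is the main thing to keep in mind when choosing a cutoff.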

7.9 years ago
Sej Modha 5.3k

cd-hit can be used for clustering.


It would be pointless to apply cd-hit to complete bacterial genome sequences (unless they were very similar, sharing exactly the same gene order and so on). Perhaps a better strategy would be to build a distance matrix with, e.g., all-vs-all MUMmer. Counting shared k-mers could also give a reasonably representative distance matrix.
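
As a rough illustration of the k-mer idea, the Python sketch below turns shared k-mer counts into a distance by taking one minus the Jaccard index of the two k-mer sets. The choice of Jaccard, the function names, the value of k, and the toy sequences are all assumptions for illustration; this is not how MUMmer computes anything.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """1 - |shared k-mers| / |all k-mers| for two unaligned sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0  # both sequences shorter than k: nothing to compare
    return 1.0 - len(ka & kb) / len(union)


# Toy sequences (invented data)
print(jaccard_distance("ACGTAC", "ACGTTT", k=3))
print(jaccard_distance("AAAAAAAA", "CCCCCCCC", k=4))
```

For real genomes a much larger k would be needed, and the sketch does not treat a k-mer and its reverse complement as the same, so sequences assembled on opposite strands would look unrelated.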
