Clustering DNA Sequences Using K-Means
14.0 years ago
Monzoor ▴ 310

I want to cluster DNA sequences using oligonucleotide frequency vectors. Are there any stand-alone k-means implementations available for this?
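For concreteness, here is a minimal Python sketch of what an oligonucleotide frequency vector could look like. The function name `kmer_frequencies`, the default word length `k=4`, and the choice to skip windows containing ambiguity codes such as N are all assumptions made for illustration, not something specified in this thread.

```python
from itertools import product

def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k DNA words in seq, in a fixed order.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (e.g. N) are skipped by assumption.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    seq = seq.upper()
    # Slide a window of length k over the sequence and count each word.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    # Normalise so that sequences of different lengths are comparable.
    return [counts[w] / total if total else 0.0 for w in words]

# Dinucleotide (k=2) vector of a short toy sequence: 16 entries summing to 1.
vec = kmer_frequencies("ACGTACGTAC", k=2)
```

Each sequence then becomes one row of a numeric matrix with 4^k columns, which is the kind of input a k-means routine expects.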

14.0 years ago
Michael 55k

I gave an answer to a similar question here using R code.

If you replace hclust with kmeans in that code, you are essentially done, and with a short script R becomes a 'stand-alone solution'.

Try ?kmeans to see the available options. For very large datasets you can also try the Kmeans implementation in the amap package.
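To make the idea concrete without relying on the linked R code, here is a rough, self-contained Python sketch of k-means on dinucleotide frequency vectors. It is not R's `kmeans`: it uses plain Lloyd iterations and a deterministic farthest-point initialisation, both chosen here only to keep the example reproducible, whereas R's `kmeans` picks random starting centers unless you supply them. The helper names (`kmer_vector`, `init_centers`, `kmeans`) and the toy sequences are hypothetical.

```python
from itertools import product

def kmer_vector(seq, k=2):
    """Relative frequency of each DNA k-mer in seq (illustrative helper)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Assumed initialisation: first point, then repeatedly the farthest point."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged, so converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data: two AT-rich and two GC-rich sequences.
seqs = ["ATATATTATAATAT", "TTATATAAATATTA", "GCGCGGCCGCGGCC", "CCGGCGCGCCGGCG"]
labels, centers = kmeans([kmer_vector(s) for s in seqs], k=2)
print(labels)  # [0, 0, 1, 1]
```

For real data you would typically use longer words (for example tetranucleotides) and a tested implementation, such as R's `kmeans` with several random starts via its `nstart` argument.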

Thanks, Michael. I will try that and let you know.

14.0 years ago
Prateek ★ 1.0k

Not sure how different it is from k-means, but cd-hit is what I use for clustering protein sequences, and you can also use it for nucleotide sequences. It's an incremental clustering algorithm and is pretty fast.

site - http://www.bioinformatics.org/cd-hit/

user's guide - http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf

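I think that is a different kind of 'clustering': cd-hit groups sequences by sequence identity, whereas k-means partitions numeric feature vectors such as oligonucleotide frequencies.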
