Question

Clustering DNA Sequences Using K-Means

14.0 years ago

Monzoor ▴ 310

I want to cluster DNA sequences using oligonucleotide frequency vectors. Are there stand-alone implementations of k-means available for this?
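
For reference, an oligonucleotide frequency vector is the relative count of every possible k-mer in a sequence, so each sequence becomes a point in a 4^k-dimensional space that k-means can work on. Below is a minimal Python sketch, assuming sequences are plain strings. The function name kmer_frequency_vector is hypothetical, and skipping k-mers that contain N or other ambiguity codes is an assumption, not the only reasonable choice.

```python
from itertools import product


def kmer_frequency_vector(seq, k=2):
    """Relative frequency of each of the 4**k DNA k-mers in seq.

    Entries follow lexicographic order (AA, AC, AG, AT, CA, ... for k=2).
    Assumption: k-mers containing anything other than A/C/G/T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    # Slide a window of width k along the sequence and count each k-mer
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    # Normalise counts to frequencies so sequences of different lengths are comparable
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Dinucleotide (k=2) profile: a 16-dimensional vector per sequence
    print(kmer_frequency_vector("ACGTACGTNNACGT", k=2))
```

Normalising to frequencies rather than raw counts keeps long and short sequences on the same scale. Otherwise sequence length would dominate the Euclidean distances that k-means uses.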

updated 14.0 years ago by Michael 55k • written 14.0 years ago by Monzoor ▴ 310

Ram · Answer 1 · 2010-12-16

14.0 years ago

Michael 55k

I gave an answer to a similar question here using R code.

If you replace hclust with kmeans, you are basically there, and with a short script R becomes a 'stand-alone solution'.

Try ?kmeans to see the available options. For very large datasets you can also try the Kmeans implementation in the amap package.
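
For anyone who wants the same idea outside R, here is a minimal Python sketch of Lloyd's k-means over a list of frequency vectors, one vector per sequence. R's kmeans uses the Hartigan-Wong algorithm by default, so its results can differ slightly. The kmeans function below is a hypothetical illustration, not a port of R's code. Like any k-means, the result depends on the random starting centroids, so in practice you would try several seeds, which is what R's nstart argument does.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=100, seed=0):
    """Lloyd's k-means (hypothetical helper). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen input vectors as centroids
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D vectors standing in for oligonucleotide frequency profiles
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    labels, centroids = kmeans(data, k=2)
    print(labels)
```

With two well-separated groups like the toy data, the first three and last three points end up in different clusters. Real frequency profiles are rarely that clean, so choosing k and comparing several runs still matters.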

updated 5.2 years ago by Ram 44k • written 14.0 years ago by Michael 55k

Thanks Michael. I will try it and let you know.

14.0 years ago by Monzoor ▴ 300

Ram · Answer 2 · 2010-12-16

14.0 years ago

Prateek ★ 1.0k

Not sure how different it is from k-means, but cd-hit is the tool I use for clustering protein sequences, and it also works for nucleotide sequences (cd-hit-est). It is an incremental clustering algorithm and is pretty fast.

site - http://www.bioinformatics.org/cd-hit/

user's guide - http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf
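
To show how this differs from k-means, here is a rough Python sketch of greedy incremental clustering in the spirit of cd-hit. Sequences are taken longest first. Each one joins the first cluster whose representative it matches at or above an identity threshold, which is cd-hit's default behaviour; otherwise it founds a new cluster. The identity function (difflib's SequenceMatcher ratio) is a hypothetical stand-in. cd-hit itself uses a short-word filter and alignment to compute identity, so this sketch will not reproduce its clusters.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Hypothetical stand-in for cd-hit's alignment-based identity:
    # difflib's similarity ratio, 2 * matches / (len(a) + len(b)).
    return SequenceMatcher(None, a, b).ratio()


def greedy_incremental_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering (illustrative, not cd-hit itself).

    Returns a list of clusters, each a list of sequence indices with the
    representative (the longest member) first.
    """
    # Longest sequences first, so each representative is its cluster's longest member
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    clusters = []
    for i in order:
        for cluster in clusters:
            # Compare only against the representative, not every member
            if identity(seqs[i], seqs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No representative was similar enough: start a new cluster
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
    print(greedy_incremental_cluster(seqs, threshold=0.8))  # → [[0, 1], [2]]
```

The number of clusters falls out of the identity threshold instead of being fixed in advance as k. Each sequence is compared only to representatives, never to frequency-vector centroids, which is the main difference from k-means.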

updated 5.2 years ago by Ram 44k • written 14.0 years ago by Prateek ★ 1.0k

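I think that is a different kind of 'clustering': cd-hit groups sequences by pairwise sequence identity, not by k-means on oligonucleotide frequency vectors.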

14.0 years ago by Michael 55k