I want to cluster DNA sequences using oligonucleotide frequency vectors. Are there stand-alone implementations of k-means available for this?
I gave an answer to a similar question here using R code.
If you replace `hclust` with `kmeans`, you are already there, and R becomes a 'stand-alone solution' with a little script.
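Whatever the clustering call, the first step of such a script is turning each sequence into an oligonucleotide frequency vector. The following is a minimal Python sketch of that step, not the linked R code. It assumes k = 2 (dinucleotides) and simply skips windows that contain N or other non-ACGT characters. `kmer_frequencies` is a hypothetical helper, not part of any package:

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return relative frequencies of all 4**k k-mers over ACGT.

    The k-mers are ordered lexicographically (AA, AC, AG, ... for k=2).
    Windows containing characters outside ACGT are skipped (an assumption;
    other pipelines may handle ambiguity codes differently).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # None means the window had N or another symbol
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


# One row per sequence; this matrix is what kmeans would cluster.
rows = [kmer_frequencies(s) for s in ["ACGTAC", "GGGCCC", "ACGTTT"]]
```

Each sequence becomes a vector of 4^k relative frequencies, so sequences of different lengths are directly comparable. These rows are what you would pass to `kmeans` (or any other clustering routine) as the input matrix.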
Try `?kmeans` to see the available options. For very large datasets you can also try the `Kmeans` implementation in the amap package.
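One option worth knowing about is `nstart`, which tries several random starting configurations and keeps the solution with the lowest total within-cluster sum of squares. Here is a minimal Python sketch of that idea for readers outside R. It uses plain Lloyd iterations, whereas R's default algorithm is Hartigan-Wong, so results may differ slightly. The function name and parameters are hypothetical:

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_start=10, max_iter=100, seed=0):
    """Lloyd's k-means with several random starts (like R's nstart).

    Returns (labels, centers, total within-cluster sum of squares) for the
    start with the smallest sum of squares. Requires k <= len(points).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_start):
        # Initialise centers as k distinct input points chosen at random
        centers = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest center
            new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                          for p in points]
            if new_labels == labels:
                break  # converged: assignments no longer change
            labels = new_labels
            # Update step: move each center to the mean of its members
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # keep the old center if a cluster empties
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        wss = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[0]:
            best = (wss, labels, centers)
    return best[1], best[2], best[0]


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers, wss = kmeans(points, 2)
```

Restarts matter because a single run of k-means only finds a local optimum that depends on the initial centers. Keeping the lowest-scoring run is a cheap way to reduce that sensitivity.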
It works differently from k-means (it groups sequences by sequence identity rather than by frequency vectors), but cd-hit is the one I use for clustering protein sequences, and you can also use it for nucleotide sequences. It's a greedy incremental clustering algorithm and is pretty fast.
site - http://www.bioinformatics.org/cd-hit/
user's guide - http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf
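The greedy incremental idea behind cd-hit can be sketched in a few lines. This is a simplified illustration, not cd-hit's actual algorithm. cd-hit uses short-word filtering and alignment-based sequence identity, whereas this sketch uses the fraction of shared k-mers as its similarity measure. `greedy_incremental_cluster`, `kmer_set` and their parameters are hypothetical:

```python
def kmer_set(seq, k):
    """Distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_incremental_cluster(seqs, threshold=0.9, k=5):
    """Greedy incremental clustering in the spirit of cd-hit (simplified).

    Sequences are processed longest first. Each one joins the first existing
    representative whose similarity meets the threshold; otherwise it becomes
    a new representative. Similarity is the fraction of the query's distinct
    k-mers also present in the representative (a stand-in for cd-hit's
    alignment-based identity). Returns {representative index: member indices}.
    """
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    reps = []       # (representative index, its k-mer set), in creation order
    clusters = {}
    for i in order:
        query = kmer_set(seqs[i], k)
        for rep, rep_kmers in reps:
            if query and len(query & rep_kmers) / len(query) >= threshold:
                clusters[rep].append(i)
                break
        else:  # no representative was similar enough
            reps.append((i, query))
            clusters[i] = [i]
    return clusters


seqs = ["ACGTACGTAA", "ACGTACGTA", "TTTTGGGGCC"]
clusters = greedy_incremental_cluster(seqs, threshold=0.9, k=3)
```

Each sequence is compared only against the current representatives, not against every other sequence. This is why this style of clustering scales well, at the cost of results that depend on processing order.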
Thanks Michael. I will try it and let you know.