Hi,
I have 320 protein motifs which I would like to cluster by similarity. I have constructed a distance matrix for these motifs and used it to build a tree. The tree contains the clusters and I want to extract them. My problem now is breaking the tree into clusters.
Does anyone know of software which can do this? I could write a Newick parser which extracts the clusters, and will do so if there are no alternatives.
Thanks,
Kevin
If your end goal is to get clusters, I would not go via a tree. Instead, I would use the all-against-all distance matrix that you have already created as input to, for example, the Markov Cluster algorithm (MCL), which would give you clusters directly. This is easier and gives better results in my experience, although I should mention that I have not tested it on the exact problem that you are working on.
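To make the idea concrete, here is a minimal sketch of the MCL iteration (expansion followed by inflation) in Python with NumPy. It is only an illustration and not the reference MCL implementation, which is what you would normally run. MCL works on similarities rather than distances, so the conversion `exp(-d)` and the `cutoff` used to drop weak edges are my own assumptions, as are the function names and the toy 5x5 matrix.

```python
import numpy as np

def distance_to_similarity(dist, cutoff):
    """Turn distances into edge weights (assumed transform: exp(-d))."""
    dist = np.asarray(dist, dtype=float)
    sim = np.exp(-dist)
    sim[dist > cutoff] = 0.0  # drop edges between distant motifs
    return sim

def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal MCL sketch: returns clusters as sorted tuples of indices."""
    m = np.array(similarity, dtype=float)
    np.fill_diagonal(m, m.max())           # add self-loops
    m /= m.sum(axis=0, keepdims=True)      # make columns sum to 1
    for _ in range(max_iter):
        prev = m
        m = m @ m                          # expansion: two-step random walk
        m = m ** inflation                 # inflation: favour strong flows
        m /= m.sum(axis=0, keepdims=True)  # renormalise the columns
        if np.abs(m - prev).max() < tol:
            break
    # Each non-empty row of the converged matrix lists one cluster's members.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.nonzero(row > 1e-6)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)

# Toy distance matrix: motifs 0-2 are close to each other, as are motifs 3-4.
d = np.full((5, 5), 5.0)
d[:3, :3] = 0.2
d[3:, 3:] = 0.2
np.fill_diagonal(d, 0.0)

clusters = mcl(distance_to_similarity(d, cutoff=1.0))
print(clusters)
```

A higher `inflation` value tends to give more, smaller clusters, so it is worth trying a few values on the real 320x320 matrix.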
I think this paper may help you.
In it, the authors constructed phylogenetic trees from the distance matrix using the Fitch program from the Phylip package.
Lars has already given a nice answer; here are my thoughts on using Phylip to create clusters.
Clustering with Phylip is a classic example of a bioinformatics hack. I have done this myself for a set of motifs before, and the result was quite intuitive. You can start with an alignment of your motifs and then route it through the Phylip workflow.
The output file from the distance step can be used as input to any tree visualization tool. A dendrogram with bootstrap values is the ideal way to see the significant nodes, or clusters. The tree can then be used to visualize and analyze the clusters using the concept of a phylogenetic tree. This gives you a distinct advantage: Phylip is aware of the protein sequence context, and you get the final tree after rigorous bootstrapping, which shows the significance of your nodes (or clusters).
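If you would rather cut a tree into flat clusters programmatically than read them off a dendrogram, one option is hierarchical clustering with SciPy. Note that this is a different technique from the Phylip workflow: the sketch below builds an average-linkage (UPGMA) tree straight from the distance matrix, with no bootstrapping, and cuts it at a fixed height. The toy matrix and the cut height of 1.0 are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix: motifs 0-2 are close, as are motifs 3-4.
d = np.full((5, 5), 5.0)
d[:3, :3] = 0.2
d[3:, 3:] = 0.2
np.fill_diagonal(d, 0.0)

# linkage expects the condensed (upper-triangle) form of the matrix.
tree = linkage(squareform(d), method="average")

# Cut the tree at height 1.0: motifs joined below that height share a label.
labels = fcluster(tree, t=1.0, criterion="distance")
print(labels)
```

The cut height plays the same role as choosing where to break the tree by eye, so it needs to be picked with the scale of your own distances in mind.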
That worked well. Thanks for your help.
Good to hear it worked :-)