I have an alignment of more than 300 homologous sequences from different samples. All are the same length, roughly 15,000 bases. I don't expect any to be identical, but I wish to cluster them and identify the features (motifs or individual bases) that distinguish each cluster, or are more characteristic of it than of any other cluster.
I realize I could do some variation on hierarchical clustering, but I am curious whether anyone has advice on how to proceed.
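To make the goal concrete, here is a rough sketch of the naive approach I have in mind, in plain Python on toy data. Everything in it is a placeholder assumption rather than a method I am committed to: the function names are made up, the distance is plain Hamming distance, the clustering is a cubic-time average-linkage loop, and the number of clusters and the 0.9 frequency-difference threshold are arbitrary.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(seqs, k):
    """Naive agglomerative clustering (average linkage) down to k clusters.

    Returns a list of clusters, each a sorted list of indices into seqs.
    Toy implementation for illustration only; far too slow for real data.
    """
    n = len(seqs)
    dist = [[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        def avg(pair):
            # Mean pairwise distance between the two candidate clusters.
            a, b = pair
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))

        a, b = min(combinations(range(len(clusters)), 2), key=avg)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


def diagnostic_columns(seqs, clusters, min_diff=0.9):
    """For each cluster, list (column, base, freq_in, freq_out) where the
    cluster's most common base is much more frequent inside than outside."""
    hits = {}
    for c, members in enumerate(clusters):
        inside = set(members)
        outside = [i for i in range(len(seqs)) if i not in inside]
        hits[c] = []
        if not outside:
            continue  # a single cluster has nothing to be compared against
        for col in range(len(seqs[0])):
            column = Counter(seqs[i][col] for i in members)
            base, count = column.most_common(1)[0]
            f_in = count / len(members)
            f_out = sum(seqs[i][col] == base for i in outside) / len(outside)
            if f_in - f_out >= min_diff:
                hits[c].append((col, base, f_in, f_out))
    return hits


# Toy alignment: four sequences of length 8 (hypothetical data).
seqs = ["ACGTACGT", "ACGTACGA", "TCGTAGGT", "TCGTAGGA"]
clusters = average_linkage(seqs, k=2)
hits = diagnostic_columns(seqs, clusters)
print(clusters)
print(hits)
```

On the toy data this groups the first two and the last two sequences and flags columns 0 and 5 (0-based) as distinguishing them. What I am unsure about is whether this kind of per-column frequency comparison is a sensible way to find the characteristic positions, and how it would extend to motifs rather than single bases.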
Any comments appreciated.