Hi,
How can I do hierarchical clustering of protein sequences that are in FASTA format? Is there any software available? Thanks in advance.
CD-HIT might be the program you're looking for, but we would need more information in your question to answer it properly.
If you want to cluster based on sequence identity, then use CD-HIT.
Based on phylogeny: align the FASTA sequences using Clustal or any other alignment software, then use a phylogenetic package such as MEGA or PHYLIP. The workflow is: sequence alignment, then either a distance matrix (protdist) followed by NJ, or parsimony or ML on the alignment, to generate a phylogenetic tree. Then look for sequences falling into different branches (sequences on the same branch are clustered together).
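To make the "alignment, then distance matrix" step concrete, here is a minimal sketch in plain Python. It assumes the sequences are already aligned (for example by Clustal) and uses a simple p-distance (fraction of mismatched, ungapped columns), which is a much cruder measure than the substitution-model distances protdist computes. The function names and the toy alignment are made up for illustration.

```python
from itertools import combinations


def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header line
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(records):
    """Return (names, dist) where dist[(a, b)] is symmetric with a zero diagonal."""
    names = list(records)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = p_distance(records[a], records[b])
    return names, dist


# Toy aligned input (hypothetical sequences, already gapped to equal length).
ALIGNED = """>seqA
MKV-LAT
>seqB
MKVQLAT
>seqC
MRVQLGS
"""

names, dist = distance_matrix(parse_fasta(ALIGNED))
for a, b in combinations(names, 2):
    print(a, b, round(dist[(a, b)], 3))
```

For real data you would read the aligned FASTA file from disk and probably prefer a model-based distance, but the resulting matrix has the same shape either way.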
Use the alignment to get a similarity or distance matrix for every pair of sequences, and use that as input to clustering software such as Eisen's Cluster3 or TM4 MeV. (Some tweaking is required to convert the data into the appropriate format.)
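If you would rather see what the clustering step does than push the matrix through a GUI tool, the sketch below runs average-linkage (UPGMA-style) hierarchical clustering on a pairwise distance matrix in plain Python. The distances, the labels and the `upgma`/`cut` helper names are invented for illustration, and this naive version is only meant for small inputs.

```python
def upgma(names, dist):
    """Average-linkage agglomerative clustering.

    names: list of labels.
    dist:  dict {(a, b): distance}, defined for both orderings of each pair.
    Returns a list of merges as (left_members, right_members, height).
    """
    clusters = [(n,) for n in names]
    merges = []

    def avg(c1, c2):
        # Mean of all pairwise distances between the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def cut(names, merges, threshold):
    """Flat clusters obtained by applying only merges with height <= threshold."""
    groups = {n: {n} for n in names}
    for left, right, height in merges:
        if height > threshold:
            break  # average-linkage heights never decrease, so we can stop here
        union = groups[left[0]] | groups[right[0]]
        for n in union:
            groups[n] = union
    return sorted({tuple(sorted(g)) for g in groups.values()})


# Hypothetical pairwise distances between four sequences.
pairs = {
    ("A", "B"): 0.1, ("A", "C"): 0.6, ("A", "D"): 0.8,
    ("B", "C"): 0.7, ("B", "D"): 0.9, ("C", "D"): 0.3,
}
dist = {}
for (a, b), d in pairs.items():
    dist[(a, b)] = dist[(b, a)] = d

names = ["A", "B", "C", "D"]
merges = upgma(names, dist)
flat = cut(names, merges, threshold=0.5)
print(merges)
print(flat)
```

With these toy distances, A and B merge first, then C and D, and the two pairs join last. For anything beyond a few hundred sequences, a library implementation such as `scipy.cluster.hierarchy.linkage` with `method="average"` would be the more practical choice.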
Use a spectral method instead?
This is a shameless self-plug, since I did this a few years back.
uClust was also released recently: http://drive5.com/usearch/usearch3.0.html and the publication
Exactly as Paulo writes - more info is needed. Do you want to cluster in a phylogenetic manner? By presence/absence of functional (Pfam) domains? By genetic effect(s) of knock-out/over-expression? Asking a specific question will get you a specific answer - and the help you need to move your research forward.
Eisen has some nice clustering programs: http://rana.lbl.gov/EisenSoftware.htm
I got very good results with MC-UPGMA: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0013409
It is also very efficient on large data sets.
You can download the program here: http://www.protonet.cs.huji.ac.il/mcupgma/
Hierarchical Clustering