Clustering Of Protein Sequences
14.3 years ago
Prasobh ▴ 30

Hi,
How can I do hierarchical clustering of protein sequences that are in FASTA format? Is there any software available? Thanks in advance.

clustering • 9.8k views
14.3 years ago
Paulo Nuin ★ 3.7k

CD-HIT might be the program you're looking for, but we would need more information in your question to say for sure.

14.3 years ago
Rm 8.3k
  1. If you want to cluster based on sequence identity, use CD-HIT.

  2. Based on phylogeny: align the FASTA sequences using Clustal or any other alignment software, then use a phylogenetic package such as MEGA or PHYLIP. From the alignment, compute a distance matrix (protdist) and build a tree with NJ, or use parsimony or ML, to generate a phylogenetic tree. Then look at which branches the sequences fall into (sequences on the same branch are clustered together).

  3. Use the alignment to get a similarity or distance matrix for all pairs, and use that as input to clustering software such as Eisen's Cluster3 or TM4 MeV. (Some tweaking is required to convert the data into the appropriate format.)
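
As a rough illustration of option 3, here is a minimal Python sketch that goes from FASTA text to a pairwise distance matrix to an average-linkage (UPGMA) hierarchy, using only the standard library. Several things here are assumptions and not part of the answer above: the toy sequences are made up, the helper names (`read_fasta`, `kmer_distance`, `upgma`) are hypothetical, and the distance is a crude alignment-free k-mer Jaccard distance standing in for a real alignment-derived distance such as protdist output.

```python
from itertools import combinations

# Made-up toy sequences, for illustration only.
FASTA = """>seqA
MKTAYIAKQR
>seqB
MKTAYIAKQK
>seqC
GGSWWPLLNN
>seqD
GGSWWPLLNQ
"""


def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the name
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def kmer_distance(a, b, k=2):
    """Jaccard distance between k-mer sets (assumption: a stand-in for an
    alignment-based distance, not what protdist computes)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def upgma(names, dist):
    """Average-linkage agglomerative clustering.

    dist maps frozenset({a, b}) -> distance between sequences a and b.
    Returns (tree, merges): a nested-tuple tree and a list of
    (left_members, right_members, average_distance) in merge order.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            # Average of all pairwise distances between the two clusters.
            avg = sum(dist[frozenset((a, b))] for a in mi for b in mj)
            avg /= len(mi) * len(mj)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        avg, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        merges.append((sorted(mi), sorted(mj), avg))
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [((ti, tj), mi + mj)]
    return clusters[0][0], merges


seqs = read_fasta(FASTA)
names = list(seqs)
dist = {frozenset((a, b)): kmer_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)}
tree, merges = upgma(names, dist)

print(tree)  # (('seqA', 'seqB'), ('seqC', 'seqD'))
for left, right, height in merges:
    print(left, "+", right, "at average distance", round(height, 3))
```

For real data you would replace `kmer_distance` with distances taken from your alignment, and this naive loop recomputes averages at every step, so it is only suitable for small sets. Cutting the tree at a chosen distance then gives the flat clusters.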

14.3 years ago
Casbon ★ 3.3k

Use a spectral method instead?

This is a shameless self plug since I did this a few years back.

14.3 years ago
Charles ▴ 20

uClust was also released recently: http://drive5.com/usearch/usearch3.0.html and the publication

14.3 years ago

Exactly as Paulo writes - more info is needed. Do you want to cluster in a phylogenetic manner? By presence/absence of functional (Pfam) domains? By genetic effect(s) of knock-out/over-expression? Asking a specific question will get you a specific answer - and the help you need to move your research forward.

14.3 years ago
Niek De Klein ★ 2.6k

Eisen has some nice clustering programs: http://rana.lbl.gov/EisenSoftware.htm

13.4 years ago
Christian ★ 3.1k

I got very good results with MC-UPGMA: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0013409

It is also very efficient on large data sets.

You can download the program here: http://www.protonet.cs.huji.ac.il/mcupgma/
