Question

How to Align Protein Sequences of Unequal Length?



12.7 years ago

Sagar Nikam ▴ 160

I have 4500 protein sequences with very low similarity to one another. They are at least 40 residues long, and some are as long as about 300 to 500 residues. Does this affect the quality of a phylogenetic tree constructed from the multiple sequence alignment (MSA) output?

What should I do if I want to align all the protein sequences without losing any useful information? Should I do further data curation? If so, how?

protein sequence multiple

updated 12.7 years ago by Larry_Parnell 16k • written 12.7 years ago by Sagar Nikam ▴ 160

score 3 · Answer 1 · 2012-03-28

The quality of the MSA strongly influences the output of the phylogenetic analysis.

In my opinion, it is more informative to build phylogenetic trees not from all 4500 sequences at once, but separately for each gene or protein family. This makes more sense biologically, and it reduces the problems caused by high sequence diversity and unequal lengths. If two protein families show a plausible degree of shared evolutionary history, you can then attempt to combine their respective trees, once the separate MSAs and trees have been built.
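To illustrate the "split into families first" idea, here is a minimal sketch, assuming you have no family annotation yet and just want a rough first grouping. It uses a crude, hypothetical stand-in for family assignment: sequences are linked if their shared k-mer content (Jaccard similarity) passes a threshold, and linked sequences are merged into one group (single-linkage, via union-find). The function names and the `k` and `threshold` values are illustrative assumptions, not a recommended setting. For sequences as divergent as the ones described, k-mer overlap will miss many true homologs, so in practice family assignment would come from alignment-based searches (BLAST, profile HMMs such as Pfam) or dedicated clustering tools like CD-HIT.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of length-k substrings (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_into_families(seqs: dict[str, str], k: int = 3,
                        threshold: float = 0.2) -> list[list[str]]:
    """Hypothetical single-linkage grouping: two sequences end up in the same
    group if their k-mer Jaccard similarity is >= threshold, directly or
    through a chain of other sequences. k and threshold are illustrative."""
    ids = list(seqs)
    parent = {i: i for i in ids}  # union-find forest, one tree per group

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {i: kmers(seqs[i].upper(), k) for i in ids}
    for a, b in combinations(ids, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Sort for a deterministic, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    # Toy sequences of unequal length (made up for illustration).
    toy = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "p3": "GSHMLEDPVDAFQNLTAAW",
        "p4": "GSHMLEDPVDAFQNLTAAWRRE",
    }
    print(group_into_families(toy))  # [['p1', 'p2'], ['p3', 'p4']]
```

Each resulting group would then be aligned on its own with an MSA program (for example MAFFT or Clustal Omega) and given its own tree, which keeps every alignment restricted to sequences that plausibly share a common ancestor.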