Question

How To Select Protein Sequences For Multiple Alignment?

4

14.1 years ago

Bilouweb ★ 1.1k

Hello.

I have been using multiple alignments for some years, and I always follow the same protocol:

I search for homologous sequences in a database
I cluster the set to reduce redundancy
Finally, I run a multiple alignment program
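
The second step above (redundancy reduction) can be sketched as a greedy clustering that keeps one representative per group of near-identical sequences. This is only a minimal illustration, not the protocol used in this thread: the function names and the toy sequences are hypothetical, and `difflib.SequenceMatcher` is used as a crude stand-in for a real alignment-based identity. In practice a dedicated clustering tool would normally do this job.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1]; NOT a true alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_representatives(seqs: dict[str, str], threshold: float = 0.9) -> list[str]:
    """Return the IDs kept as representatives.

    Sequences are visited longest first; a sequence becomes a new
    representative only if it is below `threshold` similarity to every
    representative already kept.
    """
    reps: list[str] = []
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        if all(similarity(seqs[sid], seqs[r]) < threshold for r in reps):
            reps.append(sid)
    return reps


# Hypothetical toy input: seqB differs from seqA by a single residue.
toy = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRQISFVKSHFSRA",
    "seqC": "MSDNGPQNQRNAPRITFGGPSD",
}
print(greedy_representatives(toy, threshold=0.9))
```

Lowering the threshold removes more sequences and increases diversity per sequence kept, at the cost of a smaller sample for statistics.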

The first step is, in my opinion, the most important, because the quality of my alignment, and of any information derived from it, depends on this initial selection of sequences. I am aware that the multiple alignment program also matters, but the input is the foundation.

So I want to select:

A sufficient number of sequences to derive statistics
Sufficient diversity to avoid redundancy bias in the statistics

What is your protocol for creating a multiple alignment?

Do you use only BLAST or PSI-BLAST?

Have you ever used a phylogenetic tree to select the sequences?

I am also wondering whether a selection based on "organism screening" is relevant (I mean selecting sequences from different organisms to get a sample of the diversity of life).

protein sequence multiple selection • 5.7k views

written 14.1 years ago by Bilouweb ★ 1.1k

0

You have a lot of different questions here! Are you looking for a generic protocol, or do you have something specific in mind?

14.1 years ago by Jarretinha 3.5k

0

Thanks for your answer, Jarretinha. In fact, I have identified positions in a protein where I want to introduce mutations (changing the side chain experimentally). Before running the experiments, I want to know whether these mutations will affect the protein structure. By using a multiple sequence alignment, I want to find out whether these positions are conserved. Our working assumption is this: if a position is not conserved, then a mutation can occur there without much consequence. (I know this is a caricature and not the "reality".)
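
To make "conserved or not" concrete, one common measure is the Shannon entropy of each alignment column: 0 bits means a single residue type, higher values mean more variability. The sketch below assumes the alignment is already available as equal-length strings with `-` for gaps; the function names and the toy alignment are hypothetical, and real analyses often add sequence weighting or substitution-aware scores.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one column, ignoring gaps.

    An all-gap column returns 0.0, which should not be read as "conserved".
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy for every column of an alignment given as equal-length rows."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Hypothetical toy alignment (4 sequences, 5 columns).
toy_alignment = [
    "MKTA-",
    "MKSAY",
    "MRTAY",
    "MKTGY",
]
for position, entropy in enumerate(conservation_profile(toy_alignment), start=1):
    print(position, round(entropy, 2))
```

Positions with low entropy would be the ones to treat with caution before mutating, keeping in mind that the result depends heavily on which sequences went into the alignment.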

14.1 years ago by Bilouweb ★ 1.1k

0

Conservation is only loosely related to function. Check this out.

updated 5.6 years ago by Ram 45k • written 14.1 years ago by Jarretinha 3.5k

score 5 · Answer 1 · 2011-03-03

There are various protocols for MSA, depending on your objectives, and they can be very dissimilar. For instance, creating an alignment for protein homology modeling is quite different from creating one for Ka/Ks ratio calculation, phylogeny reconstruction, or genome annotation. You must specify your needs beforehand, just as with any other statistics and/or modeling task.

Of course, there is some general advice:

Try to maximize information by removing paralogs and adding as much variability as possible without introducing too much noise (mostly potential gaps). Indeed, phylogeny-oriented selection helps to address this point.
There is an informational hierarchy in alignments. Structure-based ones are good for studying architecture and conservation. Sequence-based ones can be used to address selection and fine-scale evolution.
Always check your alignment with everything at hand (including your eyes), no matter how big it is. I always use things like PipeAlign.
Of course, good anchoring always helps. So, whenever possible, use profiles (psiblast, hmmer/pfam, infernal/rfam, etc.). But remember: each level has its own characteristics. Genes aren't the proteins they code for, and vice versa. To each his own!
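
As a complement to checking by eye, the gap noise mentioned in the first point can be quantified per column. This is a rough sketch under the assumption that the alignment is a list of equal-length strings with `-` for gaps; the function names, the 0.5 cutoff, and the toy alignment are hypothetical choices for illustration, not a recommended setting.

```python
def gap_fraction_per_column(alignment: list[str]) -> list[float]:
    """Fraction of gap characters ('-') in each column of the alignment."""
    n_rows = len(alignment)
    return [col.count("-") / n_rows for col in zip(*alignment)]


def drop_gappy_columns(alignment: list[str], max_gap_fraction: float = 0.5) -> list[str]:
    """Return the alignment with columns above `max_gap_fraction` removed."""
    fractions = gap_fraction_per_column(alignment)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return ["".join(row[i] for i in keep) for row in alignment]


# Hypothetical toy alignment: the third column is mostly gaps.
toy_alignment = [
    "MK-TA",
    "MK-SA",
    "MKQTA",
    "M--TA",
]
print(gap_fraction_per_column(toy_alignment))
print(drop_gappy_columns(toy_alignment, max_gap_fraction=0.5))
```

A column that is mostly gaps often points to a single sequence with an insertion, which is worth inspecting before deciding whether to keep that sequence in the set.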

Anyway, what's your problem?

-- Edit --

Now that I know more about your problem, I can suggest an approach. To test the effect of side-chain changes, I normally proceed with site/residue-directed mutagenesis in silico. The best programs I know of for that are Andante and SDM, from Tom Blundell's group. Of course, you'll need the structure of your protein. Otherwise, a comparative/homology modeling step will be necessary.