Hello.
I have been using multiple alignments for some years and I always follow the same protocol:
- I search for homologous sequences in a database
- I cluster the set to reduce redundancy
- And finally, I run a multiple alignment program
The first step is, in my opinion, the most important, because the quality of my alignment, and of any information derived from it, depends on this initial selection of sequences. I am aware that the multiple alignment program also matters, but the input is the foundation.
So I want to select:
- A sufficient number of sequences to derive statistics
- Sufficient diversity to avoid redundancy bias in the statistics
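To make the redundancy-reduction idea concrete, here is a minimal sketch of a greedy filter. It assumes the sequences are already rows of a preliminary alignment (equal length, `-` for gaps). The names `pairwise_identity`, `reduce_redundancy` and the `max_identity` threshold are hypothetical, chosen for illustration only, and this is not how any particular clustering program works internally.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where both rows have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either row has a gap
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def reduce_redundancy(aligned, max_identity=0.9):
    """Greedy filter over a dict {name: aligned sequence}.

    A sequence is kept only if its identity to every already-kept
    representative is below max_identity (an assumed, tunable cutoff).
    The result depends on input order: earlier sequences win.
    """
    kept = []
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, aligned[k]) < max_identity for k in kept):
            kept.append(name)
    return kept


# Toy data, invented for illustration
example = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",  # differs from seqA at one position
    "seqC": "MRSGYLA-QR",  # more divergent
}
representatives = reduce_redundancy(example, max_identity=0.8)
print(representatives)
```

With a cutoff of 0.8, `seqB` is dropped as a near-duplicate of `seqA`, while `seqC` is kept. A lower cutoff gives more diversity but fewer sequences, which is exactly the trade-off between the two criteria above.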
What is your protocol for creating a multiple alignment?
Do you use only BLAST or PSI-BLAST?
Have you ever used a phylogenetic tree to select the sequences?
I am also wondering whether a selection based on "organism screening" is relevant (I mean: selecting sequences from different organisms to get a sample of the "diversity of life").
You have a lot of different questions here! Are you looking for a generic protocol? Or do you have something specific in mind?
Thanks for your answer, Jarretinha. In fact, I have identified positions in a protein where I want to introduce mutations (change the side chain experimentally). But before running the experiments, I want to know whether these mutations will affect the protein structure. Using a multiple sequence alignment, I want to find out whether these positions are conserved or not. We made this assumption: if a position is not conserved, then a mutation can occur there without much consequence. (I know that this is an oversimplification and not the "reality".)
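As a rough illustration of what "conserved or not" could mean in practice, here is a minimal sketch that scores each alignment column by Shannon entropy (0 bits means a single residue type, higher means more variable). It assumes the alignment is a list of equal-length strings with `-` for gaps. The function names and the toy alignment are hypothetical, and entropy is only one of several possible conservation measures.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    counts = Counter(residues)
    if len(counts) <= 1:
        return 0.0  # empty or fully conserved column
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Return one entropy value per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all rows must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Toy alignment, invented for illustration
toy = ["MKTA", "MRTA", "MKSA", "MHTG"]
profile = conservation_profile(toy)
for position, entropy in enumerate(profile, start=1):
    print(position, round(entropy, 3))
```

In this toy case the first column is invariant (entropy 0) and the second is the most variable. The scores are only meaningful if the sequence set is large and diverse enough, and a low score at a position does not by itself say that a mutation there would be harmful or harmless.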
Conservation is only loosely related to function. Check this out.