Hello everyone,
This is my very first Biostar question. I'm new to bioinformatics, so sorry if my questions are very naive...
I'm trying to align a huge number of homologous proteins to identify conserved amino acid residues and to find the differences between them. All of them share the same domain.
Some studies have used BLAST for this: they take one conserved protein as the query and run BLAST against every protein in a protein list.
My first question: is this reliable? I really don't think so. Although the proteins show very high homology, some differences could cause mismatches that disturb the local alignment and lead to wrong inferences. Am I wrong?
My second question: since I don't think the strategy above is right, I thought of using MAFFT to build an MSA. I tried it with all my sequences (around 30k), but the result contained some wrong alignments, as expected. I therefore split my sequences into buckets of 100 sequences or fewer and ran the MSA on each bucket. Is this a better approach to solving my problem?
If not, can anybody please help me?
Thank you for all the help! =]
Do you have a phylogenetic tree of the proteins? If you do, you can split the list by clade and align the proteins within each clade to get better results.
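
In case it helps, here is a minimal Python sketch of the clade-based split. It assumes you already have a mapping from sequence ID to clade label (for example, exported from your tree viewer). The function names (`parse_fasta`, `group_by_clade`, `to_fasta`), the `clade_of` dictionary, and the toy sequences are all hypothetical placeholders, not part of any library.

```python
from collections import defaultdict


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (sequence_id, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def group_by_clade(records, clade_of, default="unassigned"):
    """Group records by clade label; IDs missing from clade_of go to `default`."""
    groups = defaultdict(list)
    for seq_id, seq in records:
        groups[clade_of.get(seq_id, default)].append((seq_id, seq))
    return dict(groups)


def to_fasta(records):
    """Serialize (sequence_id, sequence) tuples back to FASTA text."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


# Toy input: three made-up sequences, two of them assigned to a clade.
example = ">p1 kinase\nMKV\nLLA\n>p2\nMKI\n>p3\nMRV\n"
clade_of = {"p1": "cladeA", "p2": "cladeA"}  # assumed to come from your tree

groups = group_by_clade(parse_fasta(example), clade_of)
for clade, members in groups.items():
    print(clade, [seq_id for seq_id, _ in members])
```

You could then write each group to its own file with `to_fasta` and align it separately, for instance with `mafft --auto cladeA.fasta > cladeA.aln`. Sequences that end up in the "unassigned" group are worth checking by hand before you align them.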