I am doing a multiple alignment of several proteins that share high identity with my query protein. When I view the alignment in GeneDoc, the proteins look quite different from mine: they contain a longer stretch of amino acids that my protein lacks, and as a result my protein shows many indels. My question is: if these proteins share high identity (>90%) with my query, why do they look so different when I do the multiple alignment in GeneDoc?
If your query protein is different to all the others, then of course you will get indels. If the indels that are being added are nonsensical then you may get better results by manipulating the scoring parameters, but I would suggest just trying a different aligner.
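To illustrate what "manipulating the scoring parameters" can mean, here is a minimal sketch of a Needleman-Wunsch global alignment score with a linear gap penalty and an optional switch that makes terminal (end) gaps free. The function name `nw_score`, the scores and the toy sequences are all hypothetical; real aligners use substitution matrices and affine gap costs, and expose end-gap handling through their own options.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -2, free_end_gaps: bool = False) -> int:
    """Needleman-Wunsch score with a linear gap penalty (toy sketch).

    If free_end_gaps is True, leading and trailing gaps cost nothing,
    i.e. a "semi-global" alignment: a short protein can sit inside a
    longer one without being penalised for the unshared termini.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = 0 if free_end_gaps else i * gap  # leading gaps in b
    for j in range(1, m + 1):
        dp[0][j] = 0 if free_end_gaps else j * gap  # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    if free_end_gaps:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(dp[n]), max(row[m] for row in dp))
    return dp[n][m]


# A 4-residue query embedded in an 8-residue homologue:
print(nw_score("ACDE", "XXACDEXX"))                      # end gaps penalised
print(nw_score("ACDE", "XXACDEXX", free_end_gaps=True))  # end gaps free
```

With end gaps penalised the four extra residues cost 4 × −2, giving −4; with free end gaps the same alignment scores 4. The identical residues are aligned either way, which is the point: the gaps themselves are not evidence that the shared region is dissimilar.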
It sounds like your query is just something of an outlier relative to the rest of the family though?
Thank you for the reply. My question is: if my query protein is as different from all the others as the alignment suggests, why does BLASTp report such significant hits against all of them? Shouldn't they have lower scores? I checked the % identity of every protein against my query, and all of them are above 90%.
You haven't told us what the score is, so how can we judge whether it might be expected to be lower (or higher, if we're talking about E-values, where a worse match gets a larger value)?
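For reference, the relationship between score and E-value in BLAST follows Karlin-Altschul statistics. Below, S is the raw alignment score, S′ the bit score, m and n the (effective) lengths of the query and the database, and λ and K are constants that depend on the scoring matrix and gap costs:

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m\,n\,2^{-S'}
```

A higher bit score gives an exponentially smaller E-value. Note that percent identity does not appear anywhere here: a short hit at 95% identity can still have a modest score, because the score grows with the length of the aligned region.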
Anything aligning at >90% identity should have a decent score, but because BLAST is a local aligner, you may find that only a small subsection of the protein aligns at >90% identity, and the score stays modest because the rest of the protein isn't part of the alignment.
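A toy Smith-Waterman local alignment makes this concrete. This is a sketch with simple match/mismatch scores and a linear gap penalty, not BLAST's heuristic search or BLOSUM62; the function name and sequences are hypothetical. Percent identity is computed only over the aligned segment, so unaligned residues never lower it.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str, float]:
    """Return (score, aligned a, aligned b, percent identity) of the best local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at 0, so a bad region restarts the alignment.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to 0.
    i, j = bi, bj
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    top.reverse(); bottom.reverse()
    length = len(top)
    identities = sum(x == y for x, y in zip(top, bottom))
    pident = 100 * identities / length if length else 0.0
    return best, "".join(top), "".join(bottom), pident


query = "WWWWWMKTAYIAKQR"    # hypothetical: 5 unrelated residues + shared region
subject = "MKTAYIAKQRCCCCC"  # hypothetical: shared region + 5 unrelated residues
score, q_aln, s_aln, pid = smith_waterman(query, subject)
print(score, q_aln, pid)
```

The shared 10 residues align at 100% identity, while 5 of the 15 query residues (and 5 of the subject's) are simply left out of the local alignment. A global multiple aligner cannot leave them out, so they have to appear as gaps.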
It would be helpful to share the alignment (either as an uploaded file or a screenshot) for others to scrutinise.
Your protein may have 200 residues while its matches have 300 or 400. In the region they share, the identity could be 95%, which is what BLAST will report. That still leaves extra sequence that they do not share, which will show up as indels.
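One way to check this is to look at coverage rather than identity alone. The sketch below assumes you have run BLAST+ with a custom tabular format such as `-outfmt "6 pident qstart qend sstart send qlen slen"`; the `Hsp` class and `coverage` function are hypothetical helpers, and the numbers are the 200 vs 400 residue example above, not real data.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """One BLAST hit, using fields from BLAST+ tabular output (hypothetical holder)."""
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    qlen: int
    slen: int


def coverage(h: Hsp) -> tuple[float, float, int]:
    """Return (% query covered, % subject covered, subject residues outside the hit)."""
    q_span = abs(h.qend - h.qstart) + 1
    s_span = abs(h.send - h.sstart) + 1
    qcov = 100 * q_span / h.qlen
    scov = 100 * s_span / h.slen
    unshared = h.slen - s_span  # these residues become gaps in the query row of the MSA
    return qcov, scov, unshared


# 200-residue query aligned over its full length to residues 101-300 of a 400-residue hit
hit = Hsp(pident=95.0, qstart=1, qend=200, sstart=101, send=300, qlen=200, slen=400)
print(hit.pident, coverage(hit))
```

Here BLAST would report 95% identity and full query coverage, yet only half of the subject is covered, so the other 200 subject residues have nowhere to go in a multiple alignment except opposite gaps in your query.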
Most likely you will need to trim out the portion they do not share, either before or after the alignment. It's tough to know exactly without more details, as the BLAST percent identity and your description alone are not informative enough.
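For trimming after the alignment, a minimal sketch: keep only the columns from the first to the last residue of your query, and drop the overhanging ends contributed by the longer homologues. This assumes the alignment is already loaded as a dict of equal-length aligned sequences with `-` or `.` as gap characters; `trim_to_reference` and the sequences are hypothetical. Internal gap columns are kept, and anything outside the query's span is discarded, so only do this if the shared region is what you care about.

```python
def trim_to_reference(aln: dict[str, str], ref_id: str) -> dict[str, str]:
    """Trim every aligned sequence to the column span covered by ref_id."""
    ref = aln[ref_id]
    # Columns where the reference (your query) has a residue rather than a gap.
    cols = [i for i, c in enumerate(ref) if c not in "-."]
    if not cols:
        raise ValueError(f"{ref_id} has no residues in the alignment")
    start, end = cols[0], cols[-1] + 1
    return {name: seq[start:end] for name, seq in aln.items()}


aln = {
    "query": "---MKTA--",
    "hitA":  "MSSMKTAQQ",
    "hitB":  "M--MKSAQ-",
}
for name, seq in trim_to_reference(aln, "query").items():
    print(name, seq)
```

The terminal extensions of `hitA` and `hitB` are removed, leaving the four columns your query actually spans.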
Have you tried an alternative alignment program to see what you get? MEGA ( https://www.megasoftware.net/ ) is one option, as is Jalview ( https://www.jalview.org/ ).
I'm not sure I understand the problem?