I am trying to find paralogous genes in a model organism (Arabidopsis). To do that, I am running BLAST with each protein sequence against the organism's own protein database. The first hit I get is the protein aligned with itself (I am using peptide, i.e. amino acid, sequences), which is usually at 100% identity. I am trying to understand which genes are highly homologous to each other, and I keep coming across cases where a single protein has multiple alignments, for example:
Protein1 Protein1 Score 410
Protein1 Protein2A Score 300
Protein1 Protein2B Score 210
Protein1 Protein3 Score 280
Protein1 Protein4 Score 250
The alignment to itself is expected, of course. But is it the norm to keep only the best of the remaining alignments (Protein1 Protein2A in this case) and ignore Protein1 Protein2B because the two alignments overlap? Or are there calculations or options that take this into account?
And should I then consider Protein1 Protein3 and Protein1 Protein4 to be close homologs of Protein1? Are there commonly used thresholds for such BLAST results?
Thanks!
Take a look at this question. Most of it is applicable to yours as well.
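To make the filtering you describe concrete, here is a minimal sketch of one way to post-process the hits. It assumes the search was saved as BLAST tabular output (`-outfmt 6`, default columns). The function names `best_hits` and `strip_isoform`, the cutoffs (`max_evalue=1e-5`, `min_pident=30.0`) and the example rows are all illustrative assumptions, not a community standard; adjust them to your data. It drops self-hits, collapses isoforms such as `AT1G01010.1` and `AT1G01010.2` onto one gene, and keeps only the highest-scoring alignment per gene pair.

```python
import csv
from io import StringIO

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def strip_isoform(seq_id):
    """Map 'AT1G01010.2' to 'AT1G01010'; IDs without a '.N' suffix pass through."""
    base, dot, suffix = seq_id.rpartition(".")
    return base if dot and suffix.isdigit() else seq_id


def best_hits(handle, max_evalue=1e-5, min_pident=30.0, gene_of=strip_isoform):
    """Return {(query_gene, subject_gene): row}, keeping the top-bitscore row per pair.

    max_evalue and min_pident are illustrative cutoffs (assumptions).
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        q, s = gene_of(row["qseqid"]), gene_of(row["sseqid"])
        if q == s:
            continue  # self-hit, or another isoform of the same gene
        for name in ("pident", "evalue", "bitscore"):
            row[name] = float(row[name])
        if row["evalue"] > max_evalue or row["pident"] < min_pident:
            continue  # too weak under the chosen cutoffs
        key = (q, s)
        if key not in best or row["bitscore"] > best[key]["bitscore"]:
            best[key] = row
    return best


# Invented rows that loosely mirror the example in the question.
example = "\n".join([
    "G1.1\tG1.1\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-150\t410",
    "G1.1\tG2.1\t70.0\t190\t57\t0\t5\t194\t3\t192\t1e-100\t300",
    "G1.1\tG2.2\t65.0\t150\t52\t0\t5\t154\t3\t152\t1e-60\t210",
    "G1.1\tG3.1\t68.0\t185\t59\t0\t8\t192\t10\t194\t1e-90\t280",
    "G1.1\tG4.1\t25.0\t60\t45\t0\t20\t79\t30\t89\t0.5\t35",
])
hits = best_hits(StringIO(example))
for (q, s), row in sorted(hits.items(), key=lambda kv: -kv[1]["bitscore"]):
    print(q, s, row["bitscore"], row["evalue"])
```

With these invented rows, only the G1/G2 pair (via the stronger G2.1 alignment) and the G1/G3 pair survive; the G4 hit is dropped by both cutoffs. If your IDs do not follow the `.N` isoform convention, pass your own `gene_of` function.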
+1 @Michael Schubert