I ran a BLAST alignment of one protein against a whole genome. The results are confusing because the same query and subject pair shows up in more than one row, each with different results.
One of the examples:
https://ibb.co/kZ6T9Q
I used Biolinux and BLAST+ 2.6.1.
Do you have any idea why this happened?
That's the standard output plus query length, subject length, and query coverage per subject.
Column names: Query | Subject | % identity | Alignment length | Number of mismatches | Number of gap openings | Start of alignment in query | End of alignment in query | Start of alignment in subject | End of alignment in subject | E-value | Bit score | Query length | Subject length | Query coverage per subject (for all HSPs)
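For anyone who wants to read these rows programmatically, here is a minimal sketch in Python. It assumes the table is tab-separated and was produced with something like `-outfmt "6 std qlen slen qcovs"`; that option string is a guess based on the column list above, not something stated in the thread, and `parse_row` is a hypothetical helper name.

```python
# Column names and types, in the order listed above. The short names are the
# BLAST+ tabular format specifiers; that the table was generated with
# -outfmt "6 std qlen slen qcovs" is an assumption.
COLUMNS = [
    ("qseqid", str), ("sseqid", str), ("pident", float), ("length", int),
    ("mismatch", int), ("gapopen", int), ("qstart", int), ("qend", int),
    ("sstart", int), ("send", int), ("evalue", float), ("bitscore", float),
    ("qlen", int), ("slen", int), ("qcovs", int),
]


def parse_row(line):
    """Turn one tab-separated BLAST row into a dict of typed values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(COLUMNS, fields)}
```

Each row of the table is one HSP (one local alignment), so a query and subject pair with two HSPs produces two rows.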
As Ahill says, you get multiple hits to the same subject because there are two high-scoring subsections (perhaps active sites or conserved domains) within the protein. BLAST is a local alignment tool, so it reports the highest-scoring continuous stretches of a sequence rather than a single end-to-end alignment.
Part of the output is the hit start and end positions, as well as the query start and end positions. These tell you which stretch of your query sequence matches which stretch of the matched sequence.
BLAST is a local alignment method: it can and will report multiple alignments between a single query and target pair, depending on how it is run. From a quick look, I'd guess those are two different sub-sequences of your query (SCO2792) aligning to different sub-sequences of the target (SCO0697), each with a high alignment score. The two local alignments you show between the query and target have different start and end coordinates, lengths, E-values, etc., all of which appear in your table. Use the column headers you list above in the comments to read off the start and end coordinates of each of the two alignments in the query and target.
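To make that check concrete, here is a small Python sketch that groups rows by query and subject pair and tests whether two alignments cover the same stretch. The IDs and coordinates in `example_hsps` are invented for illustration and are not taken from the screenshot; only the two identity values (46 and 53) come from the thread. The helper names are hypothetical.

```python
from collections import defaultdict


def group_by_pair(hsps):
    """Collect HSP rows under their (query, subject) pair."""
    groups = defaultdict(list)
    for hsp in hsps:
        groups[(hsp["qseqid"], hsp["sseqid"])].append(hsp)
    return dict(groups)


def overlaps(a, b, start_key, end_key):
    """True if two HSPs share at least one position on the chosen sequence."""
    return max(a[start_key], b[start_key]) <= min(a[end_key], b[end_key])


# Invented example: two local alignments between the same pair of proteins.
example_hsps = [
    {"qseqid": "queryA", "sseqid": "subjB", "pident": 46.0,
     "qstart": 5, "qend": 120, "sstart": 30, "send": 150},
    {"qseqid": "queryA", "sseqid": "subjB", "pident": 53.0,
     "qstart": 200, "qend": 310, "sstart": 400, "send": 505},
]

for (query, subject), hits in group_by_pair(example_hsps).items():
    print(f"{query} vs {subject}: {len(hits)} HSP(s)")
    for h in sorted(hits, key=lambda h: h["qstart"]):
        print(f"  query {h['qstart']}-{h['qend']} -> "
              f"subject {h['sstart']}-{h['send']} ({h['pident']}% identity)")
```

If the query intervals of two rows do not overlap, the rows describe different regions of the same protein pair rather than one alignment reported twice.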
Can you explain what you mean by "% identity between different alignments"? If you are talking about the example, they are 46 and 53.
As I understand it, this can't be gene duplication, because there is one sequence for the query and one for the subject. I've checked the files for errors and they seem to be fine.
P.S. I mean the input files with sequences: there is only one sequence per protein.
Can you give an example of the query sequence?
I hope I've understood you:
Is the protein in question SCO2792? And what genome are you using?
What do those columns mean? Is this one of the standard BLAST output formats?