Identical proteins but different BLAST results
7.2 years ago

Hi, everyone

I ran a BLAST alignment of one protein against the whole genome. The results are confusing because the same query/subject pair appears more than once, with different results. Here is one example: https://ibb.co/kZ6T9Q

I used Bio-Linux and BLAST+ 2.6.1. Does anyone have an idea why this happened?

Tags: Blast, alignment

Can you give an example of the query sequence?


I hope I've understood you correctly:

>SCO2792
MSHDSTAAPEAAARKLSGRRRKEIVAVLLFSGGPIFESSIPLSVFGIDRQDAGVPRYRLL
VCAGEDGPLRTTGGLELTAPQGLEAISRAGTVVVPAWRSITSPPPEEALDALRRAHEEGA
RIVGLCTGAFVLAAAGLLDGRPATTHWMYAPTLAKRYPSVHVDPRELFVDDGDVLTSAGT
AAGIDLCLHIVRTDHGNEAAGALARRLVVPPRRSGGQERYLDRSLPEEIGADPLAEVVAW
ALEHLHEQFDVETLAARAYMSRRTFDRRFRSLTGSAPLQWLITQRVLQAQRLLETSDYSV
DEVAGRCGFRSPVALRGHFRRQLGSSPAAYRAAYRARRPQGDRQPDPDTAAAGATRPLPP
SDPPASLAPENAVPFQTRRTATPMPAGAASVPGQRSAP*


Is the protein in question SCO2792? And what genome are you using?


What do those columns mean? Is this one of the standard BLAST output formats?


That's the standard tabular output plus query length, subject length and query coverage per subject. The columns are:

1. Query
2. Subject
3. % identity
4. Alignment length
5. Number of mismatches
6. Number of gap openings
7. Start of alignment in query
8. End of alignment in query
9. Start of alignment in subject
10. End of alignment in subject
11. E-value
12. Bit score
13. Query length
14. Subject length
15. Query coverage per subject (over all HSPs)
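The last column, query coverage per subject, is computed over all HSPs between the query and that subject, so it should be the same on every row of a given query/subject pair even though the per-HSP columns differ. A minimal sketch of that calculation, assuming coverage is the union of the HSPs' query intervals divided by the query length (BLAST+ typically prints it as a whole number, so expect rounding); the function name `query_coverage` is hypothetical:

```python
def query_coverage(intervals, qlen):
    """Percent of the query covered by the union of HSP query intervals.

    intervals: (qstart, qend) pairs, 1-based and inclusive as in BLAST
    tabular output. Overlapping or adjacent HSPs are merged so that no
    query position is counted twice.
    """
    merged = []
    # normalise each interval to (low, high) and process left to right
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            # overlaps or touches the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return 100.0 * covered / qlen


# Hypothetical example: two HSPs on a 400-residue query
print(query_coverage([(10, 100), (200, 300)], 400))  # 48.0
```

Columns 3 to 12, by contrast, describe each HSP on its own, which is why they differ between rows for the same pair.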


As Ahill says, you get multiple hits to the same subject because there are two high-scoring segments (perhaps active sites or conserved domains) within the protein. BLAST is a local alignment tool, so it reports the highest-scoring contiguous stretches of the sequences rather than a single end-to-end alignment.

The output includes the start and end positions of each alignment in both the query and the subject (hit). These tell you which stretch of your query matches which stretch of the subject sequence.
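To see this for every query/subject pair at once, a small script can group the rows of the tabular output by pair and list each HSP's query and subject coordinates. This is a sketch assuming the first 12 columns follow the standard tabular order listed earlier in the thread; the function names `group_hsps` and `report_multi_hsp` are hypothetical:

```python
from collections import defaultdict


def group_hsps(lines):
    """Group BLAST tabular rows by (query, subject) pair.

    Assumes the standard column order: query, subject, % identity,
    alignment length, mismatches, gap openings, qstart, qend, sstart,
    send, E-value, bit score (extra columns after these are ignored).
    """
    pairs = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#") or len(fields) < 12:
            continue  # skip comment lines and malformed rows
        qstart, qend, sstart, send = map(int, fields[6:10])
        pairs[(fields[0], fields[1])].append({
            "query": (qstart, qend),
            "subject": (sstart, send),
            "pident": float(fields[2]),
            "evalue": float(fields[10]),
        })
    for hsps in pairs.values():
        hsps.sort(key=lambda h: h["query"][0])  # order by position in query
    return dict(pairs)


def report_multi_hsp(pairs):
    """Print only the pairs that have more than one HSP."""
    for (query, subject), hsps in pairs.items():
        if len(hsps) < 2:
            continue
        print(f"{query} vs {subject}: {len(hsps)} HSPs")
        for h in hsps:
            print(f"  query {h['query'][0]}-{h['query'][1]}"
                  f"  subject {h['subject'][0]}-{h['subject'][1]}"
                  f"  identity {h['pident']}%  E-value {h['evalue']:g}")
```

For example, `report_multi_hsp(group_hsps(open("hits.tsv")))`, where `hits.tsv` is a hypothetical name for the BLAST output file, lists each pair with two or more HSPs and where each HSP sits in the query and the subject.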

7.2 years ago
Ahill

BLAST is a local alignment method: it can, and often will, report multiple alignments (HSPs) between a single query and target pair, depending on how it is run. From a quick look, I'd guess those are two different sub-sequences of your query (SCO2792) aligning to different sub-sequences of the target (SCO0697), each with a high alignment score. The two local alignments you show between the query and target have different start and end coordinates, lengths, E-values, etc., as shown in your table. Use the column headers you listed above in the comments to read off the start and end coordinates of each of the two alignments in the query and in the target.

7.2 years ago
pfs

Any chance you are observing gene duplication? What is the % identity between different alignments?


Can you explain what you mean by "% identity between different alignments"? If you mean the example, the values are 46 and 53. As I understand it, this can't be gene duplication, because there is only one sequence for the query and one for the subject. I've checked the files for errors and they seem fine.

P.S. I mean the input sequence files: there is only one sequence per protein.
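One way to approach the duplication question is to compare where the two HSPs sit in each sequence. If they cover different parts of the query and different parts of the subject, that fits the two-segment explanation in the answer above; if the same query region matches two separate subject regions, that would hint at an internal repeat in the subject (and vice versa). The sketch below is only a rough heuristic, not a test for duplication; the function names and the 50% overlap threshold `min_frac` are assumptions:

```python
def overlap(a, b):
    """Number of positions shared by two 1-based inclusive intervals."""
    lo = max(min(a), min(b))
    hi = min(max(a), max(b))
    return max(0, hi - lo + 1)


def same_region(a, b, min_frac):
    """True if the overlap covers at least min_frac of the shorter interval."""
    shorter = min(abs(a[1] - a[0]), abs(b[1] - b[0])) + 1
    return overlap(a, b) >= min_frac * shorter


def classify_hsp_pair(hsp1, hsp2, min_frac=0.5):
    """Roughly interpret two HSPs between the same query and subject.

    Each HSP is ((qstart, qend), (sstart, send)) taken from the tabular
    output. min_frac is an arbitrary, hypothetical threshold.
    """
    (q1, s1), (q2, s2) = hsp1, hsp2
    same_q = same_region(q1, q2, min_frac)
    same_s = same_region(s1, s2, min_frac)
    if same_q and same_s:
        return "largely overlapping HSPs in both sequences"
    if same_q:
        return "one query region matches two subject regions (possible repeat in subject)"
    if same_s:
        return "two query regions match one subject region (possible repeat in query)"
    return "distinct segments: different query regions match different subject regions"
```

To use it on the example, pass each row's (qstart, qend) and (sstart, send) values as `hsp1` and `hsp2`.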

