s. start and s. end in tabular BLAST output are the wrong way around?
7.3 years ago
rmartson • 0

I'm running local command-line BLAST+ from Biopython with -outfmt 7, so I get tabular output with comment lines. Later on I use the BLAST results to extract the flanks on either side of each hit, so I want to be sure the start and end coordinates I'm using are correct. However, sometimes I get results like this:

54653640 54652673

Here the "s. start" value is higher than the "s. end" value. In these cases I take the second value as the start, but I suspect this is still causing problems later on.

Why are these values the wrong way around for so many hits?

blast biopython • 2.4k views
7.3 years ago
h.mon 35k

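This is happening because your query is being aligned to the reverse-complemented subject sequence, i.e. the hit is on the minus strand of the subject, and BLAST reports this by giving an "s. start" that is larger than the "s. end". Why this is happening, and what it means for your downstream analyses, will depend on what your sequences are and what those analyses are.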
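As a rough illustration of how the two values could be interpreted, here is a minimal Python sketch. The function name normalize_hit is made up for this example and is not part of BLAST+ or Biopython; it only assumes that the coordinates are the 1-based, inclusive "s. start" and "s. end" columns of the tabular output.

```python
def normalize_hit(s_start, s_end):
    """Return (start, end, strand) with start <= end.

    Hypothetical helper: coordinates are assumed to be the 1-based,
    inclusive "s. start" / "s. end" values from tabular BLAST output.
    """
    if s_start <= s_end:
        return s_start, s_end, "plus"
    # s. start > s. end: the hit lies on the minus strand of the subject
    return s_end, s_start, "minus"


print(normalize_hit(54653640, 54652673))
# (54652673, 54653640, 'minus')
```

Keeping the strand alongside the sorted coordinates, rather than just swapping the values, preserves the information needed later when extracting sequence.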


So when retrieving flanks from either side of a reverse-complement hit, is it wrong to simply extract the sequence either side of the hit in the FASTA sequence for that chromosome? Do I need to reverse-complement it too?


If I have understood your question, yes - if you are trying to get the matched sequence from the reference, it would need reverse complementing.
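A minimal sketch of what that could look like when slicing the chromosome string directly, assuming 1-based inclusive BLAST coordinates and a plain DNA string. The names extract_flanks, reverse_complement and flank_len are invented for this example; with Biopython you could use the reverse_complement() method of a Seq object instead of the small helper here.

```python
# Complement table for plain DNA letters (ambiguity codes other than N are not handled)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_flanks(chrom_seq, s_start, s_end, flank_len):
    """Return (upstream, hit, downstream) in the orientation of the query.

    s_start and s_end are assumed to be 1-based, inclusive subject
    coordinates; s_start > s_end is treated as a minus-strand hit.
    """
    lo, hi = min(s_start, s_end), max(s_start, s_end)
    left = chrom_seq[max(0, lo - 1 - flank_len):lo - 1]
    hit = chrom_seq[lo - 1:hi]
    right = chrom_seq[hi:hi + flank_len]
    if s_start <= s_end:
        return left, hit, right
    # Minus strand: reverse-complement every piece and swap the flanks,
    # because "upstream" of the hit is to the right on the plus strand.
    return (reverse_complement(right),
            reverse_complement(hit),
            reverse_complement(left))
```

Note that for a minus-strand hit the flanks also swap sides, so reverse-complementing the hit alone is not enough if the flanks need to be in the query's orientation.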


So when retrieving the reverse-complement hit, would it be correct to use the option "-strand minus" in blastdbcmd and change the order of "s. start" and "s. end" values in the flag "-range" (i.e., for the above example -range 54652673-54653640)?
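For illustration only, the command described above could be assembled from Python roughly as follows. This is a sketch, not a confirmed answer: the function name blastdbcmd_args and the database and entry names are placeholders, and it assumes that -range expects the smaller coordinate first and that -strand minus returns the reverse complement of that range.

```python
def blastdbcmd_args(db, entry, s_start, s_end):
    """Build a blastdbcmd argument list for one hit.

    Hypothetical helper: db and entry are placeholders for your own
    BLAST database name and subject sequence identifier.
    """
    lo, hi = sorted((s_start, s_end))
    # s. start > s. end is taken to mean a minus-strand hit
    strand = "plus" if s_start <= s_end else "minus"
    return ["blastdbcmd", "-db", db, "-entry", entry,
            "-range", f"{lo}-{hi}", "-strand", strand]


# Placeholder database and entry names; pass the list to subprocess.run to execute it
args = blastdbcmd_args("genome_db", "chr1", 54653640, 54652673)
print(" ".join(args))
# blastdbcmd -db genome_db -entry chr1 -range 54652673-54653640 -strand minus
```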
