I'm running local BLAST+ from the command line via Biopython, with -outfmt 7, so I get tabular output with comment lines. Later on I use the BLAST results to extract the flanks on either side of each hit, so I want to be sure the start and end I'm using for the hit are correct. However, sometimes I get results like this:
54653640 54652673
Here the "s. start" value is higher than the "s. end" value. In these cases I take the second value as the start, but I suspect this is still causing problems later on.
Why are these values the wrong way around for so many hits?
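For nucleotide searches, a "s. start" larger than "s. end" is how the tabular format reports a hit on the minus (reverse-complement) strand of the subject, so the values are not really the wrong way around. A minimal sketch of keeping that information instead of discarding it, assuming the default -outfmt 7 column order (with "s. start" and "s. end" as the 9th and 10th tab-separated fields); the function names and the example line are made up for illustration:

```python
def subject_interval(sstart, send):
    """Return (low, high, strand) with low <= high, 1-based inclusive."""
    if sstart <= send:
        return (sstart, send, "plus")
    # sstart > send: the hit lies on the subject's minus strand
    return (send, sstart, "minus")


def parse_hits(lines):
    """Yield (query_id, subject_id, low, high, strand) per hit line.

    Assumes the default tabular columns; comment lines start with '#'.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        yield (fields[0], fields[1]) + subject_interval(sstart, send)


# Hypothetical output: one comment line and one minus-strand hit
example = [
    "# BLASTN 2.x",
    "q1\tchr1\t99.0\t968\t5\t0\t1\t968\t54653640\t54652673\t0.0\t1700",
]
hits = list(parse_hits(example))
```

Carrying the strand along with the sorted coordinates means the later flank-extraction step can decide whether to reverse-complement.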
So when retrieving flanks from either side of a reverse-complement hit, is it wrong to simply extract the sequence either side of the hit in the FASTA sequence for that chromosome? Do I need to reverse-complement it too?
If I have understood your question, yes - if you are trying to get the matched sequence from the reference, it would need reverse complementing.
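To make that concrete, here is a rough sketch in plain Python, assuming the chromosome is already loaded as a string and the coordinates are 1-based and inclusive as in the BLAST table. The positions on the FASTA sequence are the same either way; for a minus-strand hit, the pieces are reverse-complemented and the left and right flanks swap roles if you want them in the query's orientation. The helper names are hypothetical:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hit_with_flanks(chrom, sstart, send, flank):
    """Return (upstream, hit, downstream) in the query's orientation.

    chrom is the forward-strand sequence; sstart and send are the
    1-based inclusive values from the BLAST table, in either order.
    """
    low, high = min(sstart, send), max(sstart, send)
    left = chrom[max(0, low - 1 - flank):low - 1]
    hit = chrom[low - 1:high]
    right = chrom[high:high + flank]
    if sstart > send:
        # Minus-strand hit: reverse-complement everything and swap flanks
        return (reverse_complement(right),
                reverse_complement(hit),
                reverse_complement(left))
    return (left, hit, right)


toy = "AAACCCGGGTTT"
plus_result = hit_with_flanks(toy, 4, 6, 2)
minus_result = hit_with_flanks(toy, 6, 4, 2)
```

If you only need the raw flanking sequence on the forward strand and do not care about orientation relative to the query, the plain slices are enough.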
So when retrieving the reverse-complement hit, would it be correct to use the option "-strand minus" in blastdbcmd and change the order of "s. start" and "s. end" values in the flag "-range" (i.e., for the above example -range 54652673-54653640)?
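As far as I know, that is how it works: -range expects the lower coordinate first, and -strand minus returns the reverse complement of that range. A small sketch of building the command from the BLAST coordinates; the database name and entry ID are placeholders, and the function only assembles the argument list rather than running anything:

```python
def blastdbcmd_args(db, entry, sstart, send):
    """Build a blastdbcmd argument list for one hit.

    The range is always written low-high; the strand is inferred
    from the order of sstart and send in the BLAST table.
    """
    low, high = sorted((sstart, send))
    strand = "minus" if sstart > send else "plus"
    return [
        "blastdbcmd",
        "-db", db,          # placeholder database name
        "-entry", entry,    # placeholder sequence ID
        "-range", f"{low}-{high}",
        "-strand", strand,
    ]


args = blastdbcmd_args("my_genome_db", "chr1", 54653640, 54652673)
# To run it (requires BLAST+ installed and the database to exist):
# import subprocess
# fasta = subprocess.run(args, check=True, capture_output=True, text=True).stdout
```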