blastdbcmd: sequence range error
0
0
Entering edit mode
4.6 years ago
agata ▴ 10

I use blast+ package. After running blastn I get a sequence range that I forward then to blastdbcmd. But when command spot coordinates on minus strand it complains:

blastdbcmd -db ./database/database.fna \
-entry CM008969.1 \
-strand minus \
-range 659-392

BLAST engine error: Failed to parse sequence range (start cannot be larger than stop)

Is it necessary to "flipped" the coordinates in this case? Or it is only mistake in syntax - (hopefully, as there is -strand parameter, that I guess should deal with the problem)?

If the "flip" is needed how to do it? Would I need to count the length of the genome and then subtract differences in distances to both ends of the strand to the respect of the opposite strand?

-------------------------------------------------------- EDIT -----------------------------------------------------------------------------------------------

query:

>U6
TTGTGCTTCGGCACAACTGTTAAAATTGGAAAAATACAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCATAAATCGAGAAATGGTTCCAAATTTTT

database made from genome: https://www.ncbi.nlm.nih.gov/assembly/GCA_000002595.3

when using blastn result looks like:

# Fields: query id, subject id, % identity, % query coverage per subject, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
 # 11 hits found
 U6 CM008969.1  100.00  100 104 0   0   1   104 3788683 3788580 6e-49  193
 U6 CM008969.1  99.02   100 102 1   0   3   104 3473456 3473557 4e-46  183
 U6 CM008969.1  96.04   100 101 4   0   1   101 2474443 2474543 1e-40  165
 U6 CM008969.1  98.53   100 68  1   0   37  104 3476470 3476403 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3476629 3476562 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3761506 3761573 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3761687 3761754 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3761869 3761936 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3771731 3771664 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3772741 3772674 3e-27  121
 U6 CM008969.1  97.06   100 68  2   0   37  104 3772559 3772492 1e-25  115

and now, using 4th record from above (as it is the first hit on "minus" strand) in blastdbcmd as follow:

blastdbcmd -db ./database/database.fna \
-entry CM008969.1 \
-strand minus \
-range 3476403-3476470 \ #flipped coordinates
-outfmt %s

gives:

CAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCACAAATCGAGAAATGGTTCCAAATTTTT

and finally, those sequence as test serves as a new query against the same database gives results:

# Fields: query id, subject id, % identity, % query coverage per subject, alignment length, mismatches, gap opens, 
q. start, q. end, s. start, s. end, evalue, bit score
# 12 hits found
test    CM008969.1  100.00  100 68  0   0   1   68  3473490 3473557 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3476470 3476403 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3476629 3476562 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3761506 3761573 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3761687 3761754 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3761869 3761936 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3771731 3771664 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3772741 3772674 4e-29  126
test    CM008969.1  98.53   100 68  1   0   1   68  3772559 3772492 2e-27  121
test    CM008969.1  98.53   100 68  1   0   1   68  3788647 3788580 2e-27  121
test    CM008969.1  96.92   100 65  2   0   1   65  2474479 2474543 4e-24  110
test    CM008969.1  97.37   100 38  1   0   31  68  3773539 3773502 8e-1165.8

At the end, I would expect the same stats results, but they are not the same:

U6 CM008969.1  98.53   100 68  1   0   37  104 3476470 3476403 3e-27  121
vs
test    CM008969.1  100.00  100 68  0   0   1   68  3473490 3473557 4e-29  126
blast+ blastdbcmd • 1.9k views
ADD COMMENT
0
Entering edit mode

Why not try to flip the numbers and give it a try? Strand specification should be adequate to get the correct range. You may need to reverse complement the result if the query returns top strand sequence.

ADD REPLY
0
Entering edit mode

When I first blasted my original query with the reference genome I got 100% identity and query coverage on "minus" strand.

But when I took coordinates from this hit and put them to blastdbcmd with flipped coordinates and -strand minus I got 2 mismatches, identity - 97% and query coverage 98%.

If it would be only about flipping the coordinates then results should be the same, right?

ADD REPLY
0
Entering edit mode

With blastdbcmd you are just recovering sequence from an entry that is in your database. I am not sure what you are referring to in terms of mismatches/identity.

ADD REPLY
0
Entering edit mode

Exactly. My "recovered" sequence is different from original - this what I wanted to show by the identity and coverage query.

ADD REPLY
0
Entering edit mode

I will have to look at that but it does not seem logical why that should happen. Can you post examples of the alignments? What does original mean here?

ADD REPLY
0
Entering edit mode

Please see my edit - I posted the example there.

ADD REPLY
1
Entering edit mode

If you search with a sequence that is already present in the database you are bound to get 100% hit to entire sequence which is what you get in your test. Since there are 8 hits with 100% identity, they are listed based on starting nucleotide (sorted) in hit.

Second hit in your test results:

test    CM008969.1  100.00  100 68  0   0   1   68  3476470 3476403 4e-29  126

perfectly matches your original hit co-ordinates

U6 CM008969.1  98.53   100 68  1   0   37  104 3476470 3476403 3e-27  121

As for differences between U6 and test (there is one base pair difference)

Query  37   CAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCATAAATCGAGAAATGGTTCC  96
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
Sbjct  1    CAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCACAAATCGAGAAATGGTTCC  60

Query  97   AAATTTTT  104
            ||||||||
Sbjct  61   AAATTTTT  68
ADD REPLY
0
Entering edit mode

I haven't used this package, but if you have specified strand is "minus", why do you specify the range as "end to start"?

. <- range ->

A T T C C T C C C A G +

T A A G G A G G G T C - (strand)

Like this, the range just determines which part of double-stranded info will be used, and strand determines which strand will be used

ADD REPLY
0
Entering edit mode

This is how the previously used tool blastn from the same package is giving the coordinates of an alignment. I suppose they should work together, I mean the format should be unified within one package.

ADD REPLY

Login before adding your answer.

Traffic: 1640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6