Question

blastdbcmd: sequence range error

0

Entering edit mode

5.3 years ago

agata ▴ 10

I use blast+ package. After running blastn I get a sequence range that I forward then to blastdbcmd. But when command spot coordinates on minus strand it complains:

blastdbcmd -db ./database/database.fna \
-entry CM008969.1 \
-strand minus \
-range 659-392

BLAST engine error: Failed to parse sequence range (start cannot be larger than stop)

Is it necessary to "flipped" the coordinates in this case? Or it is only mistake in syntax - (hopefully, as there is -strand parameter, that I guess should deal with the problem)?

If the "flip" is needed how to do it? Would I need to count the length of the genome and then subtract differences in distances to both ends of the strand to the respect of the opposite strand?

-------------------------------------------------------- EDIT -----------------------------------------------------------------------------------------------

query:

>U6
TTGTGCTTCGGCACAACTGTTAAAATTGGAAAAATACAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCATAAATCGAGAAATGGTTCCAAATTTTT

database made from genome: https://www.ncbi.nlm.nih.gov/assembly/GCA_000002595.3

when using blastn result looks like:

# Fields: query id, subject id, % identity, % query coverage per subject, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
 # 11 hits found
 U6 CM008969.1  100.00  100 104 0   0   1   104 3788683 3788580 6e-49  193
 U6 CM008969.1  99.02   100 102 1   0   3   104 3473456 3473557 4e-46  183
 U6 CM008969.1  96.04   100 101 4   0   1   101 2474443 2474543 1e-40  165
 U6 CM008969.1  98.53   100 68  1   0   37  104 3476470 3476403 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3476629 3476562 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3761506 3761573 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3761687 3761754 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3761869 3761936 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3771731 3771664 3e-27  121
 U6 CM008969.1  98.53   100 68  1   0   37  104 3772741 3772674 3e-27  121
 U6 CM008969.1  97.06   100 68  2   0   37  104 3772559 3772492 1e-25  115

and now, using 4th record from above (as it is the first hit on "minus" strand) in blastdbcmd as follow:

blastdbcmd -db ./database/database.fna \
-entry CM008969.1 \
-strand minus \
-range 3476403-3476470 \ #flipped coordinates
-outfmt %s

gives:

CAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCACAAATCGAGAAATGGTTCCAAATTTTT

and finally, those sequence as test serves as a new query against the same database gives results:

# Fields: query id, subject id, % identity, % query coverage per subject, alignment length, mismatches, gap opens, 
q. start, q. end, s. start, s. end, evalue, bit score
# 12 hits found
test    CM008969.1  100.00  100 68  0   0   1   68  3473490 3473557 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3476470 3476403 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3476629 3476562 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3761506 3761573 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3761687 3761754 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3761869 3761936 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3771731 3771664 4e-29  126
test    CM008969.1  100.00  100 68  0   0   1   68  3772741 3772674 4e-29  126
test    CM008969.1  98.53   100 68  1   0   1   68  3772559 3772492 2e-27  121
test    CM008969.1  98.53   100 68  1   0   1   68  3788647 3788580 2e-27  121
test    CM008969.1  96.92   100 65  2   0   1   65  2474479 2474543 4e-24  110
test    CM008969.1  97.37   100 38  1   0   31  68  3773539 3773502 8e-1165.8

At the end, I would expect the same stats results, but they are not the same:

U6 CM008969.1  98.53   100 68  1   0   37  104 3476470 3476403 3e-27  121
vs
test    CM008969.1  100.00  100 68  0   0   1   68  3473490 3473557 4e-29  126

blast+ blastdbcmd • 2.3k views

ADD COMMENT • link 5.3 years ago by agata ▴ 10

0

Entering edit mode

Why not try to flip the numbers and give it a try? Strand specification should be adequate to get the correct range. You may need to reverse complement the result if the query returns top strand sequence.

ADD REPLY • link 5.3 years ago by GenoMax 152k

0

Entering edit mode

When I first blasted my original query with the reference genome I got 100% identity and query coverage on "minus" strand.

But when I took coordinates from this hit and put them to blastdbcmd with flipped coordinates and -strand minus I got 2 mismatches, identity - 97% and query coverage 98%.

If it would be only about flipping the coordinates then results should be the same, right?

ADD REPLY • link 5.3 years ago by agata ▴ 10

0

Entering edit mode

With blastdbcmd you are just recovering sequence from an entry that is in your database. I am not sure what you are referring to in terms of mismatches/identity.

ADD REPLY • link 5.3 years ago by GenoMax 152k

0

Entering edit mode

Exactly. My "recovered" sequence is different from original - this what I wanted to show by the identity and coverage query.

ADD REPLY • link 5.3 years ago by agata ▴ 10

0

Entering edit mode

I will have to look at that but it does not seem logical why that should happen. Can you post examples of the alignments? What does original mean here?

ADD REPLY • link 5.3 years ago by GenoMax 152k

0

Entering edit mode

Please see my edit - I posted the example there.

ADD REPLY • link 5.3 years ago by agata ▴ 10

1

Entering edit mode

If you search with a sequence that is already present in the database you are bound to get 100% hit to entire sequence which is what you get in your test. Since there are 8 hits with 100% identity, they are listed based on starting nucleotide (sorted) in hit.

Second hit in your test results:

test    CM008969.1  100.00  100 68  0   0   1   68  3476470 3476403 4e-29  126

perfectly matches your original hit co-ordinates

U6 CM008969.1  98.53   100 68  1   0   37  104 3476470 3476403 3e-27  121

As for differences between U6 and test (there is one base pair difference)

Query  37   CAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCATAAATCGAGAAATGGTTCC  96
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
Sbjct  1    CAGAGAAGATTAGCATGGCCCCTGCGCAAGGATGACACGCACAAATCGAGAAATGGTTCC  60

Query  97   AAATTTTT  104
            ||||||||
Sbjct  61   AAATTTTT  68

ADD REPLY • link 5.3 years ago by GenoMax 152k

0

Entering edit mode

I haven't used this package, but if you have specified strand is "minus", why do you specify the range as "end to start"?

. <- range ->

A T T C C T C C C A G +

T A A G G A G G G T C - (strand)

Like this, the range just determines which part of double-stranded info will be used, and strand determines which strand will be used

ADD REPLY • link 5.3 years ago by Jianyu ▴ 580

0

Entering edit mode

This is how the previously used tool blastn from the same package is giving the coordinates of an alignment. I suppose they should work together, I mean the format should be unified within one package.

ADD REPLY • link 5.3 years ago by agata ▴ 10