Question

Blastn command line results -- bad alignments but high scores

0

Entering edit mode

9.7 years ago

jennycade • 0

I'm running blastn from the command line and keep getting weird results -- about half of the sequences have terrible alignments to the query, but have the same scores and E values as perfect matches. Using the -perc_identity option doesn't seem to change anything. I've tried changing just about everything except for the blast database.

If anyone has any advice I'd be very grateful.

ETA: This is using BLASTN 2.2.30+.

Command line used: blastn -db ../i20.fasta -query query1.txt -out test.txt -perc_identity 100

Example of results:

>HS3:591:C5VMVACXX:4:1101:10795:94073 1:N:0:CTAAGGTC
Length=100

 Score =   102 bits (55),  Expect = 2e-20
 Identities = 12/55 (22%), Gaps = 0/55 (0%)
 Strand=Plus/Plus

Query  1   GTATTTTTCAATTCTATTTACGCGTATAATATATCTTCGTCAACTATTGTGGAGT  55
           |    |   | |            || || | |                |    |
Sbjct  33  GGTGGTAAGATTCAATAAATACAATACAAGACACACATTATCTAATGGCTCTTTT  87

>HS3:591:C5VMVACXX:4:2316:8876:76137 1:N:0:CTAAGGTC
Length=100

 Score =   102 bits (55),  Expect = 2e-20
 Identities = 55/55 (100%), Gaps = 0/55 (0%)
 Strand=Plus/Plus

Query  1   GTATTTTTCAATTCTATTTACGCGTATAATATATCTTCGTCAACTATTGTGGAGT  55
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  41  GTATTTTTCAATTCTATTTACGCGTATAATATATCTTCGTCAACTATTGTGGAGT  95

blast • 2.9k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by jennycade • 0

0

Entering edit mode

If you blast against very big contigs, then blast will return all the matches that fit your threshold. For instance your query could match many times against chromosome 1; so the score isn't for any particular match but for chromosome 1 in total.

Can you provide the blast version and the exact command line used and we can provide more help.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by mark.ziemann ★ 1.9k

0

Entering edit mode

Thanks for the reply. I added the version and command line I used to the post.

The database I'm searching is a bunch of sequencing reads, so they're not huge contigs. The thing that's really confusing me is that one sequence that only matches in 12 positions has the same score as another sequence one that matches all the way across.

ADD REPLY • link 9.7 years ago by jennycade • 0

0

Entering edit mode

Can you post the read sequences that were the queries in your example output? And the species of interest (I don't have your database). I wanted to recreate this locally.

ADD REPLY • link 9.7 years ago by pld 5.1k

0

Entering edit mode

Thanks. The query is just the 55 bp sequence shown in the example—the sequencing reads are in the database. This is in an Arabidopsis mutant.

(I hope I'm being clear enough. I'm a newbie to bioinformatics!)

ADD REPLY • link 9.7 years ago by jennycade • 0

0

Entering edit mode

Can you pull the sequencing reads from the database?

ADD REPLY • link 9.7 years ago by pld 5.1k

score 0 · Answer 1 · 2015-03-13

Mystery solved—I'm looking at paired-end sequencing reads, and it turns out that the poorly-aligning sequences that showed up were mates of the good matches. I had concatenated the two read files together before I made the database, so blast was pulling out both of the mates in a pair when it gave me the results.

When I blast against a database made from just one of the read files it works fine.