Hi there,
I am using blastp (using blast+ 2.2.25) to align bacterial proteins (in the file multiple_queries.fasta) against a custom database (genome.db) containing all proteins coded by the genomes of 6 different bacteria.
I used the following command:
blastp -query multiple_queries.fasta -db genome_db -out file_out.txt -outfmt "6 qseqid sseqid evalue bitscore qlen slen length pident ppos gaps" -num_alignments 50
Some of the output in file_out.txt looks like this for the query P21171:
...
P21171 gi|16804543|ref|NP_466028.1| 6e-50 194 484 401 141 64.54 74.47 8
P21171 gi|16802439|ref|NP_463924.1| 6e-35 144 484 227 119 57.14 75.63 3
P21171 gi|16802439|ref|NP_463924.1| 0.006 38.5 484 227 80 36.25 47.50 3
P21171 gi|126697942|ref|YP_001086839.1| 5e-26 115 484 335 125 49.60 67.20 6
P21171 gi|126698969|ref|YP_001087866.1| 2e-25 113 484 509 105 46.67 64.76 2
...
I am puzzled by lines 2 and 3 of this output, where the same subject sequence (NP_463924.1) is reported twice for the same query.
When I ran the same query protein (P21171) on the blastp web-interface, I didn't get these unusual hits.
I don't understand why blastp gives two different results for the same sequence.
Thanks for your time!
Right you are - thanks! I got two alignments (one at each end of the protein). Do you happen to know how to print only the best hit for each sequence?
AFAIK this is not possible
Enter the BLAST parser :)
That's why we have BLAST parsers.
I couldn't find anything in the manual either. Thanks again.
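For anyone landing here later, the kind of parser suggested above could look roughly like the sketch below. It assumes the column order from the -outfmt string in the question, and it assumes "best" means the highest bit score for each query/subject pair. The function name best_hsp_per_pair is made up for illustration, and the sample rows are the ones posted above.

```python
import io

# Column order taken from the -outfmt string in the question.
COLUMNS = ["qseqid", "sseqid", "evalue", "bitscore", "qlen",
           "slen", "length", "pident", "ppos", "gaps"]


def best_hsp_per_pair(lines):
    """Keep the highest-bitscore row for each (query, subject) pair.

    Hypothetical helper: 'best' is assumed to mean highest bit score.
    """
    best = {}
    for line in lines:
        fields = line.split()  # works for tab- or space-separated rows
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        key = (row["qseqid"], row["sseqid"])
        if key not in best or float(row["bitscore"]) > float(best[key]["bitscore"]):
            best[key] = row
    return list(best.values())


# Sample rows copied from the output in the question.
sample = io.StringIO(
    "P21171\tgi|16804543|ref|NP_466028.1|\t6e-50\t194\t484\t401\t141\t64.54\t74.47\t8\n"
    "P21171\tgi|16802439|ref|NP_463924.1|\t6e-35\t144\t484\t227\t119\t57.14\t75.63\t3\n"
    "P21171\tgi|16802439|ref|NP_463924.1|\t0.006\t38.5\t484\t227\t80\t36.25\t47.50\t3\n"
)

for row in best_hsp_per_pair(sample):
    print(row["qseqid"], row["sseqid"], row["evalue"], row["bitscore"])
```

To run it on the real output, pass an open file handle for file_out.txt instead of the sample. To keep only a single best hit per query rather than per query/subject pair, use row["qseqid"] alone as the key.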