Hi there!
I'm blasting (blastx2.2.28) my assembly to the annotated refseq. Below - the command line I used:
blastx -query assembly.fa -out Blast -db BlastCustomDB -num_threads 8 -evalue 1e-6 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -max_hsps_per_subject 1 -max_target_seqs 1 -outfmt 6
I specified quantity of HighcoringSegmentPairs (1), but still BLAST output contains two or more HSP for some queries.
Here's the output example (it goes like query ID, subject ID, identity %, length of HSP, mismatches, gap opens ... E-value, bit score):
TR60-c0_g1_i1 EKV52009.1 37.36 174 57 7 ... 2e-14 74.3
TR60-c0_g1_i1 EKV52009.1 64.37 87 30 1 ... 6e-10 59.7
TR61-c0_g1_i1 EKV51611.1 49.41 253 121 2 ... 4e-47 161
TR64-c0_g1_i1 EKV51289.1 60.00 175 62 1 ... 9e-66 208
After examination with 'uniq' command it became clear that all duplicated results for a query each time were aligned to a single gene in different position (what is exactly HPSs).
There's the way just to remove multiple HSP, but maybe it's possible to change some parameters in blast run to get the right single-hit-single-HSP output directly?
Thanks in advance!
Please see this related question and answer: Why Is Blast Creating Duplicates In My Output Files?! including the comment not to use tabular blast output. Does it solve your problem?
Unfortunately, no. Blast failed to run with the same parameters to create ASN.1 output, so I removed all parameters but -max_target_seqs, -num_threads, -evalue. I've got the same output (with duplicates) after converting ASN.1 to tabular.
Why did blast fail with that output format? Anyway, could you post the sequences TR60-c0_g1_i1 and EKV52009.1 so we can check it out?
This is the text of the error (when I ran it with parameters which I've used for tabular output):
TR60-c0_g1_i1:
EKV52009.1:
Does this happen only in a tabular output?
Dobry wieczór! I tried xml and ASN.1 (which is I converted to tabular again - and it gave me the same duplicated results), but it's difficult to me to read these formats, so I can't say if it's the same thing happens with other formats.
Just checked .xml output of blastx2.2.28, same duplication appeared.
dobryj večer! see my answer below :)