Duplicated console-version BLAST results
1
1
Entering edit mode
8.0 years ago
crimsontabaq ▴ 70

Hi there!

I'm blasting (blastx2.2.28) my assembly to the annotated refseq. Below - the command line I used:

blastx -query assembly.fa -out Blast -db BlastCustomDB -num_threads 8 -evalue 1e-6 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -max_hsps_per_subject 1 -max_target_seqs 1 -outfmt 6

I specified quantity of HighcoringSegmentPairs (1), but still BLAST output contains two or more HSP for some queries.

Here's the output example (it goes like query ID, subject ID, identity %, length of HSP, mismatches, gap opens ... E-value, bit score):

 TR60-c0_g1_i1  EKV52009.1  37.36   174 57  7   ...         2e-14   74.3
 TR60-c0_g1_i1  EKV52009.1  64.37   87  30  1   ...         6e-10   59.7
 TR61-c0_g1_i1  EKV51611.1  49.41   253 121 2   ...         4e-47    161
 TR64-c0_g1_i1  EKV51289.1  60.00   175 62  1   ...         9e-66    208

After examination with 'uniq' command it became clear that all duplicated results for a query each time were aligned to a single gene in different position (what is exactly HPSs).

There's the way just to remove multiple HSP, but maybe it's possible to change some parameters in blast run to get the right single-hit-single-HSP output directly?

Thanks in advance!

blast alignment • 2.7k views
ADD COMMENT
0
Entering edit mode

Please see this related question and answer: Why Is Blast Creating Duplicates In My Output Files?! including the comment not to use tabular blast output. Does it solve your problem?

ADD REPLY
0
Entering edit mode

Unfortunately, no. Blast failed to run with the same parameters to create ASN.1 output, so I removed all parameters but -max_target_seqs, -num_threads, -evalue. I've got the same output (with duplicates) after converting ASN.1 to tabular.

ADD REPLY
0
Entering edit mode

Why did blast fail with that output format? Anyway, could you post the sequences TR60-c0_g1_i1 and EKV52009.1 so we can check it out?

ADD REPLY
0
Entering edit mode

This is the text of the error (when I ran it with parameters which I've used for tabular output):

BLAST options error: err:tried to set option (130) and value (1), line (537).

TR60-c0_g1_i1:

AATCGGGTCTTGGGTATGGTACGTTACAAGAATTTCAAGAGATATACTTTATACAGAAGA GCTATTCACAATACAACTTGAGATATATACAATAAACAGTATGCACAATGCAATAATCTA TACAACAAAGCTGATGTCAGAACCCAAAAAATTGCTGAACGAGTGCAAGCAAGCCAAAAA AAAAAGAAAGGAGAGTGATTAGTATGATGGTATGTATACAGTCCTCGTCCTCGTTTGCTC TAGTAAAGAAACACAGATGGCATGGCGAAACATCATGAAGGCGCGTCAAAATCATCGCTG TCGTCGTTCATCCGGCGTCGCTTAGCTGGTCCCGGCTGGAGGAGAGAGTCTGCAGACACG ATGCCTCTAGATTCGAGGACGTCGCGAGACACTTTGGCTGGTCCATCTGCAGGCCTTTTA GACCTAACTCGAGTTCTGCGCAAAGCGCCGAAGGCCTGAGCTTTGATCACCGCCAGAGTA CCCGACAGCGGAGCCACGCCAGGCGAAGGTGAGCGAGAGGAAGGCGCTGTTATCCGACGT GGACGAGCCTGGTGCTGACTGGAGTAGGAAGAATCGCCACCATCGCTGCCGTGCTGAGAC CCTTGCTGAGAATCATTCTCACTTCCTGCGGTACCGCTTGTTATCGGGGGAAGGAGTGGA AGTAGGCTGGTACGGTCCAAAATAGCATCCATGCGTCGATTGTTTCCACCAAGTCTGTAG CGAACTCCCCCGTCGTTTCCTGAGCTTGAGGCGATGGCGGAAATATCTGCAGTGCCACCA TGAAGAGAGATAACCTCAGCGCCGTGACTCTGTGTCGTTTGACGACGATACTCAAATTCT GGTCGATCCTGTGCGGCACCCTGCGCGTCGTTGAAGGCAAAAGGAAAAGGGGAAGTCAAA GCGGTCGTAGATGAAGAGGGTGGCGTAGATGAAGAAGGAGAAATAGATCGACTGTTTGGG AAGACGTATGGAACGACCGTGGGTTCTAAATTGTTCAAAGGTACTTTTTGGTCATCGGGA TATCGATTGCCAATGAGATTCATATCGGATGCGGAGGTGAGTGTCGGTGATTCAACAAAA TGCGGCGAATTGGAGGAGTGGCTACTGGTCTGTGTCCAAGAAGTATATGGTTGATTCTGG TGATTAAATGAGACTGAAGGTGAGGCAGGATACTGATTAAAACCCAGACTGTTATCAGGG TATTGTGTGCTGTTGAACGTCGTAGACGGCGGGGACGGGTAAGAGGGAAACTCTGATTCG GTCTCCGAATTTTGTCCTCGCTTTGGCCAAGACGCTCTCCAAAAGTTTTCTTGATTTCGT AGTGCTTGTCGCAGCTGATTGTTCTCTTGCTGTAATTCCTGAGCCTCGGTAGTCGCTTCC CGGCATGAGTCCTGAAGTACTCGGACAACTGATTCTAGGTTCACGACCGTCTCCTCAAGA GTTGTAATGTAATTTGTGCGACGAGCTCGAAAAGCCGCTTGAGCATCAGCATTCTTTTTT TTCCTTAGTGCAATATCGGTCATTCGAGAAGGATTCTGTTCATCCGACTTGGTCGATTGG TTTGTCGTCGTCGTCGAAGAGGACATGGCCACAGTGGCGATGGGGAAAGTGGGATCCTAC GAATTGCCAATAGCGGAGATCTAGAAGAGATTTAAAACACTTTGCATGAAATGCCACGAA AAGAATTCGAAGTAGGGCACGAGGTCA

EKV52009.1:

MPKSPRASPSFATVKSTPDNPVSRLSDIALRKKKNADAQAAFRARRANYIATLEETVTSLESV VLQLQDSWRESRTELQDARQELLRLQHHCRERDRFWRDLWENRRGGPTPDADDIASSPSFS PVHVHPNGLGSQIQNSQLGPYTLDGMTYRTSEDTPSQPPYAGQQDFSTTVPNPLPFNDGDVS GDASHCLGQRVEKMTYVPRFPNDDSKLALGNFEATSFTFSSDRFTDSRSLSPTSTPSSSSSTS IPSASYPYTFPDPAFTQDTFRRPSHGAEMTLHGGTADVSVIPGADAIRYRLGNRQPMSMTDRP MLPQTRPGSANESQHERGSSDAAPALSCTVAVIKAQAFGALRRTRTRTKKSTETASRVAHDVL EARGIGIGSRQRASEDDEF

ADD REPLY
0
Entering edit mode

Does this happen only in a tabular output?

ADD REPLY
1
Entering edit mode

Dobry wieczór! I tried xml and ASN.1 (which is I converted to tabular again - and it gave me the same duplicated results), but it's difficult to me to read these formats, so I can't say if it's the same thing happens with other formats.


Just checked .xml output of blastx2.2.28, same duplication appeared.

ADD REPLY
0
Entering edit mode

dobryj večer! see my answer below :)

ADD REPLY
3
Entering edit mode
8.0 years ago

In NCBI BLAST+ versions below 2.2.29, the -max_hsps_per_subject parameter does not work correctly. This parameter was further replaced with -max_hsps, which works fine.

Download the latest NCBI BLAST+ (from NCBI FTP) and you will be fine:

blastx -query assembly.fasta -db blastdb.fa -max_hsps 1 -outfmt 6

Output:

TR60-c0_g1_i1   EKV52009.1  37.356  174 57  7   809 288 273 394 1.32e-18    74.3
ADD COMMENT
1
Entering edit mode

It works! with latest blast:

blastx -db BlastDB -query assembly.fa  -max_target_seqs 1 -max_hsps 1 -num_threads 8 -evalue 1e-6 -outfmt 6

This job gave me little bit more results than blast-2.2.28 output sorted with 'uniq' command, yet without any double-hsps. Dziękuję!

ADD REPLY

Login before adding your answer.

Traffic: 1799 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6