Bugs in BLAST? (the E-value cutoff does not behave intuitively)
5.0 years ago
l0404th ▴ 20

I ran into a strange situation when running a simple BLAST search. The BLAST version is 2.9.0+.

When I use the evalue parameter of blastp, I expect it to be a simple cutoff value, the same as in other tools. But I found that, even with the same query and the same subject, the evalue cutoff does not return the expected sequences.

With an evalue of 1e-38, I got these sequences, without the lower one (1.60e-83). But if I use an evalue of 1e-35, I got this.

This makes me question the cutoff/filtering done by BLAST.

The situation is also described in the following picture.

If you need the file/db I used, please feel free to download it at download data

alignment • 1.2k views

NCBI's recommendation is that you look at a minimum of 5 matches and filter afterwards as needed. NCBI's official reply to the paper @lieven linked below is here.


Please do not use screenshots of plain text - it adds an unnecessary layer to a simple problem. You can paste plain text directly in the post or use a GitHub Gist. See this post for a detailed how-to: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists

5.0 years ago

OK, this is a tricky issue you mention here, though one that has been extensively discussed on Biostars (and in other fora / blog posts).

Can't find any links to point to right now, but will look for them.

Long story short: the filtering is not done only at the end (== the reporting stage of BLAST), as you might intuitively expect, but also has its consequences earlier on in the BLAST process, causing differences in output.
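To see exactly which subjects differ between two runs, you can diff the subject IDs of both outputs. Below is a minimal sketch, assuming both runs were saved as default tabular output (-outfmt 6), where the subject ID is the second column. The helper name `subject_ids` and the example rows are made up for illustration; they are not the data from this thread.

```python
def subject_ids(lines):
    """Collect the subject IDs (2nd column) from tabular BLAST output lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[1])
    return ids


# Invented example rows standing in for two runs with different cutoffs.
run_strict = [
    "q1\tsubjA\t90.0\t150\t15\t0\t1\t150\t1\t150\t2e-60\t190",
]
run_loose = [
    "q1\tsubjA\t90.0\t150\t15\t0\t1\t150\t1\t150\t2e-60\t190",
    "q1\tsubjC\t95.0\t200\t10\t0\t1\t200\t1\t200\t4e-84\t250",
]

# Subjects reported only by the looser run, although a pure
# reporting-stage filter would have kept them in both.
only_loose = subject_ids(run_loose) - subject_ids(run_strict)
print(sorted(only_loose))
```

For real files, pass an open file handle instead of the lists, for example `subject_ids(open("run_1e-38.tsv"))` (the file name is hypothetical).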

EDIT: here is some background info on it:

biostars posts:

a blog post (which kinda started it all):

https://blastedbio.blogspot.com/2015/12/blast-max-target-sequences-bug.html

and a paper on this:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662297/


Don't be put off when they (often) refer to the max_target_seqs parameter; all of these parameters are affected (and influence the end result) in the same way.


Thank you very much for your reply!! That is really helpful for me.

Following your suggestion and reply, the best way to ensure the correctness of my result is to filter it myself afterwards.
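For anyone wanting to do the same, here is a minimal sketch of filtering afterwards, assuming the search was run with a permissive cutoff and saved as default tabular output (-outfmt 6), where the E-value is the 11th column. The function name `filter_hits` and the example rows are invented for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue):
    """Yield hits (as dicts) whose E-value is at or below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            yield hit


# Invented example rows, not the data from this thread.
example = (
    "q1\tsubjA\t95.0\t200\t10\t0\t1\t200\t1\t200\t3.92e-84\t250\n"
    "q1\tsubjB\t60.0\t120\t48\t0\t1\t120\t1\t120\t1e-36\t130\n"
    "q1\tsubjC\t90.0\t180\t18\t0\t1\t180\t1\t180\t1.60e-83\t245\n"
)

kept = [h["sseqid"] for h in filter_hits(io.StringIO(example), 1e-38)]
print(kept)
```

With a real file, replace `io.StringIO(example)` with an open file handle. The point of this design is that the threshold is applied only to the final reported E-values, so it behaves as a plain cutoff.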

Given your answer, something still seems weird. The subject sequences 003644665v1_00128, 003644685v1_00882 and 003644735v1_00189 are shown in the result with 1e-38. That makes sense with your answer, because they are not the best/lowest ones. But the remaining 003644635v1_01061 and 003644685v1_00882 reach 3.92e-84, which is well below either threshold (1e-38 or 1e-35). So why do they appear only in the result with 1e-35?


Likely only the BLAST developers can answer this fully, but as I understand it, the initial HSP (before extension and/or gapped alignment) could be filtered out for not meeting the threshold, whereas if you let it pass and start extending it, it will in the end get a high(er) score when the whole (gapped) alignment is taken into account.


Yes, it seems weird to me too. González-Pech et al., 2018 also talk about it, but 3.92e-84 and 1e-38 differ a lot. Any ideas?
