I ran into a weird situation with a simple BLAST search. The BLAST version is 2.9.0+.
When I use the -evalue parameter of blastp, I expect it to act as a simple cut-off value, like other cut-off parameters.
However, I found that even with the same query and the same subject, the e-value cut-off does not return the correct sequences at all.
With -evalue 1e-38, I get the hits below, but not the one with the much lower e-value (1.60e-83), even though it easily passes that cut-off. But if I use -evalue 1e-35, I get the second output below.
This makes me question how BLAST applies the e-value cut-off/filtering.
The situation is also shown in the following picture.
$ blastp -query ~/project/nitrogen_cycle/curated_genes/hzsA.faa -db only_Cla.faa -outfmt 6 -max_hsps 100000 -max_target_seqs 1000 -evalue 1e-38 -comp_based_stats 0
AEW50000.1 003644665v1_00128 34.561 353 187 13 66 411 194 509 5.07e-44 163
AEW50000.1 003644685v1_00882 34.670 349 184 13 66 407 194 505 9.52e-44 162
AEW50000.1 003644735v1_00189 34.670 349 184 13 66 407 194 505 9.52e-44 162
$ blastp -query ~/project/nitrogen_cycle/curated_genes/hzsA.faa -db only_Cla.faa -outfmt 6 -max_hsps 100000 -max_target_seqs 1000 -evalue 1e-35 -comp_based_stats 0
CAJ73613.1 003644635v1_01061 31.021 764 390 24 59 805 43 686 3.92e-84 283
CAJ73613.1 003644665v1_00128 29.868 760 399 24 59 803 34 674 1.13e-78 268
CAJ73613.1 003644685v1_00882 29.868 760 399 24 59 803 34 674 1.11e-77 265
CAJ73613.1 003644735v1_00189 29.868 760 399 24 59 803 34 674 1.11e-77 265
AEW50000.1 003644665v1_00128 34.561 353 187 13 66 411 194 509 5.07e-44 163
AEW50000.1 003644685v1_00882 34.670 349 184 13 66 407 194 505 9.52e-44 162
AEW50000.1 003644735v1_00189 34.670 349 184 13 66 407 194 505 9.52e-44 162
AEW49995.1 003644635v1_01061 30.259 347 200 9 66 406 204 514 5.80e-36 140
AEW50022.1 007136405v1_02974 31.143 350 197 12 68 413 175 484 2.66e-36 140
AEW50021.1 007136405v1_02974 31.143 350 197 12 62 407 175 484 1.73e-36 140
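To make the discrepancy easier to spot, the shell sketch below uses a hypothetical helper (missing_in_strict, not part of BLAST). It takes the looser 1e-35 run, keeps rows whose e-value also passes 1e-38, and reports those whose (qseqid, sseqid) pair is absent from the 1e-38 run. It assumes the default 12 -outfmt 6 columns (column 11 = evalue) and matches on the pair only, so multiple HSPs per pair are not told apart. The rows are copied from the output above; with real data you would point it at the saved outputs instead.

```shell
#!/bin/sh
# Sketch: find hits that a looser BLAST run reports with an e-value that also
# passes a stricter cut-off, but that are missing from the stricter run.
# Assumes default -outfmt 6 columns (column 11 = evalue). Default awk field
# splitting works for both tabs and spaces, since these columns contain no spaces.
set -eu

missing_in_strict() {
    # $1 = strict e-value cut-off, $2 = strict-run file, $3 = loose-run file
    awk -v cutoff="$1" '
        NR == FNR { seen[$1, $2] = 1; next }               # pairs in the strict run
        ($11 + 0 <= cutoff + 0) && !(($1, $2) in seen)     # passes cut-off, yet absent
    ' "$2" "$3"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Rows copied from the -evalue 1e-38 run above.
cat > "$tmp/hits_1e-38.out" <<'EOF'
AEW50000.1 003644665v1_00128 34.561 353 187 13 66 411 194 509 5.07e-44 163
AEW50000.1 003644685v1_00882 34.670 349 184 13 66 407 194 505 9.52e-44 162
AEW50000.1 003644735v1_00189 34.670 349 184 13 66 407 194 505 9.52e-44 162
EOF

# Rows copied from the -evalue 1e-35 run above.
cat > "$tmp/hits_1e-35.out" <<'EOF'
CAJ73613.1 003644635v1_01061 31.021 764 390 24 59 805 43 686 3.92e-84 283
CAJ73613.1 003644665v1_00128 29.868 760 399 24 59 803 34 674 1.13e-78 268
CAJ73613.1 003644685v1_00882 29.868 760 399 24 59 803 34 674 1.11e-77 265
CAJ73613.1 003644735v1_00189 29.868 760 399 24 59 803 34 674 1.11e-77 265
AEW50000.1 003644665v1_00128 34.561 353 187 13 66 411 194 509 5.07e-44 163
AEW50000.1 003644685v1_00882 34.670 349 184 13 66 407 194 505 9.52e-44 162
AEW50000.1 003644735v1_00189 34.670 349 184 13 66 407 194 505 9.52e-44 162
AEW49995.1 003644635v1_01061 30.259 347 200 9 66 406 204 514 5.80e-36 140
AEW50022.1 007136405v1_02974 31.143 350 197 12 68 413 175 484 2.66e-36 140
AEW50021.1 007136405v1_02974 31.143 350 197 12 62 407 175 484 1.73e-36 140
EOF

missing_in_strict 1e-38 "$tmp/hits_1e-38.out" "$tmp/hits_1e-35.out"
```

On these rows it prints the four CAJ73613.1 hits, i.e. exactly the ones that pass 1e-38 in the looser run but are not reported by the 1e-38 run.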
If you need the files/database I used, please feel free to download them here: download data
NCBI recommends looking at a minimum of 5 matches and filtering afterwards as needed. NCBI's official reply to the paper @lieven linked below is here.
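One way to "filter afterwards", sketched under assumptions: run blastp once with a permissive cut-off (blastp's default -evalue is 10) and apply the strict threshold to the tabular output yourself. The helper filter_by_evalue and the file name hits_loose.out are hypothetical; the blastp call is left commented out so the filter can be tried on a few rows copied from the 1e-35 run above. Whether this fully sidesteps the behaviour described in the question has not been checked here.

```shell
#!/bin/sh
# Sketch: apply the e-value cut-off after the search instead of via -evalue.
# Assumes default -outfmt 6 columns (column 11 = evalue, no spaces in fields).
set -eu

# Hypothetical permissive search; uncomment to run against the real data:
# blastp -query ~/project/nitrogen_cycle/curated_genes/hzsA.faa -db only_Cla.faa \
#     -outfmt 6 -max_hsps 100000 -max_target_seqs 1000 -evalue 10 \
#     -comp_based_stats 0 > hits_loose.out

filter_by_evalue() {
    # $1 = e-value cut-off, $2 = outfmt 6 file; keeps rows with evalue <= cut-off
    awk -v cutoff="$1" '$11 + 0 <= cutoff + 0' "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A few rows copied from the -evalue 1e-35 run in the question.
cat > "$tmp/hits_loose.out" <<'EOF'
CAJ73613.1 003644635v1_01061 31.021 764 390 24 59 805 43 686 3.92e-84 283
AEW50000.1 003644665v1_00128 34.561 353 187 13 66 411 194 509 5.07e-44 163
AEW49995.1 003644635v1_01061 30.259 347 200 9 66 406 204 514 5.80e-36 140
EOF

filter_by_evalue 1e-38 "$tmp/hits_loose.out"
```

Here the CAJ73613.1 and AEW50000.1 rows are kept and the AEW49995.1 row (5.80e-36) is dropped. The `+ 0` forces a numeric comparison so values in scientific notation are compared as numbers rather than strings.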
Please do not use screenshots of plain text - it adds an unnecessary layer to a simple problem. You can paste plain text directly in the post or use a GitHub Gist. See this post for a detailed how-to: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists