I've got the explanation from NCBI. The problem was the qcov_hsp_perc parameter, as follows:
It appears that you do not fully understand the meaning of one of the parameter settings, "-qcov_hsp_perc 60" to be more specific.
You are trying to force a search parameter setting between web and standalone. For many detailed aspects, such as detailed search settings, there is NO direct comparison. For web blast searches submitted through -remote, my understanding is that the NCBI blast server will ignore parameters it does not use. In this case it returned results since the "-qcov_hsp_perc" option is NOT used by the web.
Doing your remote search, but using a customized tabular output, you can see that NONE of the hits (HSPs) covers more than 40% of the query, as indicated by the last column:
$ blastn -task blastn -evalue 0.1 -qcov_hsp_perc 60 -word_size 7 -dust no -reward 2 -penalty -3 -gapopen 5 -gapextend 2 -query Q -db nt -outfmt '7 std qcovhsp' -out output_7.txt -remote&
# BLASTN 2.11.0+
# Query: seq1
# RID: ACJKM90V013
# Database: nt
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, % query coverage per hsp
# 1724 hits found
seq1 CP074120.1 100.000 28 0 0 61 88 1337192 1337219 0.006 51.8 32
seq1 CP065024.1 100.000 28 0 0 61 88 3189147 3189174 0.006 51.8 32
seq1 CP034658.1 100.000 28 0 0 61 88 892034 892007 0.006 51.8 32
seq1 CP047010.1 100.000 28 0 0 61 88 1472454 1472427 0.006 51.8 32
seq1 CP047002.1 100.000 28 0 0 61 88 4804955 4804928 0.006 51.8 32
In standalone, this is being used and all hits are filtered out as they should be. If you do want to find hits, drop this parameter.
Regards,
Tao Tao, PhD
NCBI User Services
Thanks a lot to all of you who answered me!
Regards
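For anyone who wants to reproduce what -qcov_hsp_perc does on the remote results, here is a minimal sketch that applies the same cutoff after the fact. It assumes the output was written with -outfmt '7 std qcovhsp', so that the last column is the % query coverage per HSP; the function name filter_by_qcov is made up for illustration.

```python
def filter_by_qcov(lines, min_qcov=60.0):
    """Keep only HSP rows whose last column (qcovhsp) is >= min_qcov.

    Assumes '-outfmt "7 std qcovhsp"': comment lines start with '#',
    and the final field of every data row is the % query coverage per HSP.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        fields = line.split()
        if float(fields[-1]) >= min_qcov:
            kept.append(fields)
    return kept


# Two of the rows from the remote output above
rows = [
    "# 1724 hits found",
    "seq1 CP074120.1 100.000 28 0 0 61 88 1337192 1337219 0.006 51.8 32",
    "seq1 CP065024.1 100.000 28 0 0 61 88 3189147 3189174 0.006 51.8 32",
]
print(len(filter_by_qcov(rows, 60)))  # 0: both HSPs cover only 32% of the query
```

With the cutoff at 60 nothing survives, which matches the empty result of the local search.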
are you using exactly the same version of the nt DB ?
for the matches you get back from the remote search, do they comply with the filters you have in the cmdline? (e.g. is the e-value of the remote hits not smaller than for the local one?)
Dear Lieven, my local database is from January 27th 2021, the remote one is from May 14th 2021. I don't think that could be the explanation. I checked that some hits found in the remote search are present in my local DB. The e-value filter is the same for the two commands: 0.1
Should I use a smaller one?
I'm not saying it is, but I would also not conclude so quickly that it is not related to the database. They change quickly, and as sequences are added or removed the e-values of the hits change as well. So best to double-check anyway.
take one of the hits of the remote blast and look it up in the local blast result (might need to re-run it with a less stringent Evalue) and see what e-value it gets in the local search
What is the significance of N? Are there actual N's or are they representing variable sequence?
Dear GenoMax,
There are actual N's, as we don't care exactly which nucleotides lie between the two sequences:
For example:
ATCGTAGCTNNNNNNNNNNNNNNNNNNNNNNNNATCGTAGCT
Regards
Is the spacing important? Otherwise you could simply use the two sequences at the end and do searches that way.
Hi Geno.
Yes, the spacing between the two sequences matters. As the two sequences are the same, I could indeed do it like this, and afterwards check the distance between each pair of results to see whether it's a hit. That is more work than with the -remote option.
The problem here is that we have a lot of sequences like this, and the -remote option is limited to a hundred searches per day.
Thanks
well, I'm not fully convinced of your statement here.
Since you already indicate that you are limited to 100 searches / day using -remote, you will in the end be quicker and more efficient checking the distance after doing the search. Search only for one of the short sequences (ends are identical you said?), get the tabular output and parse/filter that one. I can see this being fairly straightforward tbh.
Moreover, even with the approach you try to follow here you can never be sure that the hit that blast returns will have exactly a 40 nt spacer between the two ends; blast might gap the alignment or report two HSPs without the spacer.
Bottom line: if the spacer is so crucial, you need to double-check that anyway.
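A rough sketch of that parse/filter step, assuming the search was run with one of the identical end sequences as query and standard tabular output (-outfmt 6 or 7, where columns 9 and 10 are s. start and s. end). The function name, the 40 nt default and the tolerance parameter are only for illustration.

```python
from collections import defaultdict


def find_spaced_pairs(rows, spacer=40, tolerance=0):
    """Pair hits on the same subject and strand that lie `spacer` nt apart.

    Assumes standard tabular columns: field 2 is the subject accession,
    fields 9 and 10 are subject start and end (start > end on the minus strand).
    """
    by_subject = defaultdict(list)
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and '#' header lines
        f = row.split()
        sstart, send = int(f[8]), int(f[9])
        strand = "+" if sstart <= send else "-"
        by_subject[(f[1], strand)].append((min(sstart, send), max(sstart, send)))

    pairs = []
    for (subject, strand), intervals in by_subject.items():
        intervals.sort()
        for i, (lo1, hi1) in enumerate(intervals):
            for lo2, hi2 in intervals[i + 1:]:
                gap = lo2 - hi1 - 1  # bases strictly between the two hits
                if abs(gap - spacer) <= tolerance:
                    pairs.append((subject, strand, lo1, hi2, gap))
    return pairs
```

Note that this only looks at subject coordinates. It does not check that each hit covers the full end sequence, so you may want to filter on alignment length and mismatches first.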
Dear Lieven
You're right. The tabular output has to be used in this case (actually, I'm helping someone else who asked me about this issue :-). I will propose this better method to him: I hadn't realized that gaps could be introduced in the N's. Anyway, I will send a request to NCBI just to understand what differs between these two methods. I'm just waiting to get the updated nt to run locally. In any case I'll send you their explanation if there is one. Thanks a lot!
While I still stand by my idea above, I accidentally came across this paragraph from the NCBI blast guides:
just to say your approach was not that bad :) , though the case described above is not exactly your use case.
you might also consider running the normal blastn (-task blastn); the short version is actually for sequences below 50 nt, which yours are not, it seems. If you do, then set the word size (-word_size) to less than the default 11.
Dear Lieven,
No hits found with the -task blastn command. That should be normal, as the sequence "seq" is less than 50 bp, no? Regards
dunno, you have 2x 28 + 40 , sounds like more than 50 ;)
but ok, no hits thus. (also not with the -remote) ?
Ok, sorry. You're right. That's more than 50, as each N counts as a nucleotide. The -remote search does give me hits...
Thanks!
are you requesting default alignment output from blast? if so, can you check and post here, for both the remote and the local blast, the last couple of lines in it (all at the bottom)? it should say things like 'effective database size' and such.
My DB is 3 months old. Not sure that could explain the problem:
For the local search:
For the remote search :
not showing the discrepancies I had hoped for, but you can still see that the database itself has changed significantly over those past months.
That in itself should not be the cause of the issue here (I think), but it will have an effect on the e-values being reported, so currently my best guess is on that.
Did you already have a look at the check I proposed somewhere above? (checking one hit in detail)
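To make the database-size effect concrete: for a given bit score S', the expected number of chance hits is E = m * n * 2^(-S'), where m * n is the effective search space. A small sketch of that relation; the function names are made up, and the rescaling is only an approximation, since BLAST also applies length corrections to the query and database sizes.

```python
def evalue_from_bits(bit_score, search_space):
    """E = search_space * 2^(-bit_score), with search_space = m * n."""
    return search_space * 2.0 ** (-bit_score)


def rescale_evalue(evalue, old_db_size, new_db_size):
    """Approximate e-value of the same alignment after the database changed size.

    Assumption: the effective search space scales linearly with the database
    size, which ignores BLAST's edge-effect (length adjustment) corrections.
    """
    return evalue * new_db_size / old_db_size
```

So a hit that sits just under an e-value cutoff in a smaller database can end up above it once the database has grown, even though its bit score is unchanged.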
Hi Lieven. I'm waiting for the last chunk of the nt database to download so I can run blastn against the more recent database. And yes, I will look into searching each of the sequences separately, with tabular output, and do some "simplistic" arithmetic to detect the real hits we are looking for. Thanks a lot for all of your comments.
fair enough.
(though you could also do that check with the old local DB, but I can understand you'll wait for the updated nt)
if you are only interested in identical matches with a fixed spacer, you could consider doing some simple reg-ex matching (just thought of that)
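Something along these lines, as a sketch only: each run of N in the query becomes a fixed-length wildcard, and both strands are searched. The function names are made up, and it assumes the subject sequence is already loaded into memory as a plain string, which for something the size of nt would have to be done record by record.

```python
import re


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T and N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def pattern_to_regex(query):
    """Turn each run of N into '[ACGT]{n}'; everything else must match exactly.

    The lookahead wrapper lets overlapping matches be reported as well.
    """
    body = re.sub(r"N+", lambda m: "[ACGT]{%d}" % len(m.group()), query.upper())
    return re.compile("(?=(%s))" % body)


def find_exact(query, subject):
    """Return (strand, start, end) for every exact match, 1-based inclusive."""
    subject = subject.upper()
    hits = []
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for m in pattern_to_regex(q).finditer(subject):
            hits.append((strand, m.start(1) + 1, m.end(1)))
    return hits
```

For example, find_exact("AAGCNNNAAGC", "TTAAGCGGGAAGCTT") reports one plus-strand match spanning positions 3 to 13. This only finds identical matches; any mismatch or gap in the ends will be missed, which is where blast is still needed.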
one of the top hits doing a google search gives me this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637886/ perhaps worth considering?