I am attempting to set up a local BLAST environment, and while running some test queries on both the local installation and the online web server I noticed some dramatic discrepancies. On both the web server and the local BLAST I was using word size 3, gap open 11, gap extend 1, an E-value threshold of 10.0 and the BLOSUM62 scoring matrix. I got 2 total results on the local search, and 25 results on the web server. The version of the BLAST DB is less than a month old, and some of the sequences that were found in the web server results, but not the local results, are included in it.
Specifically, I was test BLASTing this DNA polymerase against human sequences, using the new -taxids argument locally and the organism filter (taxid 9606) on the web server.
>4XVI_A Chain A, Dna Polymerase Nu [Homo sapiens]
KKHFCDIRHLDDWAKSQLIEMLKQAAALVITVMYTDGSTQLGADQTPVSSVRGIVVLVKRQAEGGHGCPDAPACGPVLEGFVSDDPCIYIQIEHSAIWDQEQEAHQQFARNVLFQTMKCKCPVICFNAKDFVRIVLQFFGNDGSWKHVADFIGLDPRIAAWLIDPSDATPSFEDLVEKYCEKSITVKVNSTYGNSSRNIVNQNVRENLKTLYRLTMDLCSKLKDYGLWQLFRTLELPLIPILAVMESHAIQVNKEEMEKTSALLGARLKELEQEAHFVAGERFLITSNNQLREILFGKLKLHLLSQRNSLPRTGLQKYPSTSEAVLNALRDLHPLPKIILEYRQVHKIKSTFVDGLLACMKKGSISSTWNQTGTVTGRLSAKHPNIQGISKHPIQITTPKNFKGKEDKILTISPRAMFVSSKGHTFLAADFSQIELRILTHLSGDPELLKLFQESERDDVFSTLTSQWKDVPVEQVTHADREQTKKVVYAVVYGAGKERLAACLGVPIQEAAQFLESFLQKYKKIKDFARAAIAQCHQTGCVVSIMGRRRPLPRIHAHDQQLRAQAERQAVNFVVQGSAADLCKLAMIHVFTAVAASHTLTARLVAQIHDELLFEVEDPQIPECAALVRRTMESLEQVQALELQLQVPLKVSLSAGRSWGHLVPLQ
On the web server I got these 25 results (columns: description, max score, total score, query cover, E-value, percent identity, accession):
DNA polymerase nu [Homo sapiens] 1386 1386 100% 0.0 100.00% NP_861524.2
gb|AAN52116.1| DNA polymerase N [Homo sapiens] 1386 1386 100% 0.0 100.00% AAN52116.1
pdb|4XVI|A Chain A, Dna Polymerase Nu [Homo sapiens] 1381 1381 100% 0.0 100.00% 4XVI_A
gb|EAW82538.1| polymerase (DNA directed) nu, isoform CRA_a [Homo sapiens] 1363 1363 100% 0.0 98.65% EAW82538.1
gb|AAD02338.1| putative DNA polymerase [Homo sapiens] 969 969 73% 0.0 97.36% AAD02338.1
dbj|BAG64670.1| unnamed protein product [Homo sapiens] 815 815 57% 0.0 100.00% BAG64670.1
dbj|BAD18421.1| unnamed protein product [Homo sapiens] 561 561 54% 0.0 80.83% BAD18421.1
pdb|4X0Q|A Chain A, Dna Polymerase Theta [Homo sapiens] 238 238 76% 1e-67 29.27% 4X0Q_A
pdb|4X0P|A Chain A, Dna Polymerase Theta [Homo sapiens] 238 238 76% 2e-67 29.27% 4X0P_A
gb|AAR08421.2| DNA polymerase theta [Homo sapiens] 238 238 76% 2e-65 29.27% AAR08421.2
gb|EAW79513.1| polymerase (DNA directed), theta, isoform CRA_d [Homo sapiens] 238 238 76% 2e-65 29.27% EAW79513.1
ref|NP_955452.3| DNA polymerase theta [Homo sapiens] 238 238 76% 2e-65 29.27% NP_955452.3
dbj|BAD93104.1| DNA polymerase theta variant [Homo sapiens] 238 238 76% 2e-65 29.27% BAD93104.1
gb|EAW79510.1| polymerase (DNA directed), theta, isoform CRA_a [Homo sapiens] 238 238 76% 2e-65 29.27% EAW79510.1
gb|EAW79511.1| polymerase (DNA directed), theta, isoform CRA_b [Homo sapiens] 237 237 76% 3e-65 29.27% EAW79511.1
ref|XP_011510650.1| DNA polymerase theta isoform X3 [Homo sapiens] 237 237 76% 3e-65 29.06% XP_011510650.1
gb|AAD05272.1| DNA polymerase eta [Homo sapiens] 236 236 75% 4e-65 29.20% AAD05272.1
ref|XP_011510649.1| DNA polymerase theta isoform X1 [Homo sapiens] 237 237 76% 4e-65 29.06% XP_011510649.1
ref|XP_016861054.1| DNA polymerase theta isoform X4 [Homo sapiens] 236 236 76% 5e-65 29.06% XP_016861054.1
ref|XP_011510656.1| DNA polymerase theta isoform X6 [Homo sapiens] 236 236 76% 6e-65 29.06% XP_011510656.1
gb|AAK39635.1| DNA polymerase theta [Homo sapiens] 230 230 76% 7e-63 29.27% AAK39635.1
gb|AAC33565.1| DNA polymerase theta [Homo sapiens] 229 229 76% 9e-63 29.27% AAC33565.1
ref|XP_011510645.1| DNA polymerase theta isoform X2 [Homo sapiens] 221 221 76% 8e-60 27.83% XP_011510645.1
emb|CAI56770.1| hypothetical protein [Homo sapiens] 92.0 92.0 39% 6e-18 27.15% CAI56770.1
ref|XP_011510654.1| DNA polymerase theta isoform X5 [Homo sapiens] 62.8 62.8 22% 6e-09 27.67% XP_011510654.1
Whereas in local BLAST I only got these two results:
BLASTP 2.9.0+
# Query: 4XVI_A Chain A, Dna Polymerase Nu [Homo sapiens]
# Database: nr_v5
# Fields: % query coverage per subject, subject id, subject sci names
# 2 hits found
100 gb|AAN52116.1| Homo sapiens
76 gb|AAK39635.1| Homo sapiens
# BLAST processed 1 queries
Does anyone have any ideas as to what could be causing this?
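To see exactly which web hits are absent from the local run, the two result lists can be compared by accession. The following is only a minimal sketch: the helper names `accession` and `missing_locally` are hypothetical, and it assumes local subject IDs look like `gb|AAN52116.1|` (or `pdb|4XVI|A`) while the web accessions are bare, as in the listings above. Only a few accessions from the lists are used as sample input.

```python
def accession(identifier):
    """Reduce a 'gb|AAN52116.1|'-style subject ID to a bare accession."""
    parts = [p for p in identifier.strip().split("|") if p]
    if not parts:
        return ""
    # PDB entries appear as 'pdb|4XVI|A' locally but '4XVI_A' on the web page.
    if parts[0] == "pdb" and len(parts) >= 3:
        return parts[1] + "_" + parts[2]
    return parts[-1]


def missing_locally(web_accessions, local_sseqids):
    """Return web accessions that have no counterpart among the local hits."""
    local = {accession(s) for s in local_sseqids}
    return [a for a in web_accessions if a not in local]


# A few accessions copied from the listings above, for illustration only.
web = ["NP_861524.2", "AAN52116.1", "4XVI_A", "AAK39635.1"]
local = ["gb|AAN52116.1|", "gb|AAK39635.1|"]
print(missing_locally(web, local))
```

The missing accessions can then be looked up individually in the local database to check whether they are present at all or only being dropped by the search.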
EDIT:
The command I used was:
blastp -query HomoTest -db nr_v5 -out v53REsult -outfmt "7 qcovs sseqid sscinames" -max_target_seqs 100 -taxids 9606 -evalue 10.0
Apart from the options I have specified above, the web server was on its default settings.
EDIT 2:
After a clean nr_v5 install, the issue has been resolved. Thank you to everyone who answered this post.
What is the exact command you used to run the local BLASTP?
And was the webpage BLASTP all default otherwise?
Without knowing what flags you used it is difficult to tell.
I have added that to the post.
Have you looked around on Biostars for similar posts? I seem to remember that this kind of issue has been raised and 'processed' before.
One thing already: the web version likely does not use the
-max_target_seqs 100
parameter (which is a cause of lots of "confusion").
I have looked around Biostars, and most of what I found involved differences in the word size or in the gap open and extend costs. As for max target seqs, the web page defaults to 100 results; does it apply that limit in a different manner than the local version?
That is not clearly documented, but we tend to assume so.
Try running it without the -max_target_seqs parameter and then filter in post-processing.
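Filtering in post-processing could look roughly like the sketch below. It assumes the tabular output (outfmt 6 or 7) includes a numeric column where larger is better, for example by adding `bitscore` to the -outfmt column list; the function name `top_hits` and the sample rows are illustrative, not output from an actual run.

```python
def top_hits(tabular_text, n, score_col):
    """Return the n best rows from BLAST tabular output (outfmt 6 or 7).

    score_col is the 0-based index of a numeric column where a larger
    value means a better hit (for example the bit score).
    """
    rows = []
    for line in tabular_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment lines that outfmt 7 adds.
        if not line or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    rows.sort(key=lambda r: float(r[score_col]), reverse=True)
    return rows[:n]


# Illustrative input: query coverage, subject id, bit score.
example = "\n".join([
    "# BLASTP 2.9.0+",
    "# Fields: % query coverage per subject, subject id, bit score",
    "76\tgb|AAK39635.1|\t230",
    "100\tgb|AAN52116.1|\t1386",
    "76\tgb|AAC33565.1|\t229",
])
best = top_hits(example, 2, score_col=2)
print([row[1] for row in best])
```

Doing the cut after the search keeps the ranking decision in your own hands rather than relying on how the limit is applied during the search.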
Thanks for the quick reply. I have completely removed the option from my local blast command, but I still got identical results to when it was there, which is incredibly confusing.
Are you sure everything worked as expected? From the sample output you provided, it looks to me as if it does not correspond to the output format you requested on your blastp command line.
The DB was also a different version, as I understood it, no?