Why do local BLAST and online BLAST produce different results?
9.9 years ago
grayapply2009 ▴ 300

I downloaded the latest nt database from the NCBI FTP site and ran a local BLAST against it. Locally, all five of my sequences come back as virus sequences. However, when I BLAST the same sequences with the online tool on the NCBI website, they all come back as bacterial sequences (Burkholderia).

Why the difference? Which one is more reliable?


Are you sure that you downloaded the correct database?


ftp://ftp.ncbi.nlm.nih.gov/blast/db/

The above is where I downloaded the nt database from.

9.9 years ago
Juke34 8.9k

Assuming you have exactly the same database online and locally, I had exactly the same problem as you. The problem stemmed from a difference in default parameters between the local BLAST and the online BLAST. Specifically, the "word size" parameter was different. It's easy to check.


Thanks for this solution @Juke-34. Indeed, the command-line default word size for blastn is 28, whereas the online default word size is 11...
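To rule out word size as the cause, both settings can be pinned explicitly instead of relying on defaults. A minimal sketch, assuming BLAST+ is installed; the values 28 and 11 are simply the figures quoted in this thread, and the query file name and output file names are hypothetical:

```shell
QUERY="test_query.fa"   # hypothetical query file
DB="nt"                 # database base name

# Wrapper so both runs share everything except the task and word size.
run_blastn() {
    blastn -query "$QUERY" -db "$DB" -outfmt 6 "$@"
}

if command -v blastn >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    run_blastn -task megablast -word_size 28 > hits_w28.tsv
    run_blastn -task blastn    -word_size 11 > hits_w11.tsv
    # Show the first reported hit from each run for a quick comparison
    head -n 1 hits_w28.tsv hits_w11.tsv
else
    echo "blastn or $QUERY not available; nothing was run" >&2
fi
```

If the two tables disagree on the top hit, the word size (or the task it comes with) is a likely contributor to the local/online difference.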

9.9 years ago
Siva ★ 1.9k

Can you post the full command you used for running the local BLAST? The default search algorithm for nucleotide BLAST on the NCBI website is "megablast", whereas the default for the standalone is "blastn".

You can check the number of sequences in the 'nt' db you downloaded using 'blastdbcmd' and compare it with the number of sequences in the online 'nt' version (by clicking the ? next to the Database drop-down menu).
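A quick way to do that check is the `-info` option of `blastdbcmd`. This is only a sketch, assuming BLAST+ is on your PATH and the database is reachable under the base name `nt` (from the current directory or via `BLASTDB`):

```shell
DB_NAME="nt"   # base name only, not a single volume such as nt.00

if command -v blastdbcmd >/dev/null 2>&1; then
    # -info prints a summary that includes the sequence count and total bases
    blastdbcmd -db "$DB_NAME" -info || echo "could not open database $DB_NAME" >&2
else
    echo "blastdbcmd not found on PATH; install BLAST+ first" >&2
fi
```

Running the same command with a single volume name instead of the base name should report far fewer sequences, which makes it obvious when only part of the database is being searched.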

blastn -query test_query.fa -db nt.00/nt.00 -task blastn -dust no -outfmt "6 qseqid stitle staxids scomnames sscinames sskingdoms pident" -max_target_seqs 1

The online database (nr/nt) description is "The nucleotide collection consists of GenBank+EMBL+DDBJ+PDB+RefSeq sequences". It is a mixed database compared to the nt database I downloaded.


There could be at least two reasons for the differences you mentioned in your original post.

  1. The parameters you use for the local BLAST and the online BLAST are different.

    You are using 'blastn' for the local BLAST, but the default algorithm for online one is 'megablast'. Also, you disabled filtering (-dust no), but it is enabled by default in the online BLAST. Did you modify the parameters in the online BLAST to match the command you posted?

  2. The BLAST databases you are searching against are different.

    Right now, you are searching against only one of the 26 subsets of the 'nt' database. I hope you read this in the FTP Readme file:

    Large databases are formatted in multiple one-gigabyte volumes, which are named using the basename.##.tar.gz convention. All volumes with the same base name are required. An alias file is provided to tie individual volumes together so that the database can be called using the base name (without the .nal or .pal extension). For example, to call the est database, simply use "-db est" option in the command line (without the quotes).

    You need to download all the nt.##.tar.gz files, where ## runs from 00 to 25, then unzip and untar all of them into one directory. After that, you can run BLAST with the option -db nt.
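The unpacking step could look roughly like this. It is a sketch that assumes the downloaded `nt.##.tar.gz` archives sit in the current directory; the target directory name `blastdb` is hypothetical:

```shell
DB_DIR="${DB_DIR:-./blastdb}"   # hypothetical target directory for all volumes
mkdir -p "$DB_DIR"

# Unpack every downloaded volume (nt.00.tar.gz, nt.01.tar.gz, ...) into the same directory
extracted=0
for archive in nt.*.tar.gz; do
    [ -e "$archive" ] || continue   # the glob matched nothing
    tar -xzf "$archive" -C "$DB_DIR"
    extracted=$((extracted + 1))
done

echo "Extracted $extracted volume archive(s) into $DB_DIR"
```

With every volume in one place, running BLAST from inside that directory with `-db nt` lets the alias file pull in all of them.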


Thank you for the reply, Siva. Actually, I downloaded all 26 files and unzipped them on my computer. The reason I blast against only the nt.00 folder is that this folder contains an alias file (index file) that calls the information stored in all 26 folders.

I didn't change anything in the online blast. How do I just blast against the nt database online? It looks like the megablast is the only choice online.

By the way, what does the filter do in the blast? What is the effect of disabling it?


I am sorry for assuming that you did not download all the 26 files (the same alias file nt.nal is present in all the 26 directories). But you are using only one of the 26 files. You need to use only the base name (-db nt) to use all the 26 files. If BLAST complains that the database "nt" is not found, you need to either put all the unzipped files in one directory or copy the alias file to the directory that holds the 26 directories.
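One way to check that the base name will resolve is to point the `BLASTDB` environment variable at the database directory and look at the alias file. A rough sketch; the default directory `$HOME/blastdb` is a made-up location:

```shell
# Hypothetical directory holding nt.nal together with the volume files
export BLASTDB="${BLASTDB:-$HOME/blastdb}"

if [ -f "$BLASTDB/nt.nal" ]; then
    echo "Alias file found; '-db nt' should resolve to the volumes listed here:"
    # The DBLIST line of an alias file names the volumes it ties together
    grep '^DBLIST' "$BLASTDB/nt.nal" || true
else
    echo "No nt.nal in $BLASTDB; copy the alias file there or move the volumes" >&2
fi
```

With `BLASTDB` set this way, `-db nt` can be used from any working directory.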

There are three choices for the algorithm under "Program Selection": megablast, discontiguous megablast and blastn. You can select 'blastn'.

Filtering masks the low complexity regions in your query sequence. If you disable filtering, you will get hits that share only the low complexity regions which are not very useful. You can read more about this option here.
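To see what the filter changes for your own query, you could run the search with and without it and list the hits that appear only when filtering is off. A sketch under the assumption that BLAST+ and the nt database are set up; the file names are hypothetical:

```shell
QUERY="test_query.fa"   # hypothetical query file
DB="nt"

# $1 is "yes" (mask low-complexity regions) or "no" (leave the query unmasked)
run_with_dust() {
    blastn -task blastn -query "$QUERY" -db "$DB" -dust "$1" -outfmt 6 -max_target_seqs 5
}

if command -v blastn >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    run_with_dust yes | cut -f1,2 | sort -u > pairs_dust_yes.txt
    run_with_dust no  | cut -f1,2 | sort -u > pairs_dust_no.txt
    # Query/subject pairs reported only when filtering is disabled
    comm -13 pairs_dust_yes.txt pairs_dust_no.txt
else
    echo "blastn or $QUERY not available; nothing was run" >&2
fi
```

Pairs that show up only in the unfiltered run are candidates for matches driven by low-complexity sequence.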


Hey, Siva. You are exactly right. This time I blasted against the entire nt database with the command line

blastn -query test_query.fa -db nt/nt -task blastn -dust no -outfmt "6 qseqid stitle staxids scomnames sscinames sskingdoms pident" -max_target_seqs 1

and the results are the same now.

I cannot believe some guy online misled me so much two months ago, who told me to blast against the first volume as "it contains the index file". I've been doing the wrong thing the entire semester.

Many thanks to you.

By the way, how do you blast against multiple databases simultaneously, such as nr, nt, swissprot?

Another thing: when you use blastn online, it actually searches against nt/nr, which may lead to different results, since I only search against the nt database on my computer. How do I deal with that?


You are welcome. To search against multiple BLAST databases, just list the database names separated by spaces:

-db "nr swissprot"
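In a full command this might look as follows. Since both databases here are protein databases, the sketch uses `blastp`; the query file name `proteins.fa` and the output file name are hypothetical, and both databases are assumed to be installed locally:

```shell
PROT_QUERY="proteins.fa"   # hypothetical protein query file
DBS="nr swissprot"         # space-separated list, passed as ONE quoted argument

if command -v blastp >/dev/null 2>&1 && [ -f "$PROT_QUERY" ]; then
    # Quoting "$DBS" keeps the two names together as the value of -db
    blastp -query "$PROT_QUERY" -db "$DBS" -outfmt 6 > hits_nr_swissprot.tsv
else
    echo "blastp or $PROT_QUERY not available; nothing was run" >&2
fi
```

Without the quotes the shell would split the list, and BLAST would see the second name as a stray argument.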

What if I want to blast against nt and nr?


You cannot. 'nt' is a nucleotide sequence database and 'nr' is a protein sequence database.
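For nucleotide queries, the two searches can still be run one after the other: `blastn` against nt, and `blastx` (which translates the query in six frames) against nr. A sketch, assuming both databases are installed locally; the file names are hypothetical:

```shell
QUERY="test_query.fa"   # hypothetical nucleotide query file

if command -v blastn >/dev/null 2>&1 && command -v blastx >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # Nucleotide query against the nucleotide database
    blastn -query "$QUERY" -db nt -outfmt 6 > hits_nt.tsv
    # Translated nucleotide query against the protein database
    blastx -query "$QUERY" -db nr -outfmt 6 > hits_nr.tsv
else
    echo "BLAST+ or $QUERY not available; nothing was run" >&2
fi
```

The two result tables are not directly comparable, since one reports nucleotide alignments and the other protein alignments.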


OK, I'll just do it separately. Thank you, Siva. You saved me.


I think you should use '-db nt' as it will recognize all the sub files of nt.
