Hi all, I have downloaded the whole NT database locally for running BLAST. During my searches, some sequences that can be found on the NCBI website are missing from my local NT database. These are some of the accessions that could not be found in the local NT:
Error: [blastdbcmd] Entry not found: NC_019090.1
Error: [blastdbcmd] Entry not found: NC_019424.1
Error: [blastdbcmd] Entry not found: NZ_CP021711.1
Error: [blastdbcmd] Entry not found: NZ_CP021210.1
Error: [blastdbcmd] Entry not found: NC_020278.2
Error: [blastdbcmd] Entry not found: NC_019095.1
Error: [blastdbcmd] Entry not found: NZ_CP029734.1
Error: [blastdbcmd] Entry not found: NZ_CP016389.1
Error: [blastdbcmd] Entry not found: NZ_CP029974.1
Error: [blastdbcmd] Entry not found: NZ_CP015072.1
Error: [blastdbcmd] Entry not found: NZ_CP007652.1
Error: [blastdbcmd] Entry not found: NC_024954.1
Error: [blastdbcmd] Entry not found: NC_019163.1
Error: [blastdbcmd] Entry not found: NZ_CP024879.1
Error: [blastdbcmd] Entry not found: NC_015872.1
Error: [blastdbcmd] Entry not found: NZ_CP016037.1
Error: [blastdbcmd] Entry not found: NZ_CP010880.1
Doubting whether I had downloaded all the files, I checked the number of downloaded volumes (nt.00..nt.60) and confirmed it against my .nal file, which looks like this:
$ cat nt.nal
#
# Alias file created 08/08/2018 12:50:38
#
TITLE Nucleotide collection (nt)
DBLIST "nt.00" "nt.01" "nt.02" "nt.03" "nt.04" "nt.05" "nt.06" "nt.07" "nt.08" "nt.09" "nt.10" "nt.11" "nt.12" "nt.13" "nt.14" "nt.15" "nt.16" "nt.17" "nt.18" "nt.19" "nt.20" "nt.21" "nt.22" "nt.23" "nt.24" "nt.25" "nt.26" "nt.27" "nt.28" "nt.29" "nt.30" "nt.31" "nt.32" "nt.33" "nt.34" "nt.35" "nt.36" "nt.37" "nt.38" "nt.39" "nt.40" "nt.41" "nt.42" "nt.43" "nt.44" "nt.45" "nt.46" "nt.47" "nt.48" "nt.49" "nt.50" "nt.51" "nt.52" "nt.53" "nt.54" "nt.55" "nt.56" "nt.57" "nt.58" "nt.59" "nt.60"
NSEQ 49266009
LENGTH 188943333900
I also randomly checked the md5sums of the NT files, and they match the md5sums available on the NCBI FTP page. Am I missing something here? Many thanks in advance for your comments.
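For anyone wanting to repeat the volume check described above, here is a minimal sketch. It assumes the volume files sit in the same directory as the alias file and treats a volume as present when its .nsq file exists; the function names are made up for illustration and are not part of BLAST+.

```bash
# Print the volumes named on the DBLIST line of a BLAST alias file (.nal),
# one per line, with the surrounding double quotes removed.
nal_volumes() {
    grep '^DBLIST' "$1" \
        | sed 's/^DBLIST[[:space:]]*//' \
        | tr -d '"' \
        | tr -s ' ' '\n' \
        | sed '/^$/d'
}

# Print every volume whose sequence file (<volume>.nsq) is absent from the
# directory that holds the alias file. Checking only .nsq is an assumption;
# a stricter check would also look for the .nhr and .nin files.
missing_volumes() {
    local nal=$1 dir vol
    dir=$(dirname "$nal")
    nal_volumes "$nal" | while read -r vol; do
        [ -e "$dir/$vol.nsq" ] || echo "$vol"
    done
}
```

Calling `missing_volumes /path/to/blastdb/nt.nal` (hypothetical path) prints nothing when every listed volume is present. An empty result only shows the download is complete; it says nothing about which accessions the database contains.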
nr/nt (the one on the NCBI website) is not the same database as the nt you can download from their FTP.
I assumed they were the same.
makes two of us
and what's the difference then?
That nr definition is for the protein db. I don't know what exactly is different between nr/nt (the one on the website) and nt (the one on the FTP), but right now nr/nt has 48,336,722 seqs whereas OP's nt is slightly larger with 49,266,009 seqs. I tried a few identifiers from OP and they were all RefSeq sequences. Could it be that those seqs are in nt, not under their RefSeq identifiers but under GenBank identifiers, e.g. from NZ_CP016037.1 to CP016037.1, from NC_019095.1 to JF927996.1, etc.?
Edit: like the README states
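The GenBank-identifier idea can be tested with something like the sketch below. It rests on two assumptions: dropping the NZ_ prefix gives the GenBank accession (which matches the NZ_CP016037.1 example, but does not hold for NC_ records such as NC_019095.1, whose GenBank counterpart has to be looked up), and blastdbcmd exits with a non-zero status when an entry is missing. The function names are hypothetical.

```bash
# Guess a GenBank accession by dropping an "NZ_" prefix.
# Anything else (including NC_ accessions) is printed unchanged.
to_genbank_guess() {
    case "$1" in
        NZ_*) printf '%s\n' "${1#NZ_}" ;;
        *)    printf '%s\n' "$1" ;;
    esac
}

# Succeed if blastdbcmd finds the accession ($2) in the database ($1).
# Assumes a non-zero exit status for "Entry not found".
in_blastdb() {
    blastdbcmd -db "$1" -entry "$2" -outfmt '%a' </dev/null >/dev/null 2>&1
}

# Read accessions from stdin and report, tab-separated, whether each one is
# found as given, found under the guessed GenBank accession, or not found.
check_accessions() {
    local db=$1 acc guess
    while read -r acc; do
        [ -n "$acc" ] || continue
        guess=$(to_genbank_guess "$acc")
        if in_blastdb "$db" "$acc"; then
            printf '%s\tfound\n' "$acc"
        elif [ "$guess" != "$acc" ] && in_blastdb "$db" "$guess"; then
            printf '%s\tfound as %s\n' "$acc" "$guess"
        else
            printf '%s\tnot found\n' "$acc"
        fi
    done
}
```

With the missing accessions saved one per line in a file (say `missing.txt`, a made-up name), `check_accessions nt < missing.txt` shows which ones turn up under the guessed GenBank accession.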
OK, right, never noticed that before, but indeed it says nr/nt for the non-redundant DB in blastn.
From what I can see it's a different 'state' of non-redundancy:
from the ftp README:
from the NCBI blastn page:
Though I totally agree this even adds to the confusion.
ok, yes, that I know.
I thought the statement was that the nr (or nt) available from the FTP is different from the one on the NCBI BLAST page itself.
Actually, this is not straightforward. Downloading large files keeps timing out. So, for now, I just used NCBI eutils to download my sequence from NCBI online:
This also sometimes has problems downloading the sequence if the internet connection is slightly off.
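One way to make single-sequence downloads more tolerant of a flaky connection is to let curl retry. This is only a sketch and not necessarily the command used above; it assumes the efetch endpoint with db=nuccore and FASTA output, and the function names and retry settings are illustrative.

```bash
# Build the E-utilities efetch URL for one nucleotide accession (FASTA, plain text).
efetch_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text\n' "$1"
}

# Download one accession to <accession>.fasta.
# --fail makes curl return an error on HTTP failures instead of saving an error page;
# --retry repeats the request after transient errors, waiting --retry-delay seconds.
fetch_fasta() {
    curl --fail --silent --show-error --retry 5 --retry-delay 3 \
         --output "$1.fasta" "$(efetch_url "$1")"
}
```

For example, `fetch_fasta NC_019090.1` would write `NC_019090.1.fasta` to the current directory. For many accessions, a short pause between calls helps stay within NCBI's request limits.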
You are in Singapore so bandwidth should not be an issue unless you are behind some restrictive scanning firewall.