Hello,
Does anyone know of a website that reports how many sequences
(as in entries) are currently in the NCBI non-redundant database
(nr.gz from NCBI)?
I know I can run a bash one-liner over the downloaded and unpacked db and count the entries myself, but with about 2,000,000,000 lines that will take very long.
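For reference, the kind of one-liner I mean is sketched below. It assumes the unpacked file is called nr.fa and that every entry starts with a '>' header line; the function name count_fasta_entries is only an illustrative name, not part of any tool.

```shell
# Hypothetical helper: count FASTA entries by counting header lines,
# i.e. lines that start with '>'.
count_fasta_entries() {
    # LC_ALL=C forces plain byte matching, which is usually faster
    # than locale-aware matching on very large files.
    LC_ALL=C grep -c '^>' "$1"
}

# Usage (file name assumed):
#   count_fasta_entries nr.fa
# Or without unpacking first, streaming from the compressed file:
#   gzip -cd nr.gz | LC_ALL=C grep -c '^>'
```

This still has to read the whole file once, so it only avoids the extra work of building an index, not the scan itself.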
Creating an index with esl-sfetch would also tell me how many entries are in nr.fa, but building the index (SSI index written to file nr.fa.ssi) is taking very long as well:
esl-sfetch --index nr.fa
So yes, I am looking for an estimate of the number of entries in nr. Thanks for your help :)