Number of Entries in Non Redundant DB
3.9 years ago
6schulte ▴ 30

Hello,

Does anyone know whether there is a website that reports how many sequences (as in entries) are currently in the NCBI non-redundant database (nr.gz from NCBI)?

I know I can run a bash one-liner over the downloaded and unpacked db and count the entries myself, but with about 2,000,000,000 lines that will take a very long time. Creating an index with esl-sfetch would also tell me how many entries are in nr.fa, but the index creation (SSI index written to file nr.fa.ssi) is taking very long as well:

esl-sfetch --index nr.fa

So yes, I am looking for an estimate of the number of entries in nr. Thanks for your help :)

3.9 years ago
GenoMax 147k

Grepping (or zgrepping, for a compressed file) for ^> in a FASTA file should get you the number of unique sequences. Since some sequences are shared by multiple entries, this does not give you the exact number of accessions. If you need that information, you will need to parse the FASTA headers (an example header is below).

>MBD3193859.1 hypothetical protein [Candidatus Lokiarchaeota archaeon]MBD3198741.1 hypothetical protein [Candidatus Lokiarchaeota archaeon]

$ zgrep "^>" nr.gz | wc -l
338057725
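
A minor variation on the command above, offered as a sketch rather than a benchmarked recommendation: `grep -c` counts matching lines directly, and setting `LC_ALL=C` often speeds up grep on large, mostly ASCII files. The function name `count_fasta_records` is hypothetical.

```shell
# Hypothetical helper: count FASTA records in a gzip-compressed file.
count_fasta_records() {
    # gzip -dc streams the decompressed data to stdout;
    # grep -c prints the number of lines that start with '>'.
    gzip -dc "$1" | LC_ALL=C grep -c '^>'
}
```

Usage would be `count_fasta_records nr.gz`. The whole file still has to be decompressed and scanned once, so this remains slow on a file the size of nr.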

If you have nr BLAST indexes available, then the following would be another option (counts as of this week).

$ blastdbcmd -db nr -entry all -outfmt %a | wc -l
593806742

Looking at the results above, it would appear that there are 338057725 unique sequences representing a total of 593806742 accessions.
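
If the BLAST indexes are not available, the header parsing mentioned earlier can be sketched as follows. This assumes that entries merged into one header line in nr.gz are separated by a Ctrl-A character (octal 001), which is not visible when the header is displayed; check this on your own copy before trusting the number. The function name `count_fasta_accessions` is hypothetical.

```shell
# Hypothetical helper: count entries across all FASTA header lines,
# assuming merged entries are separated by Ctrl-A (octal 001).
count_fasta_accessions() {
    gzip -dc "$1" |
        LC_ALL=C grep '^>' |   # keep header lines only
        tr '\001' '\n' |       # put each merged entry on its own line
        wc -l |                # count entries
        tr -d ' '              # strip padding that some wc versions add
}
```

Usage would be `count_fasta_accessions nr.gz`. The result may not match the blastdbcmd count exactly if the FASTA file and the BLAST indexes were generated on different days.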

Edit: @Mensur's method is simple to follow and can get you an updated number of unique sequences (but not accessions) for each day.

Note: The number of entries in nr likely changes each day as the indexes are regenerated.

3.9 years ago
Mensur Dlakic ★ 28k

Do a protein BLAST search, and the results page will have a pull-down menu next to nr where you can show database details. As of yesterday, nr had 338057725 sequences.
