How do I find out how many genomes there are in GenBank?
Asked by lizabe, 8.6 years ago

Hello, I want to know how many genomes of a specific genus, for example Enterobacter, there are in GenBank. Where should I look: in Assembly or in Genome?

http://www.ncbi.nlm.nih.gov/gquery/?term=enterobacter

Thanks.

Tags: genbank
Answer by 5heikki, 8.6 years ago

The assembly summary file is very informative. Count the lines whose 8th field (organism_name) begins with "Enterobacter " (the trailing space keeps longer names such as "Enterobacteriaceae" from matching):

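# download the GenBank assembly summary (one line per assembly)
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/assembly_summary_genbank.txt
# count lines whose 8th tab-separated column (organism_name) starts with "Enterobacter "
awk -F'\t' '$8 ~ /^Enterobacter / {count++} END {print count+0}' assembly_summary_genbank.txt
# output at the time of posting: 1103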
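The same count can be wrapped in small reusable functions so you can try other genera or see which species make up the total. This is a sketch that assumes the column layout given in the file's `#` header line (column 8 = organism_name); the function names count_genus and species_counts are hypothetical helpers, not NCBI tools. Instead of a regex it compares the first word of organism_name to the genus exactly, so genus names never need regex escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for NCBI assembly_summary_*.txt files (tab-separated).
# Assumption: column 8 is organism_name, as described in the file header.

# count_genus GENUS FILE
# Prints the number of data lines whose organism_name starts with GENUS as a whole word.
count_genus() {
  local genus=$1 file=$2
  awk -F'\t' -v g="$genus" '
    /^#/ { next }                 # skip the commented header lines
    { split($8, words, " ") }     # words[1] is the genus part of organism_name
    words[1] == g { n++ }
    END { print n + 0 }           # print 0 rather than an empty string when nothing matches
  ' "$file"
}

# species_counts GENUS FILE
# Prints "count Genus species" lines, most frequent first.
species_counts() {
  local genus=$1 file=$2
  awk -F'\t' -v g="$genus" '
    /^#/ { next }
    { split($8, words, " ") }
    words[1] == g { print words[1] " " words[2] }
  ' "$file" | sort | uniq -c | sort -rn
}
```

Usage, after downloading the file as above: `count_genus Enterobacter assembly_summary_genbank.txt` and `species_counts Enterobacter assembly_summary_genbank.txt | head`. Note that entries such as "Enterobacter sp." are counted as their own "species" line in the breakdown.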
Reply by lizabe, 8.6 years ago

Thanks for the answer! Why is the result of the script (1103) different from the number shown for Assembly on this page (1136)? http://www.ncbi.nlm.nih.gov/gquery/?term=enterobacter


If you follow the Assembly link, the top of the results page says:

Items: 1 to 20 of 1105

Filters activated: Latest, Exclude anomalous. Clear all to show 1136 items.

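The assembly summary file only includes the latest version of each assembly, which corresponds to the default "Latest" filter (1105 items). The 1136 figure is what you get after clearing both filters, so it also counts older versions and anomalous assemblies. Running the awk command on the current assembly summary file returns 1104 hits, so the file lags slightly behind Entrez (presumably because it is not updated continuously).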
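To check for yourself which assembly versions the file lists, you can tally the version_status column, optionally for one genus only. This sketch assumes version_status is column 11 and organism_name is column 8, as described in the file's `#` header line; confirm the column numbers against your copy. The function name version_status_counts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# version_status_counts FILE [GENUS]   (hypothetical helper)
# Tallies column 11 (assumed: version_status) of an NCBI assembly summary file,
# restricted to rows whose organism_name (column 8) starts with GENUS if given.
version_status_counts() {
  local file=$1 genus=${2:-}
  awk -F'\t' -v g="$genus" '
    /^#/ { next }                            # skip header/comment lines
    { split($8, words, " ") }                # words[1] is the genus of organism_name
    g == "" || words[1] == g { print $11 }   # column 11 = version_status
  ' "$file" | sort | uniq -c
}
```

Usage: `version_status_counts assembly_summary_genbank.txt Enterobacter`. If the file holds only the latest versions, as described above, the tally should contain a single "latest" line.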
