How to download all the bacteria genomes from NCBI?
4.1 years ago
anran04100 • 0
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt

I downloaded assembly_summary.txt with the command above.

awk -F '\t' '$12 == "Complete Genome" {print $20}' assembly_summary.txt > assembly_summary_complete_genomes_path.txt

Then I selected the rows whose assembly level (column 12) is "Complete Genome" and saved their FTP paths (column 20) to a new file.

However, the resulting path file contains 21272 rows. I expected about 3000 rows, since there are 3000+ bacteria in NCBI. What is wrong here? How can I download all the bacterial genomes from NCBI?

Thanks!

4.1 years ago
GenoMax 147k

Use the ncbi-genome-download tool by Kai Blin. You should probably filter on assembly level (complete) and perhaps on RefSeq category so that you download only complete, high-quality genomes. There is a lot of redundancy within species because many strains of the same species are sequenced, which would explain a row count well above the number of species.
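The redundancy can be checked directly on assembly_summary.txt. What follows is only a rough sketch, not how ncbi-genome-download works internally. It assumes the usual column layout of that file (column 7 = species_taxid, column 12 = assembly_level, column 20 = ftp_path) and that each assembly directory holds a file named after the directory with the suffix _genomic.fna.gz. The helper names count_complete, count_species and list_fasta_urls are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; column numbers assume the
# usual assembly_summary.txt layout (7 = species_taxid,
# 12 = assembly_level, 20 = ftp_path).

# Number of assemblies flagged "Complete Genome" (one row per assembly/strain).
count_complete() {
    awk -F '\t' '$12 == "Complete Genome" {n++} END {print n + 0}' "$1"
}

# Number of distinct species among those complete assemblies.
count_species() {
    awk -F '\t' '$12 == "Complete Genome" && !seen[$7]++ {n++} END {print n + 0}' "$1"
}

# One FASTA URL per species: keeps the first complete assembly listed for
# each species_taxid (an arbitrary choice made for this sketch).
list_fasta_urls() {
    awk -F '\t' '
        $12 == "Complete Genome" && $20 != "na" && !seen[$7]++ {
            n = split($20, parts, "/")   # last path component is the assembly directory name
            print $20 "/" parts[n] "_genomic.fna.gz"
        }' "$1"
}
```

After sourcing these functions, `count_complete assembly_summary.txt` gives the number of rows the awk filter in the question selects, and `count_species assembly_summary.txt` gives the number of distinct species behind them. If one genome per species is enough, `list_fasta_urls assembly_summary.txt > urls.txt` followed by `wget -i urls.txt` would fetch the files. Picking the first assembly per species is a simplification; filtering on RefSeq category is likely a better criterion.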


Right, the tool is great!
