I know that this question is already 4 years old, but I hope that my answer might be useful to others anyway.
I implemented a standardized way to automate the genome retrieval process in R (see the biomartr package).
To retrieve all bacterial reference genomes from several database sources, one can simply type:
# download all bacterial reference genomes from NCBI RefSeq
biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome")
or
# download all bacterial reference genomes from NCBI Genbank
biomartr::meta.retrieval(kingdom = "bacteria", db = "genbank", type = "genome")
Instead of type = "genome", you can also specify type = "proteome", type = "CDS" (coding sequences), or type = "gff".
For more details about downloading specific genomes from specific kingdoms or subkingdoms of life, please consult the Meta-Genome Retrieval vignette.
Note that, to promote computational reproducibility in genomics and metagenomics studies, biomartr stores a log file for each downloaded genome, proteome, or CDS file.
An example log file looks as follows:
File Name: Escherichia_coli_genomic_refseq.fna.gz
Organism Name: Escherichia_coli
Database: NCBI refseq
URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
Download_Date: Wed Feb 15 15:17:50 2017
refseq_category: reference genome
assembly_accession: GCF_000005845.2
bioproject: PRJNA57779
biosample: SAMN02604091
taxid: 511145
infraspecific_name: strain=K-12 substr. MG1655
version_status: latest
release_type: Major
genome_rep: Full
seq_rel_date: 2013-09-26
submitter: Univ. Wisconsin
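Since every line of the log is a plain Key: value pair, the metadata can also be collected from the shell, for example to tabulate which assemblies went into an analysis. This is only a sketch: it assumes one field per line exactly as in the example log, and log_field and summarize_logs are hypothetical helper names, not part of biomartr.

```shell
#!/bin/sh
# Print the value of one "Key: value" line from a biomartr-style log file.
# Assumption: each field sits on its own line as "Key: value" (as in the example log).
log_field() {
  # index(...) == 1 means the line starts with "Key: "; print everything after it.
  awk -v k="$1" 'index($0, k ": ") == 1 { print substr($0, length(k) + 3); exit }' "$2"
}

# Print organism, assembly accession and taxid (tab-separated) for each log given.
summarize_logs() {
  for log in "$@"; do
    printf '%s\t%s\t%s\n' "$(log_field 'Organism Name' "$log")" \
      "$(log_field assembly_accession "$log")" "$(log_field taxid "$log")"
  done
}
```

For example, summarize_logs some/dir/*.txt (a hypothetical location and extension; point it at wherever the logs were written) prints one line per downloaded file, and a missing key simply yields an empty cell.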
I hope this helps.
Many genomes don't have any sequence data. Look at the Chr column in the table: if it contains no number, no sequence is available.
Hi,
How many sequences are you getting with this wget command? On the linked page, only 2379 bacterial species have genomic DNA. Click "Download selected records" and then count the entries that have sequence data with awk -F"\t" '$5>0' genomes_overview.txt | wc -l.
Best wishes, Rahul
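A caveat on that count: if genomes_overview.txt starts with a header line, $5>0 compares the non-numeric header cell with 0 as a string, and because letters sort after digits the header is typically counted too. A sketch that skips the first line and forces a numeric comparison, assuming (as the Chr column comment suggests) that column 5 holds the chromosome count and that line 1 is a header; count_with_chromosomes is a hypothetical name.

```shell
#!/bin/sh
# Count data rows whose 5th tab-separated column is a chromosome count > 0.
# Assumptions: column 5 is the chromosome ("Chr") count and line 1 is a header.
count_with_chromosomes() {
  # NR > 1 skips the header; $5 + 0 converts the cell to a number,
  # so "-" or an empty cell becomes 0 and is not counted.
  awk -F'\t' 'NR > 1 && $5 + 0 > 0 { n++ } END { print n + 0 }' "$1"
}
```

Usage: count_with_chromosomes genomes_overview.txt prints a single number, and prints 0 rather than nothing when no row qualifies.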
Thanks for responding. Yes, that's right: it gives 2379, but I can only download 2258 with the above-mentioned command.
Hi everybody,
I'm looking to download all complete bacterial genomes. There's an option at http://www.ncbi.nlm.nih.gov/genome/browse/ to show only complete prokaryotic genomes (3243), and I'm interested in downloading just these. ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/ makes it possible to download everything, but that's not what I'm looking for. Does anyone know a way to do this? Thank you all!
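One way to restrict a download to complete genomes, rather than mirroring the whole Bacteria directory, is to filter NCBI's per-group assembly_summary.txt on its assembly_level column and build each genome URL from ftp_path, following the same pattern as the URL in the biomartr log above. A minimal sketch, assuming column 12 is assembly_level and column 20 is ftp_path (check the header line of your copy); complete_genome_urls is a hypothetical helper, and the resulting count need not match the 3243 shown in the genome browser.

```shell
#!/bin/sh
# Turn an NCBI assembly_summary.txt into genomic FASTA URLs for complete genomes only.
# Assumptions: tab-separated, comment lines start with "#",
# column 12 = assembly_level, column 20 = ftp_path ("na" when there is none).
complete_genome_urls() {
  awk -F'\t' '
    /^#/ { next }                            # skip the comment/header lines
    $12 == "Complete Genome" && $20 != "na" {
      n = split($20, parts, "/")             # last path element, e.g. GCF_000005845.2_ASM584v2
      print $20 "/" parts[n] "_genomic.fna.gz"
    }' "$1"
}

# Example (needs network access; not run here):
#   wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt
#   complete_genome_urls assembly_summary.txt > urls.txt
#   wget -i urls.txt
```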
How complete are these >15,000 genomes? And is there a way to download all genomes in FASTA (DNA) format with one click?
They are annotated in the INSDC (e.g. ENA, the European Nucleotide Archive) as containing the full genome representation, for example with CDS annotations. You may want to contact ENA for further details on completeness. To download them all in one go, try wget on ftp://ftp.ensemblgenomes.org/pub/current/bacteria/fasta.
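If only genomic DNA is wanted rather than every FASTA type under that directory, wget's accept list can filter files while recursing. This sketch assumes the FTP path above is still served and that genomic DNA files follow the usual Ensembl naming *.dna.toplevel.fa.gz; fetch_ensembl_bacteria_dna is a hypothetical wrapper, and setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Recursively fetch only genomic DNA FASTA files below the Ensembl Genomes bacteria FASTA dir.
# Assumptions: the path is still served and DNA files are named *.dna.toplevel.fa.gz.
# -r recurse, -l inf no depth limit, -np never ascend to the parent directory,
# -nH no host-name directory locally, -A keep only files matching the pattern.
fetch_ensembl_bacteria_dna() {
  url=${1:-ftp://ftp.ensemblgenomes.org/pub/current/bacteria/fasta/}
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "wget -r -l inf -np -nH -A '*.dna.toplevel.fa.gz' $url"
  else
    wget -r -l inf -np -nH -A '*.dna.toplevel.fa.gz' "$url"
  fi
}
```

Expect a very large download; running DRY_RUN=1 fetch_ensembl_bacteria_dna first shows exactly what would be executed.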