Question

retrieving entire genomic sequence contents of a database

3.4 years ago

saundythe8th • 0

Hi all, I'm trying to download all bacterial genomes from Ensembl so I can mine them for bacteriocin gene clusters. However, I've been struggling and was hoping someone could advise. Any time I run wget on the index URL below, all I get back is a file like "index.html". I've also tried things like wget ftp://ftp.bacteria.ensembl.org/species, but no luck.

Can someone please advise on the steps I should take to pull all genomic sequences from a database from the command line via FTP, preferably in GBK, GFF, or FASTA format?

Any help is greatly appreciated!

genomes mining database ftp ensembl • 880 views

updated 3.4 years ago by Ben Moore ★ 2.4k • written 3.4 years ago by saundythe8th • 0

score 1 · Answer 1 · 2021-07-15

There is a script, genome_updater.sh, that downloads genome data in bulk, but from NCBI rather than Ensembl. For example, this command will download all RefSeq complete bacterial genomes:

genome_updater.sh -g "bacteria" -d "refseq" -l "Complete Genome" -f "genomic.fna.gz" -o "bac_refseq" -t 20

A small command-line change will let you download all GenBank genomes if you wish, and include even those (meta)genomes that may not be complete.
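
As a rough sketch of what that change might look like: the snippet below prints a variant of the command above with the database switched to GenBank and the assembly-level filter dropped. It assumes that -d accepts "genbank" and that leaving out -l keeps every assembly level; the output directory name "bac_genbank" is made up for illustration. Check the script's help text before relying on it.

```shell
#!/bin/sh
# Sketch only: prints the command instead of running it, so nothing is downloaded.
# Assumption: -d accepts "genbank"; omitting -l means no assembly-level filter.
DB="genbank"
OUT_DIR="bac_genbank"   # hypothetical output directory name

build_cmd() {
    # Same flags as the RefSeq example, minus -l, with a different -d and -o
    echo "genome_updater.sh -g bacteria -d $DB -f genomic.fna.gz -o $OUT_DIR -t 20"
}

build_cmd
```

If the printed line looks right, paste it into the terminal to start the download.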

score 0 · Answer 2 · 2021-07-20

Hi sandrewsaunderson,

If you are keen to use Ensembl for this task, it's important to remember that the bacterial files are stored in collections on the FTP site. For example: http://ftp.ensemblgenomes.org/pub/bacteria/release-51/fasta/

This may be where you have encountered problems with your download.
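
A plain wget on a directory URL like this fetches only the directory listing, which is likely where the "index.html" came from. One possible way to walk down through the collection directories is a recursive wget with an accept filter. The sketch below only prints the command by default, because the full download is very large. The file suffix .dna.toplevel.fa.gz is an assumption about how the genomic FASTA files are named, so browse one species directory first to confirm it.

```shell
#!/bin/sh
# Base URL taken from the link above.
BASE_URL="http://ftp.ensemblgenomes.org/pub/bacteria/release-51/fasta/"
# Assumption: genomic DNA FASTA files end in .dna.toplevel.fa.gz
PATTERN="*.dna.toplevel.fa.gz"
# Default is a dry run; set DRY_RUN=0 in the environment to really download.
DRY_RUN="${DRY_RUN:-1}"

fetch_collections() {
    # -r   recurse into the collection and species subdirectories
    # -np  never climb above the starting directory
    # -nH  do not create a local directory named after the host
    # --cut-dirs=3  drop pub/bacteria/release-51 from the local paths
    # -A   keep only files whose names match the pattern
    if [ "$DRY_RUN" = "1" ]; then
        echo wget -r -np -nH --cut-dirs=3 -A "$PATTERN" "$BASE_URL"
    else
        wget -r -np -nH --cut-dirs=3 -A "$PATTERN" "$BASE_URL"
    fi
}

fetch_collections
```

Pointing BASE_URL at a single collection subdirectory first is a cheap way to check that the pattern matches before committing to everything.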

Best wishes

Ben Ensembl Helpdesk