Question

How to download all Pseudomonas aeruginosa Genomes from NCBI Genomes database?

4.6 years ago

Optimist ▴ 190

Hello All,

I want to download all the genomes of Pseudomonas aeruginosa from the NCBI Genome database. As of today (23/10/2020), there are 5,556 genomes for the species Pseudomonas aeruginosa.

Please let me know how I can download all of them, preferably with their strain names.

Thank you

NCBI Genome-Assembly

Updated 2.2 years ago by Ram 45k • written 4.6 years ago by Optimist ▴ 190

GenoMax · Answer 1 · 2020-10-23

4.6 years ago

shenwei356 8.7k

You can use either of these command-line tools:

https://github.com/kblin/ncbi-genome-download
https://github.com/pirovc/genome_updater
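A minimal sketch of the ncbi-genome-download route. It assumes the option names from the tool's README (--genera, which also accepts a "Genus species" name, --formats, --metadata-table, --parallel) and the bacteria group. For strain names, the hypothetical helper strain_table reads the metadata table, assuming it has local_filename and infraspecific_name columns (the latter holding values such as strain=PAO1). It looks the columns up by header name so that it does not depend on column order.

```shell
# Download step (run once; assumes ncbi-genome-download is installed).
# RefSeq is the default section; add --section genbank to query GenBank instead.
#
#   ncbi-genome-download --genera "Pseudomonas aeruginosa" --formats fasta \
#       --metadata-table pa_metadata.tsv --parallel 4 bacteria

# Hypothetical helper: print "local_filename<TAB>strain" for each genome
# listed in an ncbi-genome-download metadata table.
strain_table() {
  awk -F'\t' '
    NR == 1 {
      # Map header names to column numbers so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("local_filename" in col) || !("infraspecific_name" in col)) {
        print "strain_table: expected columns not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      strain = $(col["infraspecific_name"])
      sub(/^strain=/, "", strain)   # values look like "strain=PAO1"
      print $(col["local_filename"]) "\t" strain
    }
  ' "$1"
}
```

After the download finishes, strain_table pa_metadata.tsv > pa_strains.tsv gives one file/strain pair per genome. RefSeq and GenBank hold different sets of assemblies, so the number of files may not match the count shown on the Genome page.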

Updated 4.2 years ago by GenoMax 151k • written 4.6 years ago by shenwei356 8.7k

GenoMax · Answer 2 · 2020-10-23

4.6 years ago

vkkodali_ncbi ★ 3.8k

You can download these data directly from NCBI using the NCBI Datasets tool. See the NCBI Datasets documentation for details.

Updated 4.6 years ago by GenoMax 151k • written 4.6 years ago by vkkodali_ncbi ★ 3.8k

Note: The NCBI Datasets web interface only provides access to eukaryotic genomes. Use the command-line tool to access all genomes, including bacteria.

Reply by GenoMax 151k, 4.6 years ago

NCBI Datasets now provides access to data for viruses and prokaryotes, including Pseudomonas aeruginosa.

While our Genomes page is limited to 1,000 genomes per download, you can use the datasets command-line tool to download all 15,365 Pseudomonas aeruginosa genomes currently available.

Because this is a large dataset (about 30 GB compressed for genome sequences and metadata), I recommend this three-step approach:

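1. Download a dehydrated data package for all Pseudomonas aeruginosa genomes. A dehydrated package contains only the metadata and the locations of the sequence files; the genome sequences themselves are fetched in step 3.

datasets download genome taxon "pseudomonas aeruginosa" --exclude-genomic-cds --exclude-protein --exclude-gff3 --filename aeruginosa.zip --dehydrated

In newer releases of datasets, the --exclude-* options were replaced by --include, so the equivalent command is:

datasets download genome taxon "pseudomonas aeruginosa" --include genome --filename aeruginosa.zip --dehydrated

2. Extract the downloaded package.

unzip aeruginosa.zip -d aeruginosa

3. Rehydrate the extracted package to download the genome sequences.

datasets rehydrate --directory aeruginosa/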
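With a download this large, it is worth checking that rehydration fetched every genome before using the data. This is a minimal sketch that assumes the package layout produced by the commands above: ncbi_dataset/data/assembly_data_report.jsonl holds one JSON record per assembly, and each assembly has one <accession>_<assembly name>_genomic.fna file. check_rehydration is a hypothetical helper name.

```shell
# Hypothetical helper: compare the number of assemblies listed in the package
# metadata with the number of genome FASTA files present after rehydration.
check_rehydration() {
  local data_dir="$1/ncbi_dataset/data"
  local expected found
  # One JSON record per line; grep -c '' also counts a final line without a newline.
  expected=$(( $(grep -c '' "$data_dir/assembly_data_report.jsonl") ))
  # Genome files start with the accession (GCA_/GCF_); this skips cds_from_genomic.fna.
  found=$(( $(find "$data_dir" -name 'GC[AF]_*_genomic.fna' | wc -l) ))
  echo "expected=${expected} found=${found}"
  # Exit status is 0 only if every listed assembly has a genome FASTA file.
  [ "$expected" -eq "$found" ]
}
```

Run it as check_rehydration aeruginosa. For strain names, dataformat (installed alongside datasets) can tabulate the same report. I believe the relevant fields are accession and organism-infraspecific-strain, for example dataformat tsv genome --inputfile aeruginosa/ncbi_dataset/data/assembly_data_report.jsonl --fields accession,organism-infraspecific-strain, but check dataformat tsv genome --help for the field names in your version.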

Reply by EricC_NCBI ▴ 30, 3.5 years ago