Tool: Download all RefSeq/GenBank bacterial genomes from NCBI

7.9 years ago

johnsrc06 ▴ 10

I've been trying to find an EASY way to download all genomes (FASTA, GenBank, GFF, etc.) from NCBI's RefSeq or GenBank. I decided to write my own program in Python to make the process much easier and more flexible for researchers. Let me know if this helps you or if you have any suggestions:

https://github.com/ryjohnson09/bacteria_genome_pull
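For anyone writing a similar downloader, here is a minimal Python 3 sketch of a single-file fetch step. It is not taken from bacteria_genome_pull, and the helper name download() is hypothetical. It skips files that already exist, so an interrupted bulk run can be restarted. It also writes to a temporary .part file first, so a half-finished download is never mistaken for a complete one.

```python
import os
import shutil
import urllib.request


def download(url, dest_dir):
    """Fetch url into dest_dir (hypothetical helper); return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Name the local file after the last component of the URL
    path = os.path.join(dest_dir, url.rstrip("/").rsplit("/", 1)[-1])
    if os.path.exists(path):
        return path  # already downloaded earlier; lets an interrupted run resume
    tmp = path + ".part"
    # Stream the response to disk instead of holding whole genomes in memory
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.replace(tmp, path)  # only expose the file once it is complete
    return path
```

Calling it in a loop over per-assembly file URLs gives a basic bulk download. Parallelism and retry logic are deliberately left out.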

genome sequencing • 3.8k views

updated 18 months ago by Ram 44k • written 7.9 years ago by johnsrc06 ▴ 10

WouterDeCoster: This could be left classified as a Tool, as the OP had done. It is a ready-to-use script that anyone can run as is.

7.9 years ago by GenoMax 148k

Oh yeah, you are right. I guess I'm getting used to adjusting post classifications, but I should've read more carefully here.

7.9 years ago by WouterDeCoster 47k

Perhaps this tool does what you are looking for: https://github.com/kblin/ncbi-genome-download

7.9 years ago by Jon ▴ 360

johnsrc06: I have not tried your script, but you should add some notes about how long it takes to run (you appear to be parsing the actual directories; is that correct?).

7.9 years ago by GenoMax 148k

Thanks for the suggestions (I'm new to Biostars). I've changed the classification to Tool and will run a few tests so I can add some details about run time.

7.9 years ago by johnsrc06 ▴ 10

Just a thought: you may want to consider parsing the genome summary file that lists the contents of RefSeq genomes (it is on the FTP site). It may be faster than parsing the directories.
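A minimal sketch of that approach, assuming the file meant is NCBI's assembly_summary.txt (for RefSeq bacteria, ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt under the current layout). That file is tab-separated, with the column names on the last comment line starting with "#". The function name and the optional assembly-level filter are illustrative assumptions. Columns are looked up by header name rather than by position, in case the column order changes.

```python
def parse_assembly_summary(text, assembly_level=None):
    """Return (accession, organism, ftp_path) tuples from assembly_summary text.

    Hypothetical helper; assumes the column names assembly_accession,
    organism_name, assembly_level and ftp_path appear in the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Keep overwriting: the last comment line holds the column names
            header = line.lstrip("#").strip().split("\t")
            continue
        if not line or header is None:
            continue
        record = dict(zip(header, line.split("\t")))
        if assembly_level and record.get("assembly_level") != assembly_level:
            continue
        if record.get("ftp_path", "na") == "na":
            continue  # some entries are assumed to have no FTP directory ("na")
        rows.append((record["assembly_accession"],
                     record["organism_name"],
                     record["ftp_path"]))
    return rows
```

With the real file, the text could come from urllib.request.urlopen(url).read().decode("utf-8"). A call such as parse_assembly_summary(text, "Complete Genome") would then keep only complete genomes, without walking any directories.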

7.9 years ago by GenoMax 148k

I had a look at your code and it looks decent. I tried your example usage from GitHub and it did the job here. If you rewrote print "something" as print("something"), your code would also be compatible with Python 3.
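To illustrate that suggestion: a print call with a single, already-formatted string in parentheses behaves the same under Python 2 and Python 3. In Python 2, passing several comma-separated arguments without from __future__ import print_function would print a tuple instead. The report() helper below is hypothetical and not part of the script.

```python
def report(n_downloaded):
    # One pre-formatted string in parentheses: valid in both Python 2 and 3.
    # str.format() is available from Python 2.6 onward.
    print("Downloaded {0} genomes".format(n_downloaded))


report(3)  # prints: Downloaded 3 genomes
```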

7.9 years ago by WouterDeCoster 47k

Good suggestion, I'll put it on my to-do list. Thanks!

7.9 years ago by johnsrc06 ▴ 10

Could you easily amend this tool to allow downloads of whole-genome protein sequence files? I noticed that it currently only supports GenBank/FASTA/GFF files.
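A sketch of how a protein option could be added. It assumes the usual naming inside each NCBI assembly directory, where each file is named after the directory plus a suffix (for example _genomic.fna.gz, _genomic.gbff.gz, _genomic.gff.gz and _protein.faa.gz). The SUFFIXES table and file_url() are hypothetical and not part of bacteria_genome_pull. A _protein.faa.gz file should only be expected for annotated assemblies.

```python
import posixpath

# Hypothetical format -> suffix table, based on the assumed naming
# convention "<assembly directory name><suffix>"
SUFFIXES = {
    "fasta": "_genomic.fna.gz",
    "genbank": "_genomic.gbff.gz",
    "gff": "_genomic.gff.gz",
    "protein": "_protein.faa.gz",
}


def file_url(ftp_path, fmt):
    """Build the URL of one file inside an assembly directory (hypothetical)."""
    if fmt not in SUFFIXES:
        raise ValueError("unsupported format: %s" % fmt)
    base = ftp_path.rstrip("/")
    # The directory name (e.g. GCF_..._ASM...) doubles as the file-name prefix
    name = posixpath.basename(base)
    return "%s/%s%s" % (base, name, SUFFIXES[fmt])
```

Supporting another file type then only means adding one entry to the table.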

7.8 years ago by GenoMax 148k