7.9 years ago
johnsrc06
I've been trying to find an EASY way to download all genomes (FASTA, GenBank, GFF, etc.) from NCBI's RefSeq or GenBank. I decided to write my own program in Python to make the process much easier and more flexible for researchers. Let me know if this helps you or if you have any suggestions:
WouterDeCoster : This could be left classified as a tool, as the OP had done. It is a ready-to-use script that anyone is able to use as is.
Oh yeah, you are right. Guess I'm getting used to adjusting post classification but should've read more carefully here.
Perhaps this tool does what you are looking for: https://github.com/kblin/ncbi-genome-download
johnsrc06 : I have not tried your script but you should add some notes about how long it takes to run (since you appear to be parsing the actual directories, is that correct?).
Thanks for the suggestions (I'm new to Biostars)...I've changed it to tool and will run a few tests to add some details about time consumption.
Just a thought. You may want to consider parsing the genome summary file that lists contents of RefSeq genomes (it is at the ftp site). It may be faster than parsing the directories.
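A minimal sketch of the summary-file idea, not the OP's code. It assumes the summary is a tab-separated text file whose last `#` comment line before the data holds the column names, and that it has `organism_name` and `ftp_path` columns. The function name `parse_assembly_summary` is made up for illustration, and the sample rows and example.org paths are invented placeholders, not real assemblies.

```python
def parse_assembly_summary(lines):
    """Yield one dict per assembly from tab-separated summary lines.

    Assumption: comment lines start with '#', and the last comment line
    before the data rows lists the column names.
    """
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Later comment lines overwrite earlier ones, so the line
            # directly above the data ends up as the header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        yield dict(zip(header, line.split("\t")))


# Invented sample data with only three columns; the real file has more.
SAMPLE = (
    "#   See the README for a description of the columns\n"
    "# assembly_accession\torganism_name\tftp_path\n"
    "GCF_000000001.1\tExample organism A\t"
    "ftp://example.org/genomes/all/GCF/000/000/001/GCF_000000001.1_ASM1v1\n"
    "GCF_000000002.1\tExample organism B\t"
    "ftp://example.org/genomes/all/GCF/000/000/002/GCF_000000002.1_ASM2v1\n"
)

rows = list(parse_assembly_summary(SAMPLE.splitlines()))
for row in rows:
    print(row["organism_name"], row["ftp_path"])
```

Reading one summary file and filtering rows locally avoids listing each genome directory over FTP, which is where the speed-up suggested above would come from.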
I had a look at your code and it sure looks decent. I tried your example usage from GitHub and it did the job here. If you would rewrite print "something" to print("something"), your code would also be compatible with Python 3.
Good suggestion...I'll put it on my to-do list. Thanks!
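For reference, a small sketch of the function-call form of print, which is what Python 3 requires. The helper name `log` is hypothetical and not taken from the script under discussion. The keyword arguments shown here only work on Python 2 if the module starts with `from __future__ import print_function`.

```python
import sys


def log(message, stream=sys.stdout):
    """Write one message line to the given stream."""
    # Function-call syntax: valid on Python 3 as written. On Python 2,
    # the file= keyword needs `from __future__ import print_function`
    # as the first statement of the module.
    print(message, file=stream)


log("something")
```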
Can you amend this tool easily to allow for downloads of whole genome protein sequence files? I noticed that currently you only allow genbank/fasta/gff files.
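One way this could be approached, sketched under assumptions rather than taken from the OP's tool: files inside an NCBI assembly directory are named after the directory itself plus a type suffix, so a protein FASTA is one more suffix in a lookup table. The `SUFFIXES` mapping and `build_file_url` helper are hypothetical names, and the example.org path is an invented placeholder.

```python
import posixpath

# Suffix per file type, following the naming pattern used in NCBI
# assembly directories (<directory name> + suffix).
SUFFIXES = {
    "fasta": "_genomic.fna.gz",
    "genbank": "_genomic.gbff.gz",
    "gff": "_genomic.gff.gz",
    "protein": "_protein.faa.gz",
}


def build_file_url(assembly_dir_url, file_type):
    """Return the URL of one file inside an assembly directory."""
    base = assembly_dir_url.rstrip("/")
    # The last path component doubles as the file name prefix.
    name = posixpath.basename(base)
    return base + "/" + name + SUFFIXES[file_type]


url = build_file_url(
    "ftp://example.org/genomes/all/GCF/000/000/001/GCF_000000001.1_ASM1v1/",
    "protein",
)
print(url)
```

Not every assembly ships a protein file, so a real download step would need to handle a missing file gracefully.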