7.7 years ago
rlee03
Hi,
I tried creating a FASTA database using an Entrez query, but I consistently get booted from the NCBI server because the request is so large. Can you recommend another way to download all the bacterial proteins into a FASTA file that should work? n=5.8 million
Thanks!
I'm a Python programmer and prefer to do this via the command line.
You can download many (all?) bacterial genomes here: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria Getting protein sequences from bacterial genomes is most probably trivial for you ...
You could get all RefSeq proteins by downloading the faa protein files here. Alternatively, get the protein files from the folder hierarchy at the link posted by @Protostome: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/name_0f_bacteria/latest_assembly_versions/GCA_*/*faa.gz
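Since you mentioned Python, here is a rough sketch of how the FTP path above could be walked with the standard library. It is not tested against the live server: the function names are made up for illustration, `species_dir` stands for whatever directory name replaces the placeholder in the path, and whether the server lets you list the assembly directories this way is an assumption. The matching and merging steps are plain local file handling.

```python
import fnmatch
import gzip
import os
import posixpath
import shutil
from ftplib import FTP

HOST = "ftp.ncbi.nlm.nih.gov"
BASE = "/genomes/genbank/bacteria"  # directory from the answers above


def select_protein_files(names):
    """Keep only the names matching the *faa.gz pattern."""
    return [n for n in names if fnmatch.fnmatch(n, "*faa.gz")]


def merge_gzipped_fasta(gz_paths, out_path):
    """Decompress each .faa.gz file and append it to one FASTA file."""
    with open(out_path, "wb") as out:
        for path in gz_paths:
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def download_species(species_dir, dest_dir="."):
    """Fetch the *faa.gz files for one species directory (needs network).

    Hypothetical helper: assumes anonymous login works and that NLST
    returns the entries of each assembly directory.
    """
    saved = []
    with FTP(HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(f"{BASE}/{species_dir}/latest_assembly_versions")
        for assembly in ftp.nlst():
            assembly = posixpath.basename(assembly)
            for name in select_protein_files(ftp.nlst(assembly)):
                name = posixpath.basename(name)
                local = os.path.join(dest_dir, name)
                with open(local, "wb") as fh:
                    ftp.retrbinary(f"RETR {assembly}/{name}", fh.write)
                saved.append(local)
    return saved
```

You would call `download_species(...)` once per species directory and then pass the returned paths to `merge_gzipped_fasta(paths, "all_bacteria.faa")`. Downloading one directory at a time also keeps each request small, which should avoid the kind of disconnects you saw with the single large Entrez query.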