From accession numbers to assembly files (*.faa and *.pff)
8.8 years ago

Hello everyone,

I would like to know whether there is an automated way to download all the assemblies in the NCBI FTP server folders for a list of accession numbers.

For example, to go from this:

http://www.ncbi.nlm.nih.gov/nuccore/NC_010473

to this directory on the FTP server and download the files:

ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Escherichia_coli/latest_assembly_versions/GCF_000019425.1_ASM1942v1/

in an automated way? Alternatively, if I download the file from NCBI Reference Sequence (RefSeq), would I get the same thing?

Thanks a lot!

8.8 years ago
5heikki

You can, e.g., grep your accessions from this file to get the base URLs (column 20), and then:

# quote the whole URL so that wget, not the local shell, expands * on the FTP server
for next in $(awk -F '\t' '{print $20}' grepdFile); do wget "$next/*.faa.gz"; done
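The loop above relies on wget expanding `*` on the server, and wget's globbing works only for ftp:// URLs. A minimal sketch of an alternative that builds the exact file URL instead, assuming the linked file is NCBI's `assembly_summary.txt` (column 20 is `ftp_path`) and that each assembly directory holds a protein file named `<directory name>_protein.faa.gz`. The function names `protein_faa_url` and `urls_from_summary` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build exact protein FASTA URLs instead of relying on wget's FTP globbing.
# Assumption: each assembly directory (column 20, ftp_path) contains a file named
# <directory name>_protein.faa.gz, e.g.
# GCF_000019425.1_ASM1942v1/GCF_000019425.1_ASM1942v1_protein.faa.gz

# Turn one ftp_path into the URL of its protein FASTA file.
protein_faa_url() {
    local dir="${1%/}"       # drop a trailing slash, if present
    local base="${dir##*/}"  # last path component, e.g. GCF_000019425.1_ASM1942v1
    printf '%s/%s_protein.faa.gz\n' "$dir" "$base"
}

# Print one protein FASTA URL per row of a grep'd assembly summary file.
urls_from_summary() {
    awk -F '\t' '{print $20}' "$1" | while read -r dir; do
        # Skip rows without a path (NCBI writes "na" for missing values).
        if [ -z "$dir" ] || [ "$dir" = "na" ]; then
            continue
        fi
        protein_faa_url "$dir"
    done
}
```

Usage, with `grepdFile` as in the answer: `urls_from_summary grepdFile | wget -i -`. Because every URL names a concrete file, this also works if the paths in column 20 use https:// rather than ftp://.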

Does this mean I have to download the whole database to my computer?


5heikki is saying that if you have the proper assembly accession IDs (which can be retrieved in various ways), you can grep for them in the list he/she linked, which gives you the full FTP link to the files of each assembly.

It's just a matter of grabbing all the assembly accessions (for example, through E-utilities), looping through them, grepping for the proper file locations, and finally downloading the files.
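A minimal sketch of the grep step, assuming a plain-text file with one assembly accession per line (e.g. GCF_000019425.1) and that the linked summary is NCBI's `assembly_summary.txt`, whose first column is the assembly accession. It starts from assembly accessions, so a nucleotide accession such as NC_010473 must first be mapped to its assembly (e.g. via E-utilities). Matching column 1 exactly avoids the loose hits of a plain `grep GCF_000019425.1`, where `.` is a regex wildcard and longer version strings also match. The function name `select_assemblies` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the rows of an assembly summary whose first column (assembly_accession)
# exactly matches one of the accessions listed in a file (one per line).
# Hypothetical helper; assumes tab-separated input as in assembly_summary.txt.
select_assemblies() {
    local accessions="$1" summary="$2"
    # First file: remember each wanted accession.
    # Second file: print rows whose column 1 is a wanted accession.
    # Header lines start with '#', so they never match an accession.
    awk -F '\t' 'NR == FNR { want[$1]; next } $1 in want' "$accessions" "$summary"
}
```

Usage: `select_assemblies my_accessions.txt assembly_summary.txt > grepdFile`, then run the download loop from the answer on `grepdFile`. Only the summary file is downloaded in full, not the whole database.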
