Uncommon wget behavior with NCBI genomes
Hey guys, I'm trying to download all the Eimeria genomes available on NCBI. As usual, I ran this command:
wget -r --accept-regex ".*_genomic.fna.gz" "ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria*" -P .
However, no files were downloaded. I also tried -A instead of --accept-regex, with the same result.
I'm using GNU Wget 1.17.1, and I have downloaded several genomes with this same command before.
Here is an example of an FTP directory containing the file I want:
https://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria_necatrix/latest_assembly_versions/GCF_000499385.1_ENH001/
The file in this case is: GCF_000499385.1_ENH001_genomic.fna.gz
Can anyone help?
ncbi
wget
genomes
You can use Entrez Direct for this as shown below:
esearch -db assembly -query 'Eimeria necatrix[organism]' \
| esummary \
| xtract -pattern DocumentSummary -element FtpPath_RefSeq \
| while read -r url ; do
    path=$(echo "$url" | perl -pe 's/(GC[FA]_\d+.*)/\1\/\1_genomic.fna.gz/g') ;
    wget -q --show-progress "$path" -P genome_data ;
done
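To make the URL rewriting step in the loop easier to follow, here is a minimal sketch that does the same transformation with plain shell parameter expansion instead of perl. The function name genomic_fasta_url is hypothetical (it is not part of Entrez Direct or wget), and the sketch assumes the assembly directory name is the last path component of the URL, as in the example directory from the question.

```shell
# Hypothetical helper: given an assembly directory URL, print the URL of
# its *_genomic.fna.gz file. Assumes the last path component is the
# assembly name (e.g. GCF_000499385.1_ENH001).
genomic_fasta_url() {
    gfu_dir="${1%/}"           # drop a trailing slash, if present
    gfu_name="${gfu_dir##*/}"  # keep only the last path component
    printf '%s/%s_genomic.fna.gz\n' "$gfu_dir" "$gfu_name"
}

# Example with the directory quoted in the question
genomic_fasta_url "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria_necatrix/latest_assembly_versions/GCF_000499385.1_ENH001/"
```

Inside the while loop, path=$(genomic_fasta_url "$url") could stand in for the perl one-liner, provided the FtpPath_RefSeq values follow the same layout.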
Alternatively, you can go to the NCBI Assembly portal, search for Eimeria necatrix[organism]
and use the blue 'Download Assemblies' button to download the files of your choice.