Hi eutils/edirect specialists,
I wonder whether NCBI's edirect utils or eutils API offer a more elegant, ideally one-step solution to download genomes/assemblies based on a query, say all reference/representative genomes for Lactobacillus.
My current working solution is to use a pipe via esearch | esummary | xtract to build tab-delimited output containing the FTP path, the accession and the assembly name (and the species), which I then pass through a perl command to build curl commands.
esearch \
-db assembly \
-query "Lactobacillus[orgn] AND complete+genome[assembly+level] + latest[filter]" \
| esummary \
| xtract -pattern DocumentSummary \
-element FtpPath_RefSeq \
-element AssemblyAccession \
-element AssemblyName \
-element SpeciesName \
| perl -nwe 'chomp; @a = split(/\t/,$_); $a[3] =~ s/ /_/g; $g = $a[1] . "_" . $a[2] . "_genomic.fna.gz"; print "curl -L -o $a[3]_$g $a[0]/$g\n";' \
>lactobacillus_ftp.curl_commands.sh
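For anyone not used to perl, the last stage only rearranges the four tab-separated columns into a curl command line. Here is a rough sketch of the same step as a plain POSIX shell loop. The function name build_curl_commands is made up for illustration, and I am assuming every row carries all four fields in the order the xtract call above emits them; rows with fewer fields are skipped.

```shell
# Sketch only: build curl commands from the xtract TSV without perl.
# Assumed input columns (tab-separated):
#   1 = FtpPath_RefSeq, 2 = AssemblyAccession, 3 = AssemblyName, 4 = SpeciesName
# build_curl_commands is a hypothetical helper name, not part of edirect.
build_curl_commands() {
  tab=$(printf '\t')
  while IFS="$tab" read -r ftp_path accession assembly_name species_name; do
    # Skip rows that do not carry all four fields.
    [ -n "$species_name" ] || continue
    # Replace spaces in the species name so it is safe in a file name.
    species=$(printf '%s' "$species_name" | tr ' ' '_')
    # Same file name the perl one-liner builds: <accession>_<name>_genomic.fna.gz
    file="${accession}_${assembly_name}_genomic.fna.gz"
    printf 'curl -L -o %s_%s %s/%s\n' "$species" "$file" "$ftp_path" "$file"
  done
}
```

It would take the place of the perl line, i.e. the xtract output is piped into build_curl_commands and redirected to lactobacillus_ftp.curl_commands.sh as before. It is longer than the one-liner, but each step has a name, which may be easier to hand over.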
My goal is to avoid the obscure perl code; it is just difficult to hand over to beginners. Am I missing something, maybe via elink?
Thanks in advance
Downloading genome data has been covered in past Biostars threads (for future reference):
how to download all the complete genomes for mycobacteria from NCBI?
How to download COMPLETE bacterial genomes from NCBI based on list of names?
download refseq of thousand of assembly file from NCBI
Retrieve genome in fasta format from ncbi