I'm trying to figure out a programmatic way of downloading genomes and their corresponding annotation files for a large number of species (hundreds).
I can't seem to find any reference for this in the Ensembl REST API docs. I suppose I could hack together something in Bash to wget
from the Ensembl FTP server but I'm wondering if there's a straightforward way that I'm missing.
Have you looked at NCBI's new Datasets resource? There is a command-line tool available as well.
This is interesting, and I had not heard of it. Thanks! It doesn't solve my issue, since I'd like to use Ensembl, but it's a good resource to be aware of.
I don't know whether the Ensembl REST API is designed for downloading genome-wide data, though I could be wrong. I will ping @Emily from Ensembl.
What information do you have about each species (name, accession number, assembly ID, taxonomy ID, ...)? Does it have to be Ensembl, or would NCBI work too (NCBI FTP, efetch, esearch, ...)?
This link may help:
Cannot get efetch to download genome - what is wrong?
Currently I'm hoping to use binomial names, although I could probably use any identifier that would work programmatically.
Ensembl is the preferred source. I've used NCBI's utilities in the past, and they are much more robust, but NCBI's annotation pipeline is more variable (in my experience), hence the desire for Ensembl's standardized annotation.
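For what it's worth, here is a minimal Python sketch of the "script against the FTP site" approach starting from binomial names. It is not an official Ensembl method. It assumes the main Ensembl site's layout (`current_fasta/<species>/dna/` and `current_gtf/<species>/`, or `release-N/fasta/...` and `release-N/gtf/...`) and that the species directory is the lowercased binomial joined with underscores. Neither assumption holds everywhere: some directories carry strain suffixes, and Ensembl Genomes divisions (plants, fungi, and so on) live on a different host. The function names are hypothetical helpers. File names embed the assembly name and release number, so the sketch lists each directory and filters by suffix instead of guessing file names.

```python
import re
from urllib.request import urlopen

# Assumption: main (vertebrate) Ensembl site served over HTTPS.
ENSEMBL_FTP = "https://ftp.ensembl.org/pub"


def species_dir(binomial):
    """Turn 'Homo sapiens' into 'homo_sapiens' (assumed directory name)."""
    parts = binomial.strip().lower().split()
    if len(parts) < 2:
        raise ValueError(f"expected a binomial name, got {binomial!r}")
    return "_".join(parts)


def ensembl_dirs(binomial, release=None):
    """Return the assumed FASTA and GTF directory URLs for one species."""
    name = species_dir(binomial)
    if release is None:
        fasta = f"{ENSEMBL_FTP}/current_fasta/{name}/dna/"
        gtf = f"{ENSEMBL_FTP}/current_gtf/{name}/"
    else:
        fasta = f"{ENSEMBL_FTP}/release-{release}/fasta/{name}/dna/"
        gtf = f"{ENSEMBL_FTP}/release-{release}/gtf/{name}/"
    return {"fasta": fasta, "gtf": gtf}


def pick_files(listing_html, suffix):
    """Extract file names ending in `suffix` from an HTML directory listing."""
    hrefs = re.findall(r'href="([^"]+)"', listing_html)
    return sorted({h for h in hrefs if h.endswith(suffix)})


def find_downloads(binomial, release=None):
    """List candidate genome and annotation URLs (needs network access)."""
    dirs = ensembl_dirs(binomial, release)
    wanted = {"fasta": ".dna.toplevel.fa.gz", "gtf": ".gtf.gz"}
    found = {}
    for kind, suffix in wanted.items():
        with urlopen(dirs[kind]) as resp:
            html = resp.read().decode("utf-8", "replace")
        # The GTF directory may hold several variants; all are returned.
        found[kind] = [dirs[kind] + f for f in pick_files(html, suffix)]
    return found
```

Looping `find_downloads` over a list of names and passing the resulting URLs to `wget` or `urllib.request.urlretrieve` would cover the bulk download. A name whose directory does not exist raises an HTTP error, which is a useful signal that it needs manual mapping.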