Question

retrieve matched genome/annotation pairs using Ensembl API

0


4.1 years ago

glarue ▴ 70

I'm trying to figure out a programmatic way of downloading genomes and their corresponding annotation files for a large number of species (100s).

I can't seem to find any reference for this in the Ensembl REST API docs. I suppose I could hack together something in Bash to wget from the Ensembl FTP server but I'm wondering if there's a straightforward way that I'm missing.

ensembl REST genome • 1.1k views

ADD COMMENT • link updated 4.1 years ago by Emily 24k • written 4.1 years ago by glarue ▴ 70

1


Have you looked at NCBI's new DATASETS? There is a command line tool available as well.

ADD REPLY • link 4.1 years ago by GenoMax 147k

0


This is interesting, and I had not heard of it—thanks! It still doesn't solve the issue, since I'd like to use Ensembl, but a good resource to be aware of.

ADD REPLY • link 4.1 years ago by glarue ▴ 70

0


I don't know whether the Ensembl API is designed for downloading genome-wide data, though I could be wrong. I will ping @Emily from Ensembl.

ADD REPLY • link 4.1 years ago by GenoMax 147k

0


What information do you have about each species (name, accession number, assembly ID, taxonomy ID, ...)? Does it have to be Ensembl, or would NCBI work too (NCBI FTP, efetch, esearch, ...)?

This link may help

Cannot get efetch to download genome - what is wrong?

ADD REPLY • link 4.1 years ago by Fatima ▴ 1000

0


Currently I'm hoping to use binomial names, although I could probably use any identifier that would work programmatically.

Ensembl is the preferred source—I've used NCBI's utilities in the past which are much more robust, but the annotation pipeline at NCBI is more variable (in my experience), hence the desire for Ensembl's annotation standardization.

ADD REPLY • link 4.1 years ago by glarue ▴ 70

score 2 · Accepted Answer · 2020-10-20

The Ensembl REST API is not designed for bulk downloads like that. It should be relatively easy to script a wget download using the standard paths on the FTP site. You may find it useful to call the info/genomes/division endpoint of the REST API to get the genome names and other details you need to build the FTP site locations, though.
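
As a rough illustration of that approach, here is a minimal Python sketch. It makes several assumptions that are not stated in this thread and should be checked against the FTP site before use: that each record returned by info/genomes/division has `name` and `assembly_name` fields, that the vertebrate FTP layout is `release-N/fasta/<species>/dna/` for genomes and `release-N/gtf/<species>/` for annotation, and that the assembly string in the file name matches `assembly_name` (it may not for every species). The function names `list_genomes` and `ftp_urls` are made up for this example.

```python
import json
import urllib.request

REST_BASE = "https://rest.ensembl.org"
FTP_BASE = "https://ftp.ensembl.org/pub"  # assumed base for the vertebrate division


def list_genomes(division="EnsemblVertebrates"):
    """Fetch the genome records for one division from the REST API."""
    url = f"{REST_BASE}/info/genomes/division/{division}"
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def ftp_urls(name, assembly, release):
    """Build the assumed genome FASTA and GTF URLs for one species.

    name     -- Ensembl production name, e.g. "homo_sapiens"
    assembly -- assembly string as it appears in the file names
    release  -- Ensembl release number
    """
    stem = f"{name.capitalize()}.{assembly}"
    prefix = f"{FTP_BASE}/release-{release}"
    return {
        "genome": f"{prefix}/fasta/{name}/dna/{stem}.dna.toplevel.fa.gz",
        "annotation": f"{prefix}/gtf/{name}/{stem}.{release}.gtf.gz",
    }
```

One way to use it would be to loop over `list_genomes()`, call `ftp_urls(g["name"], g["assembly_name"], release)` for each record `g`, and write the resulting URLs to a file for `wget -i`. Taking both URLs from the same release directory is what keeps each genome matched to its annotation.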