Question

Can you use Python to download flanking sequences of genes from Ensembl's older releases/assemblies?

7.2 years ago

ericbrenner • 0

I have some Ensembl IDs from an older assembly of the chicken genome, and I'm trying to get the promoter sequences for them. Is there a way to pull the sequences from Ensembl in Python, with BioMart or something similar?

python ensembl genome sequence • 2.7k views

updated 7.2 years ago by jared.andrews07 • written 7.2 years ago by ericbrenner

7.2 years ago

jared.andrews07 ★ 18k

You should probably take a look at pyensembl. It will let you look up each of your gene IDs and get its chromosome, start, end and strand. Then you can download the chicken genome as a FASTA file and use pyfaidx to pull the sequence around the start site of each gene, for whatever you want to define as the promoter (2 kb upstream of the TSS or whatever). Make sure the annotation release and the FASTA come from the same assembly as your IDs, and keep the strand in mind: for genes on the minus strand the TSS is the gene's end coordinate, "upstream" means higher coordinates, and the sequence has to be reverse-complemented. It sounds complicated, but it's actually pretty straightforward to implement, and likely quicker than trying to query Ensembl's API directly.
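A minimal sketch of the pyfaidx half of this, with the strand handling made explicit. It assumes 1-based inclusive gene coordinates (what pyensembl's Gene objects report, since they come from the GTF), a strand of "+" or "-", and FASTA contig names that match the annotation. promoter_interval and promoter_sequence are hypothetical helper names, not part of either library, and whether pyensembl has chicken annotation for your particular release is an assumption you'd need to check.

```python
# Sketch: strand-aware promoter extraction for the pyensembl + pyfaidx route.
# Assumptions (not from the original answer): gene coordinates are 1-based and
# inclusive, strand is "+" or "-", and FASTA contig names match the annotation.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement DNA; ambiguity codes other than N are left uncomplemented."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_interval(gene_start, gene_end, strand, upstream=2000, downstream=0):
    """Return the 1-based, inclusive (start, end) promoter window for a gene.

    On "+" the TSS is gene_start; on "-" it is gene_end and "upstream" means
    higher coordinates. `downstream` counts bases from the TSS into the gene.
    The start is clamped at 1; the contig end is not checked here.
    """
    if strand == "+":
        tss = gene_start
        start, end = tss - upstream, tss + downstream - 1
    elif strand == "-":
        tss = gene_end
        start, end = tss - downstream + 1, tss + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(1, start), end


def promoter_sequence(genome, contig, gene_start, gene_end, strand,
                      upstream=2000, downstream=0):
    """Fetch the promoter, reading 5' to 3' on the gene's own strand.

    `genome` is expected to be a pyfaidx.Fasta (or anything indexed the same way).
    """
    start, end = promoter_interval(gene_start, gene_end, strand, upstream, downstream)
    # pyfaidx record slices are 0-based and half-open, so [start, end] -> [start - 1:end]
    seq = genome[contig][start - 1:end].seq
    return reverse_complement(seq) if strand == "-" else seq


# Usage (needs `pip install pyensembl pyfaidx`; RELEASE, the FASTA name and
# GENE_ID are placeholders, and chicken support for that release is assumed):
#   from pyensembl import EnsemblRelease
#   from pyfaidx import Fasta
#   annotation = EnsemblRelease(release=RELEASE, species="gallus_gallus")
#   annotation.download(); annotation.index()
#   genome = Fasta("chicken_genome.fa")
#   gene = annotation.gene_by_id(GENE_ID)
#   promoter_sequence(genome, gene.contig, gene.start, gene.end, gene.strand)
```

Keeping the coordinate arithmetic in promoter_interval, separate from file access, makes it easy to check a few genes by hand before running it over thousands of IDs.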

score 2 · Accepted Answer · 2017-09-20

7.2 years ago

Jean-Karim Heriche 27k

You could use Ensembl's BioMart if the version you need is still available online. Otherwise, you'll need to download the data from the FTP site, and if the "something" covers Perl, I suggest using Ensembl's Perl API.
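If an archive site for your release still has BioMart, a single query can cover many genes, because the gene ID filter accepts a comma-separated list. The sketch below builds the query XML for the martservice endpoint and POSTs it with the standard library. The dataset name ggallus_gene_ensembl, the upstream_flank filter and the gene_flank attribute are assumptions based on BioMart's usual Ensembl naming, and the host is a placeholder; compare them with the query shown by the archive BioMart's own "XML" export button before relying on this. build_flank_query, fetch_flanks and parse_fasta are hypothetical names.

```python
# Sketch: one BioMart query for the upstream flanks of many genes.
# Assumed names (check them against the archive's own BioMart "XML" export):
# dataset "ggallus_gene_ensembl", filter "upstream_flank", attribute "gene_flank".
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr


def build_flank_query(gene_ids, upstream=2000, dataset="ggallus_gene_ensembl"):
    """Return BioMart query XML asking for `upstream` bp 5' of each listed gene."""
    ids = ",".join(gene_ids)  # BioMart filters take comma-separated values
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="0">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="ensembl_gene_id" value={quoteattr(ids)}/>'
        f'<Filter name="upstream_flank" value="{int(upstream)}"/>'
        '<Attribute name="ensembl_gene_id"/>'
        '<Attribute name="gene_flank"/>'
        "</Dataset>"
        "</Query>"
    )


def fetch_flanks(gene_ids, host, upstream=2000):
    """POST the query to https://<host>/biomart/martservice and return FASTA text.

    `host` is a placeholder for the Ensembl archive site matching your assembly.
    """
    url = f"https://{host}/biomart/martservice"
    data = urllib.parse.urlencode({"query": build_flank_query(gene_ids, upstream)}).encode()
    with urllib.request.urlopen(url, data=data) as resp:  # data= makes this a POST
        return resp.read().decode()


def parse_fasta(text):
    """Split FASTA text into {header: sequence}, header being the text after '>'."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = []  # a repeated header replaces the earlier record
        elif line and name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}
```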

I just got it sort of working with Ensembl's REST API, but I have a couple of thousand genes, and it accepts at most 50 IDs per POST request :(
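One way around the 50-ID limit is to send the IDs in batches of 50 and merge the results. A minimal standard-library sketch, assuming the POST /sequence/id endpoint with type=genomic and expand_5prime, which returns each gene's genomic sequence on its own strand with the requested number of upstream bases prepended; chunked and fetch_promoters are hypothetical names. As far as I know, rest.ensembl.org serves the current release, so IDs that exist only in an older assembly may not resolve this way.

```python
# Sketch: stay under the REST POST limit by sending gene IDs in batches of 50.
# Assumes POST /sequence/id with type=genomic and expand_5prime (see lead-in).
import json
import time
import urllib.request

SERVER = "https://rest.ensembl.org"
MAX_POST_IDS = 50  # per-request limit mentioned in the thread


def chunked(items, size=MAX_POST_IDS):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_promoters(gene_ids, upstream=2000, pause=0.1):
    """Return {id: upstream_sequence}, one POST per batch of IDs (network call)."""
    url = f"{SERVER}/sequence/id?type=genomic&expand_5prime={int(upstream)}"
    promoters = {}
    for batch in chunked(gene_ids):
        req = urllib.request.Request(
            url,
            data=json.dumps({"ids": batch}).encode(),
            headers={"Content-Type": "application/json", "Accept": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            for record in json.load(resp):
                # The flank is prepended, so the first `upstream` bases are the promoter
                # (fewer if the gene sits near the start of its contig).
                promoters[record["id"]] = record["seq"][:upstream]
        time.sleep(pause)  # short pause between batches to go easy on the shared server
    return promoters
```

The gene body still comes back in every response, so for long genes this transfers much more sequence than the promoter itself; the local FASTA route avoids that.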

7.2 years ago by ericbrenner

Which is why I almost never recommend a REST API. Use the Perl API. If you're going to work with Ensembl a lot, the time invested in learning it is well spent.

7.2 years ago by Jean-Karim Heriche