I have some Ensembl IDs from an older assembly of the chicken genome, and I'm trying to get the promoter sequences for them. Is there a way to pull the sequences from Ensembl in Python with BioMart or something?
You could use Ensembl's BioMart if the version you need is still available online. Otherwise, you'll need to download it from the FTP site. And if the "something" covers Perl, I suggest using Ensembl's Perl API.
You should probably take a look at pyensembl. This will allow you to pull all the genes for your IDs (along with their chromosome/start sites/strand). Then you can download the Chicken genome as a FASTA file and use pyfaidx to pull the sequence around the start site of each gene from that FASTA file for whatever you want to define as the promoter (2kb upstream of TSS or whatever). Sounds complicated, but it's actually pretty straightforward to implement, and likely quicker than trying to query Ensembl's API directly.
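To make the coordinate arithmetic concrete, here's a rough sketch of the "sequence around the start site" step. It assumes you already have each gene's chromosome, 1-based start/end and strand (e.g. from pyensembl), and that `genome` is anything indexable by chromosome name and sliceable with 0-based positions. A plain dict of strings works, and a `pyfaidx.Fasta` object should behave the same way, but check that against its docs. The function names and the 2 kb default are just illustrative choices, not anything from either library.

```python
# Complement table for reverse-complementing minus-strand promoters.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_window(start, end, strand, upstream=2000):
    """Return a 0-based, half-open (lo, hi) window upstream of the TSS.

    start/end are 1-based inclusive gene coordinates. On the '+' strand
    the TSS is at `start`; on the '-' strand it is at `end`, and
    "upstream" means higher coordinates.
    """
    if strand == "+":
        hi = start - 1          # 0-based index of the TSS (excluded)
        lo = hi - upstream
    else:
        lo = end                # first base past the TSS on the forward strand
        hi = lo + upstream
    return max(lo, 0), hi       # don't run off the start of the chromosome


def promoter_sequence(genome, chrom, start, end, strand, upstream=2000):
    """Extract the promoter, oriented 5'->3' relative to the gene."""
    lo, hi = promoter_window(start, end, strand, upstream)
    seq = str(genome[chrom][lo:hi])
    return reverse_complement(seq) if strand == "-" else seq


# Toy example with a made-up 10 bp "chromosome".
toy_genome = {"1": "AACCGGTTAC"}
plus_promoter = promoter_sequence(toy_genome, "1", 5, 8, "+", upstream=3)
minus_promoter = promoter_sequence(toy_genome, "1", 2, 6, "-", upstream=3)
```

For real data you'd swap `toy_genome` for the FASTA of the same assembly your IDs come from; coordinates from a different assembly will silently give you the wrong sequence.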
I just got it sort of working with Ensembl's REST API, but I have a couple thousand genes, and it has a max size of 50 per POST request :(
Which is why I almost never recommend a REST API. Use the Perl API. If you're going to work with Ensembl a lot, the time invested in learning it is well spent.
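If you do stick with REST, the 50-ID cap mentioned above can be worked around by splitting the ID list into batches and posting them one at a time. A minimal sketch, assuming the `sequence/id` POST endpoint on `https://rest.ensembl.org`, a JSON body of the form `{"ids": [...]}` and a JSON list in the response; the function names are made up, and whether that server still knows IDs from your older assembly is something you'd have to check.

```python
import json
import urllib.request


def chunked(items, size=50):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_sequences(ids, server="https://rest.ensembl.org", batch_size=50):
    """POST the IDs in batches and collect the decoded JSON records.

    The endpoint path, query string and payload shape are assumptions
    taken from the REST docs as I remember them; verify before relying
    on this.
    """
    results = []
    for batch in chunked(ids, batch_size):
        request = urllib.request.Request(
            f"{server}/sequence/id?type=genomic",
            data=json.dumps({"ids": batch}).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Accept": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            results.extend(json.load(response))
    return results


# Batching alone, no network access needed: 120 fake IDs -> 50, 50, 20.
fake_ids = [f"GENE{i:05d}" for i in range(120)]
batch_sizes = [len(batch) for batch in chunked(fake_ids)]
```

A couple thousand genes is only a few dozen requests this way, though you'd probably want a short pause between batches to stay under the rate limit.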