Hello!
I have a list of gene IDs obtained from multiple sources via the OMA API. My goal is to download the gene sequences along with their genomic context. To do this, I need the accession numbers and coordinates of the assemblies in which the genes are annotated. I have managed this for most of the genes using Entrez Direct. However, for some reason certain Ensembl genes cannot be retrieved this way. As a workaround, I plan to catch these cases and run a second search using the Ensembl REST API.
For example, this is the case for the gene ENSAPLG00000012763. The command
$ esearch -db gene -query ENSAPLG00000012763
produces no results.
If you look it up in Ensembl, it shows the gene location:
Primary_assembly 21: 7,324,855-7,330,947 reverse strand. CAU_duck1.0:CM008557.1
My goal is to get this chromosome accession (CM008557.1) using the Ensembl REST API with the lookup endpoint:
https://rest.ensembl.org/lookup/id/ENSAPLG00000012763?content-type=application/json;expand=1
But this produces the following output, without the accession number:
{"end":7330947,"db_type":"core","logic_name":"ensembl","id":"ENSAPLG00000012763","assembly_name":"CAU_duck1.0","source":"ensembl","object_type":"Gene","seq_region_name":"21","canonical_transcript":"ENSAPLT00000013288.2","version":2,"description":"AAR2 splicing factor [Source:HGNC Symbol;Acc:HGNC:15886]","start":7324855,"biotype":"protein_coding","strand":-1,"species":"anas_platyrhynchos_platyrhynchos","display_name":"AAR2"}
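For reference, here is a minimal sketch of how the usable fields can be pulled out of that response in Python. The function names (`lookup_url`, `fetch_lookup`, `gene_location`) are hypothetical helpers of mine, not part of any Ensembl client library, and the sample record is a trimmed copy of the output above. It only recovers the assembly name, region name and coordinates; it does not solve the missing-accession problem.

```python
import json
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(gene_id):
    """Build the lookup/id URL for a stable gene ID (same endpoint as above)."""
    return f"{ENSEMBL_REST}/lookup/id/{gene_id}?content-type=application/json"


def fetch_lookup(gene_id):
    """Fetch the lookup record over the network (not exercised in this sketch)."""
    req = Request(lookup_url(gene_id), headers={"Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def gene_location(record):
    """Keep only the fields needed to describe where the gene sits."""
    return {
        "species": record["species"],
        "assembly": record["assembly_name"],
        "region": record["seq_region_name"],
        "start": record["start"],
        "end": record["end"],
        "strand": record["strand"],
    }


# Trimmed copy of the response shown above, so the parsing runs offline.
SAMPLE = json.loads("""
{"end": 7330947, "id": "ENSAPLG00000012763", "assembly_name": "CAU_duck1.0",
 "seq_region_name": "21", "start": 7324855, "strand": -1,
 "species": "anas_platyrhynchos_platyrhynchos"}
""")

print(gene_location(SAMPLE))
```

With a live connection, `gene_location(fetch_lookup("ENSAPLG00000012763"))` should give the same dictionary, assuming the service still returns the record shown above.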
Is there any way to get the chromosome accession using this API? Can you suggest a better way to achieve this?
Thanks!
It's EnsEMBL, not ENSAMBL or Ensemble or Ensamble. I've fixed it for you this time.