Downloading the nucleotide and protein sequences from ENSEMBL IDs
4.7 years ago
botloggy ▴ 10

I have several ENSEMBL IDs for which I cannot find the corresponding NCBI Gene IDs. Is there any way to find the NCBI Gene IDs? I want to extract the nucleotide and amino acid sequences for those IDs.

For example, I searched for the following IDs on the ENSEMBL website:

ENSCCRG00010004296
ENSCSEG00000016380
ENSCCRG00010032853
ENSEBUG00000005910
ENSGAFG00000018062

For all of the above ENSEMBL gene IDs, I got the same output, shown below:

Source:NCBI gene;Acc:334648

Similarly, I have some other IDs, which I entered on the ENSEMBL website:

ENSETEG00000011291
ENSEASG00005014496
ENSEEUG00000002917

For all of the above ENSEMBL gene IDs, I got the same output, shown below:

HGNC:18416

I am not sure how to extract the nucleotide and protein sequences for these ENSEMBL IDs.

gene alignment genome sequence • 1.3k views
4.7 years ago
GenoMax 147k
ENSETEG00000011291 - Echinops_telfairi
ENSEASG00005014496 - Donkey
ENSEEUG00000002917 - Hedgehog

Those are orthologs of the human FICD gene, which is what HGNC:18416 refers to. The list at the top of your post appears to be a similar case: orthologs of one gene from different species.
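
To make the species assignment above less manual, here is a minimal Python sketch that splits an Ensembl stable ID into its species prefix, feature type, and number. It assumes the usual stable ID layout (species prefix, feature letters, 11 digits). The `KNOWN_PREFIXES` table and the function names are hypothetical and hold only the three species named above; the full table lives on the Ensembl species prefixes page.

```python
import re

# Only the prefix-to-species pairs mentioned in this thread; this is not
# the complete Ensembl prefix table.
KNOWN_PREFIXES = {
    "ENSETE": "Echinops_telfairi",
    "ENSEAS": "Donkey",
    "ENSEEU": "Hedgehog",
}

# Assumed layout: "ENS" + species letters, a feature code, then 11 digits.
# Feature codes: G gene, T transcript, P protein, E exon,
# R regulatory feature, FM protein family, GT gene tree.
STABLE_ID = re.compile(r"^(ENS[A-Z]*?)(G|T|P|E|R|FM|GT)(\d{11})$")


def split_stable_id(stable_id):
    """Return (species_prefix, feature_code, number) for a stable ID."""
    match = STABLE_ID.match(stable_id.strip())
    if match is None:
        raise ValueError(f"not a recognised Ensembl stable ID: {stable_id!r}")
    return match.groups()


def species_for(stable_id):
    """Look up the species for an ID, or None if the prefix is not listed."""
    prefix, _feature, _number = split_stable_id(stable_id)
    return KNOWN_PREFIXES.get(prefix)
```

Calling `species_for` on an ID whose prefix is not in the table returns `None`, which is a cue to look the prefix up on the Ensembl page rather than guess.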

Here is the complete list from NCBI HomoloGene. Click the genes you need, or select the entire set, and download the protein sequences.

If you want to get them from Ensembl then you will need to try BioMart.


Could you please tell me if there is any way I can find the sequence information for these? I tried BioMart, but it did not work.


Thinking about this some more, there is no easy way to do this with BioMart, since BioMart lets you select only one species at a time and these identifiers all come from different species. You could try the Ensembl REST API:

https://rest.ensembl.org/sequence/id/ENSEASG00005014496?content-type=text/cds
https://rest.ensembl.org/sequence/id/ENSEBUG00000005910?content-type=text/cds

I think you are best off getting the sequences from NCBI's HomoloGene page once you have decided, from Ensembl, which species you are interested in. Here is the complete list of Ensembl species prefixes.
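
For anyone who wants to script the REST route for a mixed-species list, the following is a rough Python sketch, not a tested pipeline. It assumes the `sequence/id` endpoint accepts the `type` (`genomic`, `cdna`, `cds`, `protein`), `multiple_sequences`, and `content-type` query parameters as described in the Ensembl REST documentation; the function names are made up for illustration. Because the lookup is by stable ID, no species has to be chosen up front, which sidesteps the BioMart limitation.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

SERVER = "https://rest.ensembl.org"


def sequence_url(stable_id, seq_type="cdna"):
    """Build a sequence/id URL that asks for FASTA output.

    multiple_sequences=1 is set because a gene ID can map to several
    transcripts or proteins when seq_type is cdna, cds, or protein.
    """
    params = {
        "type": seq_type,
        "multiple_sequences": 1,
        "content-type": "text/x-fasta",
    }
    # strip() keeps stray whitespace (a trailing %20) out of the path.
    return f"{SERVER}/sequence/id/{quote(stable_id.strip())}?{urlencode(params)}"


def fetch_fasta(stable_id, seq_type="cdna"):
    """Download the FASTA text for one ID (needs network access)."""
    with urlopen(sequence_url(stable_id, seq_type), timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_many(stable_ids, seq_type="cdna"):
    """Fetch several IDs one by one and return {id: fasta_text}."""
    return {sid: fetch_fasta(sid, seq_type) for sid in stable_ids}
```

For example, `fetch_fasta("ENSEBUG00000005910", "protein")` would request the protein sequences for that gene and `fetch_fasta("ENSEBUG00000005910", "cdna")` the transcript sequences. Error handling and rate limiting are left out here and would be needed for long ID lists.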


Thank you for the suggestions on NCBI HomoloGene and the Ensembl species prefixes page. I found some of them on OMAbrowser.org, for example ENSGMOG00000017853, but I am not sure whether that information is correct.


I tried the ENSEMBL API for the ID ENSAPOG00000007174 here. The NCBI Gene ID 110954326 was found for that ENSEMBL ID here, and the FASTA sequence from NCBI is here. However, the sequences obtained from NCBI and the ENSEMBL API do not match. Could you tell me if I am doing something wrong, @genomax? Thank you.


They are the same sequence. See this blast2sequence result. The Ensembl sequence may include the 3'-UTR.
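
A quick local check of this kind of difference can be sketched in Python as below. It only tests whether the shorter sequence occurs exactly inside the longer one and reports how many extra bases flank it, so it is a rough stand-in for a real pairwise alignment such as blast2sequence and will miss matches that contain any mismatch or gap. The function names are made up for illustration.

```python
def read_fasta_sequence(fasta_text):
    """Return the first record of a FASTA string as one uppercase sequence."""
    lines = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or lines:
                break  # stop at the start of a second record
            seen_header = True
            continue
        if line:
            lines.append(line)
    return "".join(lines).upper()


def locate_shared_region(seq_a, seq_b):
    """Find the shorter sequence inside the longer one.

    Returns (extra_bases_before, extra_bases_after) relative to the
    longer sequence, or None if there is no exact match.
    """
    shorter, longer = sorted((seq_a, seq_b), key=len)
    start = longer.find(shorter)
    if start < 0:
        return None
    return start, len(longer) - start - len(shorter)
```

If the NCBI coding sequence sits inside the Ensembl transcript, a non-zero second number would be consistent with extra 3' sequence such as a UTR, and a non-zero first number with extra 5' sequence.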
