Convert NCBI protein IDs to gene IDs and then to gene sequences
10.2 years ago
biolab ★ 1.4k

Hi everyone,

I have many NCBI RefSeq protein IDs (e.g. NP_051085.1). I need to convert them to gene IDs and then extract the gene sequences. I searched the NCBI FTP site but could not find the right files. So I need two resources: one mapping RefSeq protein IDs to RefSeq gene IDs, and one FASTA file of RefSeq gene sequences.

Could anyone point me to a link or another method to achieve this? I would much appreciate your help. Thanks.

ncbi id • 14k views

I think you can use NCBI batch retrieval (Batch Entrez) directly to retrieve your sequences when you have protein IDs or accessions. Check here. Then you can download the sequences as a FASTA file from NCBI.


Hi Prakki, thanks for your help! However, I need the gene sequences rather than the protein sequences, so I still need further help. Thanks.


Oh, OK then. Try a converter such as bioDBnet to convert the protein IDs to RefSeq nucleotide accessions, then run the batch retrieval on those. Some more ID converters are mentioned here also.
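If a web converter is not convenient, the protein-to-gene mapping can also be done locally. Below is a minimal sketch, assuming you have downloaded the `gene2refseq.gz` table from the NCBI Gene FTP area (`gene/DATA/`) and that its header row contains columns named `GeneID` and `protein_accession.version`. The function name `map_proteins_to_genes` is made up for this example, and the column names should be checked against the header of the file you actually download.

```python
import csv


def map_proteins_to_genes(lines, wanted):
    """Map RefSeq protein accessions to NCBI GeneIDs.

    lines  -- iterable of tab-separated text lines, header row first
              (assumed layout of NCBI's gene2refseq table)
    wanted -- protein accession.version strings, e.g. "NP_051085.1"
    Returns a dict: protein accession -> set of GeneID strings.
    """
    wanted = set(wanted)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)

    # Look columns up by name instead of by position, so the code does not
    # silently break if the column order differs from what is assumed here.
    header = [name.lstrip("#").strip() for name in next(reader)]
    gene_col = header.index("GeneID")
    prot_col = header.index("protein_accession.version")

    hits = {}
    for row in reader:
        if len(row) <= max(gene_col, prot_col):
            continue  # skip short or malformed rows
        accession = row[prot_col]
        if accession in wanted:
            # The same protein can appear on several rows, so collect a set.
            hits.setdefault(accession, set()).add(row[gene_col])
    return hits
```

To run it on the real file, open it as text with `gzip.open("gene2refseq.gz", "rt")` and pass the file object as `lines`. The resulting GeneIDs can then be used for batch retrieval of the nucleotide records.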


Hi Prakki, thanks a lot. Your comments are really helpful.

6.2 years ago

Hello,

here is an up-to-date solution:

This can be done quite easily with Ensembl's BioMart. In your case, go to the BioMart version for plants.

  • in Dataset choose Ensembl Plants Genes 40
  • in Filters open Gene and select Input external references ID list [Max 500 advised]
  • in the dropdown choose RefSeq peptide ID(s) and paste in your ID(s) (one per line)
  • in Attributes click the radio button Sequences and select Unspliced (Gene) in the Sequences area
  • click on Results and you are done
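
The same steps can also be scripted, since BioMart accepts a query written as XML. What follows is only a sketch: the helper name `build_biomart_query` is made up, and the schema name `plants_mart`, the filter name `refseq_peptide`, and the attribute name `gene_exon_intron` (intended to correspond to Unspliced (Gene)) are assumptions that should be checked against the XML button in the BioMart web interface. The dataset name has no default because it depends on your species.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(protein_ids, dataset, schema="plants_mart",
                        id_filter="refseq_peptide",
                        sequence_attr="gene_exon_intron"):
    """Build a BioMart XML query asking for unspliced gene sequences.

    All default names are assumptions, not confirmed by the thread;
    compare them with the XML that the BioMart web page generates.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": schema,
        "formatter": "FASTA",       # sequences come back as FASTA
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # BioMart takes an ID list as one comma-separated filter value.
    ET.SubElement(ds, "Filter",
                  {"name": id_filter, "value": ",".join(protein_ids)})
    # The gene ID goes into the FASTA header; the sequence attribute
    # selects which kind of sequence is returned.
    ET.SubElement(ds, "Attribute", {"name": "ensembl_gene_id"})
    ET.SubElement(ds, "Attribute", {"name": sequence_attr})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")
```

The returned string would be sent as the `query` parameter to the `martservice` endpoint of the BioMart server you are using. The function only builds the text and makes no network request, so you can print the XML and compare it with what the web interface produces before submitting anything.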

fin swimmer
