Question

Getting Protein Information from NCBI Gene ID

0

9.7 years ago

joseph.orlando • 0

Hi there,

I have several Excel files, each with 3000+ "feature IDs" from next-generation sequencing experiments. The feature IDs look like this:

LOC733603
MS4A7
CRISP3
RETN
TNFAIP6
ALPL
MMP8
IRG1
LTF
KCNJ15
HCRTR1

Basically, I would like to gather the following information about each of these features for Sus scrofa:

Gene name
Gene description
Protein Name
Amino acid sequence

I am using Python, mainly the urllib2 module, to make HTTP requests to the NCBI Gene database.

I can easily get the gene name and gene description by querying NCBI's Gene database. I am then trying to use the associated gene ID to query either NCBI's Protein database or UniProt, but I am not sure which is the wiser approach. Has anyone else been in the same situation, and do you have any advice or other ways of obtaining the data I am interested in?

Even easier, is there a way to access the related NCBI protein information directly from an NCBI gene ID?

Joey

protein ncbi-id NCBI python gene • 4.7k views

Ram · Answer 1 · 2015-05-04

2

9.7 years ago

Elisabeth Gasteiger ★ 2.4k

To obtain information corresponding to these gene symbols from UniProt, I recommend that you read this FAQ: http://www.uniprot.org/help/gene_symbol_mapping

Once you have your results, you can use the "Columns" button and customize your result table to include columns for gene and protein names and the amino acid sequence:

Query result in html view

Query result in tab-delimited format

Documentation about programmatic access to UniProt
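For anyone who wants to script the same lookup instead of using the web table, here is a minimal sketch in Python 3. It assumes the current UniProt REST endpoint at rest.uniprot.org (newer than the FAQ linked above), the query fields `gene_exact` and `organism_id`, and the return fields `accession`, `gene_names`, `protein_name` and `sequence`; 9823 is the NCBI Taxonomy ID for Sus scrofa. The function names are my own and purely illustrative.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the current UniProt REST API.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"
PIG_TAXON_ID = 9823  # NCBI Taxonomy ID for Sus scrofa


def build_uniprot_url(symbol, taxon_id=PIG_TAXON_ID):
    """Build a search URL for one gene symbol restricted to one organism."""
    params = {
        "query": f"gene_exact:{symbol} AND organism_id:{taxon_id}",
        "format": "tsv",
        "fields": "accession,gene_names,protein_name,sequence",
    }
    return UNIPROT_SEARCH + "?" + urlencode(params)


def parse_tsv(text):
    """Turn a tab-delimited response into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_uniprot(symbol):
    """Download and parse the matching UniProt entries (needs network access)."""
    with urlopen(build_uniprot_url(symbol)) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

Calling `fetch_uniprot("RETN")` would return one dict per matching entry. With 3000+ symbols it is kinder to the server to pause between requests or to use the ID mapping service described in the FAQ. Placeholder symbols such as LOC733603 are assigned by NCBI and may not match anything in UniProt.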

0

This works perfectly! Thanks so much :)

9.6 years ago by joseph.orlando • 0

Ram · Answer 2 · 2015-05-04

1

9.7 years ago

Prash ▴ 280

In addition to the suggestion above, you could use Batch Entrez, which provides links for the IDs that you upload: http://www.ncbi.nlm.nih.gov/sites/batchentrez
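Batch Entrez is a web form. If you would rather stay in a script, NCBI's E-utilities can follow the same gene-to-protein links, which also covers the question of going from a gene ID straight to protein records. This is a different route from Batch Entrez, not a wrapper around it. The sketch below is Python 3 and assumes the `elink.fcgi` and `efetch.fcgi` endpoints with the `gene_protein_refseq` link name; the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(gene_id, linkname="gene_protein_refseq"):
    """URL asking ELink for protein records linked to one Gene ID."""
    params = {"dbfrom": "gene", "db": "protein",
              "id": str(gene_id), "linkname": linkname}
    return EUTILS + "elink.fcgi?" + urlencode(params)


def parse_linked_ids(xml_text):
    """Collect the linked protein IDs from an ELink XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//LinkSetDb/Link/Id")]


def efetch_fasta_url(protein_ids):
    """URL asking EFetch for the FASTA sequences of the given protein IDs."""
    params = {"db": "protein", "id": ",".join(protein_ids),
              "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def proteins_for_gene(gene_id):
    """Return FASTA text for proteins linked to a Gene ID (needs network access)."""
    with urlopen(elink_url(gene_id)) as resp:
        ids = parse_linked_ids(resp.read().decode("utf-8"))
    if not ids:
        return ""
    with urlopen(efetch_fasta_url(ids)) as resp:
        return resp.read().decode("utf-8")
```

For LOC-style symbols the Gene ID is already in the name: LOC733603 corresponds to Gene ID 733603, so `proteins_for_gene(733603)` can be called directly. For large batches, NCBI asks users to throttle requests, so add a short sleep between calls.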

Ram · Answer 3 · 2015-05-04

0

9.7 years ago

cdsouthan ★ 1.9k

Yet another option is to approach the mapping via Ensembl's pig (Sus scrofa) annotation: http://www.ensembl.org/Sus_scrofa/Info/Annotation
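If you go the Ensembl route from a script, a rough Python 3 sketch could look like the following. It assumes the Ensembl REST service at rest.ensembl.org, its `lookup/symbol` and `sequence/id` endpoints, and the `type=protein` and `multiple_sequences=1` parameters; the function names are hypothetical, so check the details against the Ensembl REST documentation before relying on it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed REST base URL


def lookup_url(symbol, species="sus_scrofa"):
    """URL that resolves a gene symbol to an Ensembl gene record."""
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def protein_url(gene_id):
    """URL that requests the protein sequences for all translations of a gene."""
    params = {"type": "protein", "multiple_sequences": 1}
    return f"{ENSEMBL_REST}/sequence/id/{quote(gene_id)}?" + urlencode(params)


def summarize_lookup(json_text):
    """Keep only the gene ID, name and description from a lookup response."""
    record = json.loads(json_text)
    return {key: record.get(key) for key in ("id", "display_name", "description")}


def get(url, content_type):
    """GET a URL, asking Ensembl for the given response format (needs network)."""
    req = Request(url, headers={"Content-Type": content_type})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def gene_and_proteins(symbol):
    """Return (gene summary dict, FASTA text of its proteins) for one symbol."""
    gene = summarize_lookup(get(lookup_url(symbol), "application/json"))
    fasta = get(protein_url(gene["id"]), "text/x-fasta")
    return gene, fasta
```

`gene_and_proteins("RETN")` would give the gene description plus one FASTA record per translation. Symbols that Ensembl does not know should produce an HTTP error from `urlopen`, so wrap the call in a try/except when looping over thousands of IDs.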
