Hi there,
I have several Excel files with 3000+ feature IDs from next-generation sequencing experiments. The feature IDs look like this:
LOC733603
MS4A7
CRISP3
RETN
TNFAIP6
ALPL
MMP8
IRG1
LTF
KCNJ15
HCRTR1
Basically, I would like to gather the following information about each of these features for Sus scrofa:
- Gene name
- Gene description
- Protein name
- Amino acid sequence
I am using Python, mainly the urllib2 package, to make HTTP requests to the NCBI Gene database.
I can easily get the gene name and gene description by querying NCBI's Gene database. I am then trying to use the associated gene ID to query either NCBI's Protein database or UniProt, but I am not sure which is the wiser approach. Has anyone else dealt with the same scenario and have any useful advice, or know of other ways to obtain the data I am interested in?
Even easier, is there a way to get the related NCBI protein information directly from an NCBI gene ID?
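To make the question concrete, here is a rough sketch of the kind of lookup described above, written against NCBI's E-utilities (esearch to find the gene, elink to go from gene to protein, efetch to pull the sequence). It assumes Python 3's urllib.request rather than urllib2, and the helper names (build_url, gene_search_url and so on) are made up for illustration. The XML paths are what I would expect from esearch and elink output, so treat them as assumptions to check against a real response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base address of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_url(tool, **params):
    """Build an E-utilities URL for the given tool (esearch, elink, efetch)."""
    return EUTILS + tool + ".fcgi?" + urllib.parse.urlencode(params)


def gene_search_url(symbol, organism="Sus scrofa"):
    """URL that searches the Gene database for a symbol in one organism."""
    term = "%s[Gene Name] AND %s[Organism]" % (symbol, organism)
    return build_url("esearch", db="gene", term=term)


def gene_to_protein_url(gene_id):
    """URL that asks elink for Protein records linked to a Gene ID."""
    return build_url("elink", dbfrom="gene", db="protein", id=gene_id)


def protein_fasta_url(protein_ids):
    """URL that fetches the amino acid sequences as FASTA text."""
    return build_url("efetch", db="protein", id=",".join(protein_ids),
                     rettype="fasta", retmode="text")


def parse_esearch_ids(xml_text):
    """Return the IDs listed in an esearch result (assumed path: IdList/Id)."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


def parse_elink_ids(xml_text):
    """Return linked IDs from an elink result (assumed path: LinkSet/LinkSetDb/Link/Id)."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./LinkSet/LinkSetDb/Link/Id")]


def fetch(url):
    """Perform the HTTP GET and return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def protein_fasta_for_symbol(symbol):
    """Chain the three calls: symbol -> gene ID -> protein IDs -> FASTA."""
    gene_ids = parse_esearch_ids(fetch(gene_search_url(symbol)))
    if not gene_ids:
        return None
    protein_ids = parse_elink_ids(fetch(gene_to_protein_url(gene_ids[0])))
    if not protein_ids:
        return None
    return fetch(protein_fasta_url(protein_ids))
```

Something like protein_fasta_for_symbol("MMP8") would then be called once per feature ID. With 3000+ IDs the requests would need to be throttled, since NCBI asks clients without an API key to stay at or below three requests per second.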
Joey
This works perfectly! Thanks so much :)