Fetching GO terms based on gene IDs using Entrez e-utilities
3.0 years ago
Shraddha ▴ 90

Hello all,

I'm trying to access GO terms for a list of genes, but they're not easily available. Manually, these are the steps I'd take:

  1. Search gene ID on NCBI; find protein ID
  2. Search protein ID on UniProt/UniParc; find Pfam ID
  3. Search Pfam ID on Pfam; switch to InterPro tab, get GO terms

I have seen the Entrez e-utilities, but I'm having trouble understanding how to use them. When I try a command such as

esearch -db gene -query 'LOC115705987' | elink -target protfam | esearch -db protein -query [PROT]

I get a response like

<ENTREZ_DIRECT>
  <Db>protein</Db>
  <WebEnv>MCID_61e013e99059043a7d58ece9</WebEnv>
  <QueryKey>3</QueryKey>
  <Count>1001315687</Count>
  <Step>3</Step>
</ENTREZ_DIRECT>

My question is, how can I use this information to get (at first) the protein ID, and subsequently the GO terms?

Thanks in advance!

unix entrez e-utilities • 776 views
3.0 years ago
GenoMax 148k

how can I use this information to get (at first) the protein ID, and subsequently the GO terms?

Using EntrezDirect:

$ esearch -db gene -query 'LOC115705987' | elink -target protein | esummary | xtract -pattern DocumentSummary -element Caption,Title
XP_030489337    probable serine/threonine-protein kinase At1g09600 [Cannabis sativa]
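
Since the question mentions a list of genes, the same pipeline can be wrapped in a loop. This is only a minimal sketch: the function names `gene_to_protein` and `genes_to_proteins` and the input file `genes.txt` are made up for illustration, and it assumes one gene identifier per line. The `< /dev/null` is there because EDirect commands read standard input and would otherwise swallow the rest of the list inside the loop.

```shell
# Hypothetical helper: print "gene<TAB>protein accession<TAB>title"
# for a single gene identifier, using the pipeline shown above.
gene_to_protein() {
  id="$1"
  esearch -db gene -query "$id" |
    elink -target protein |
    esummary |
    xtract -pattern DocumentSummary -element Caption,Title |
    while IFS= read -r line; do
      printf '%s\t%s\n' "$id" "$line"
    done
}

# Hypothetical helper: run the lookup for every non-empty line of a file.
genes_to_proteins() {
  while IFS= read -r gene || [ -n "$gene" ]; do
    if [ -n "$gene" ]; then
      # Redirect stdin so esearch cannot consume the remaining lines.
      gene_to_protein "$gene" < /dev/null
    fi
  done < "$1"
}

# Example call (genes.txt is an assumed file name, not run here):
#   genes_to_proteins genes.txt > gene_protein_map.tsv
```

A gene with several linked proteins yields one output line per protein, so the gene column may repeat.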

This is a predicted record, so further annotation may not be available from here on. The feature table does contain references to CDD, which may be useful.

$ esearch -db gene -query 'LOC115705987' | elink -target protein | efetch -format ft
>Feature ref|XP_030489337.1|
1   234 Protein
            product probable serine/threonine-protein kinase At1g09600
<11 233 Region
            region  PKc_like
            note    Protein Kinases, catalytic domain
            db_xref CDD:389743
37  37  Site
75  75
77  77
96  96
109 109
111 114
116 116
154 155
            site_type   other
            note    polypeptide substrate binding site [polypeptide binding]
            db_xref CDD:270870
92  103 Site
108 116
            site_type   other
            note    activation loop (A-loop)
            db_xref CDD:270870
1   234 CDS
            product probable serine/threonine-protein kinase At1g09600
            protein_id  ref|XP_030489337.1|
            db_xref GeneID:115705987
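
To pull just the CDD identifiers out of a feature table like the one above, plain text tools are probably enough. A minimal sketch, assuming the identifiers always appear as `CDD:` followed by digits; the function name `extract_cdd_ids` is made up, and the sample lines are copied from the output above so the snippet runs without network access.

```shell
# A few lines copied from the feature table above, used as offline sample input.
ft_sample='>Feature ref|XP_030489337.1|
<11 233 Region
            region  PKc_like
            note    Protein Kinases, catalytic domain
            db_xref CDD:389743
37  37  Site
            site_type   other
            note    polypeptide substrate binding site [polypeptide binding]
            db_xref CDD:270870
92  103 Site
            note    activation loop (A-loop)
            db_xref CDD:270870
1   234 CDS
            db_xref GeneID:115705987'

# Hypothetical helper: print each distinct CDD identifier once,
# in order of first appearance.
extract_cdd_ids() {
  grep -o 'CDD:[0-9][0-9]*' | awk '!seen[$0]++'
}

printf '%s\n' "$ft_sample" | extract_cdd_ids
```

With real data, the `efetch -format ft` command above can be piped straight into `extract_cdd_ids` instead of using the sample text.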

Thank you! Do you know if UniProt, Pfam, and InterPro can also be accessed this way? I don't see any corresponding databases listed here.
