How To Get The GO Terms Associated With A Protein?
14.1 years ago
Sirus ▴ 820

Hello everybody, I am a computer scientist who has started working in bioinformatics, but I am having trouble finding resources. My problem is as follows: I have a protein-protein interaction network, and for each protein I want to find the list of GO terms associated with it. I have seen that Bioconductor packages have tools that can help with this. Does anyone have an idea how to do it? Thank you in advance.

gene ppi bioconductor r • 3.9k views
14.1 years ago
Neilfws 49k

If you have a list of identifiers (such as protein sequence IDs), you want another list of identifiers (such as GO terms) and you're working with a commonly used organism (such as human), then BioMart is a good option.

The IDs that you have are termed "filters"; those that you want are termed "attributes". You can use BioMart via the web interface or through other tools, in particular the R package biomaRt.

Here's a brief example, showing how you could connect human protein HGNC symbols with GO:

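library(biomaRt)
# connect to the Ensembl BioMart and select the human gene dataset
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# HGNC symbols to look up (the filter values)
hgnc <- c("EPB41L3", "RAB31", "TUBB6", "ADAMTS1", "CFD", "CLDN8")
# query biomart: one row per (symbol, GO biological process ID) pair
# attribute names can change between Ensembl releases; check listAttributes(mart)
results <- getBM(attributes = c("hgnc_symbol", "go_biological_process_id"),
                 filters = "hgnc_symbol", values = hgnc, mart = mart)
results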

   hgnc_symbol go_biological_process_id
1      EPB41L3               GO:0008150
2      EPB41L3               GO:0030866
3        RAB31               GO:0015031
4        RAB31               GO:0007264
5        RAB31               GO:0048193
6        RAB31               GO:0006886
7        RAB31               GO:0006913
8        RAB31               GO:0007165
9        TUBB6               GO:0007018
10       TUBB6               GO:0051258
11       TUBB6               GO:0007017
12     ADAMTS1               GO:0006508
13     ADAMTS1               GO:0001542
14     ADAMTS1               GO:0060347
15     ADAMTS1               GO:0007229
16     ADAMTS1               GO:0001822
17     ADAMTS1               GO:0008285
18     ADAMTS1                         
19       CLDN8               GO:0016338
20         CFD               GO:0006957
21         CFD               GO:0006956
22         CFD               GO:0006508
23         CFD               GO:0007219
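
Since the original question asks for a list of GO terms per protein, here is a minimal sketch of how the long-format table above could be collapsed into one vector of GO IDs per symbol. The data frame is a hand-typed subset of the output above (an assumption, so the example runs without a BioMart connection), and the names annotated and go_by_protein are just illustrative.

```r
# Hand-typed subset of the getBM() output shown above, so that this
# example runs offline. With a live query, use the real `results` instead.
results <- data.frame(
  hgnc_symbol = c("EPB41L3", "EPB41L3", "TUBB6", "TUBB6", "TUBB6",
                  "ADAMTS1", "ADAMTS1", "CLDN8"),
  go_biological_process_id = c("GO:0008150", "GO:0030866", "GO:0007018",
                               "GO:0051258", "GO:0007017", "GO:0006508",
                               "", "GO:0016338"),
  stringsAsFactors = FALSE
)

# Drop rows without a GO annotation (they show up as empty strings,
# like row 18 in the output above).
annotated <- results[results$go_biological_process_id != "", ]

# Build a named list: one character vector of GO IDs per protein symbol.
go_by_protein <- split(annotated$go_biological_process_id,
                       annotated$hgnc_symbol)

go_by_protein[["TUBB6"]]
```

The resulting list can be indexed by the node names of the interaction network, provided the network uses the same identifiers as the filter.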

You can use the biomaRt functions listAttributes() and listFilters() to see the available options. For example, to see the attributes related to GO:

a <- listAttributes(mart)
a[grep("GO", a$description),]
24                  go_biological_process_id     GO Term Accession (bp)
25                                 name_1006          GO Term Name (bp)
26                           definition_1006    GO Term Definition (bp)
27        go_biological_process_linkage_type GO Term Evidence Code (bp)
28                  go_cellular_component_id     GO Term Accession (cc)
29       go_cellular_component__dm_name_1006          GO Term Name (cc)
30 go_cellular_component__dm_definition_1006    GO Term Definition (cc)
31        go_cellular_component_linkage_type GO Term Evidence Code (cc)
32                  go_molecular_function_id          GO Term Accession
33       go_molecular_function__dm_name_1006          GO Term Name (mf)
34 go_molecular_function__dm_definition_1006    GO Term Definition (mf)
35        go_molecular_function_linkage_type GO Term Evidence Code (mf)
36                      goslim_goa_accession    GOSlim GOA Accession(s)
37                    goslim_goa_description     GOSlim GOA Description

Thank you, it seems that this is the answer I was looking for. I will try it.

14.1 years ago

Using the UCSC MySQL server and the gene symbols from Neil's example:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e '
select T.acc,T.name,T.term_type, X.geneSymbol
from go.term as T,
go.goaPart as GOA,
hg18.kgXref as X
where
T.acc=GOA.goId and
GOA.dbObjectSymbol=X.spDisplayId and 
X.geneSymbol in ("EPB41L3", "RAB31", "TUBB6", "ADAMTS1", "CFD", "CLDN8")
'

+------------+---------------------------------------------------------+--------------------+------------+
| acc        | name                                                    | term_type          | geneSymbol |
+------------+---------------------------------------------------------+--------------------+------------+
| GO:0004222 | metalloendopeptidase activity                           | molecular_function | ADAMTS1    | 
| GO:0005178 | integrin binding                                        | molecular_function | ADAMTS1    | 
| GO:0005576 | extracellular region                                    | cellular_component | ADAMTS1    | 
| GO:0005578 | proteinaceous extracellular matrix                      | cellular_component | ADAMTS1    | 
| GO:0006508 | proteolysis                                             | biological_process | ADAMTS1    | 
| GO:0007229 | integrin-mediated signaling pathway                     | biological_process | ADAMTS1    | 
| GO:0008201 | heparin binding                                         | molecular_function | ADAMTS1    | 
| GO:0008233 | peptidase activity                                      | molecular_function | ADAMTS1    | 
| GO:0008237 | metallopeptidase activity                               | molecular_function | ADAMTS1    | 
| GO:0008270 | zinc ion binding                                        | molecular_function | ADAMTS1    | 
| GO:0008285 | negative regulation of cell proliferation               | biological_process | ADAMTS1    | 
| GO:0016787 | hydrolase activity                                      | molecular_function | ADAMTS1    | 
| GO:0031012 | extracellular matrix                                    | cellular_component | ADAMTS1    | 
| GO:0046872 | metal ion binding                                       | molecular_function | ADAMTS1    | 
| GO:0003817 | complement factor D activity                            | molecular_function | CFD        | 
| GO:0003824 | catalytic activity                                      | molecular_function | CFD        | 
| GO:0004252 | serine-type endopeptidase activity                      | molecular_function | CFD        | 
| GO:0005576 | extracellular region                                    | cellular_component | CFD        | 
| GO:0006508 | proteolysis                                             | biological_process | CFD        | 
| GO:0006955 | immune response                                         | biological_process | CFD        | 
| GO:0006956 | complement activation                                   | biological_process | CFD        | 
| GO:0006957 | complement activation, alternative pathway              | biological_process | CFD        |
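
To tie this back to R, here is a rough sketch of how such a result could be grouped per gene once it is available as tab-separated text. It assumes the query output was saved in that layout (the mysql client's --batch option should produce it); the rows are a hand-copied subset of the table above, and the variable names are illustrative.

```r
# A few rows of the query result above as tab-separated text.
# Assumption: the output was saved or piped in this layout.
tsv <- "acc\tname\tterm_type\tgeneSymbol
GO:0004222\tmetalloendopeptidase activity\tmolecular_function\tADAMTS1
GO:0005576\textracellular region\tcellular_component\tADAMTS1
GO:0006508\tproteolysis\tbiological_process\tADAMTS1
GO:0003817\tcomplement factor D activity\tmolecular_function\tCFD
GO:0005576\textracellular region\tcellular_component\tCFD
GO:0006508\tproteolysis\tbiological_process\tCFD
GO:0006955\timmune response\tbiological_process\tCFD"

# Parse the text into a data frame (the first line is the header).
go <- read.delim(text = tsv, stringsAsFactors = FALSE)

# Keep one ontology branch, then group the accessions by gene symbol.
bp <- go[go$term_type == "biological_process", ]
bp_by_gene <- split(bp$acc, bp$geneSymbol)

# Count annotations per gene and ontology branch.
counts <- table(go$geneSymbol, go$term_type)
counts
```

Unlike the biomaRt query above, this result covers all three ontology branches at once, so filtering on term_type is what gives a comparable list.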

How do I access the UCSC SQL server?

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18

Thank you, I will try it :)
