Is there an R package that pulls up gene functional annotations with gene symbols as input?
10.0 years ago
karthik ▴ 90

I have a list of genes, each of which I would like to independently annotate with a function and/or pathway using "keywords" associated with that gene.

Is there an R package that returns functional keywords when the gene symbol (e.g. BRCA1 or IL2RA) is used as a query?

I am not looking for functional enrichments of the set of genes as a whole, but keywords for each gene independent of others.

This seems like a simple, commonplace task, but I can't find an R package that does it. Any help would be appreciated.

Karthik

gene RNA-Seq annotation
10.0 years ago
gtsueng ▴ 190

You can use the mygene.R package available via Bioconductor: http://www.bioconductor.org/packages/release/bioc/html/mygene.html

To install:

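# biocLite() has been deprecated since Bioconductor 3.8; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mygene")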

Load the package:

library(mygene)

1. Create a list of your gene symbols, Entrez Gene IDs, or other identifiers (various input types are accepted, as long as the scopes argument in step 2 matches them):

xli <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC')

2. Run the query for the items in your list (here scoped to gene symbols, returning Entrez Gene IDs and Gene Ontology annotations, and restricted to human genes), then display the results:

res <- queryMany(xli, scopes='symbol', fields=c('entrezgene', 'go'), species='human')
res

Results:

DataFrame with 4 rows and 6 columns
     go.CC    go.MF    go.BP       query entrezgene         _id
    <List>   <List>   <List> <character>  <integer> <character>
1 ######## ######## ########       BRCA1        672         672
2 ######## ######## ########       BRCA2        675         675
3 ######## ######## ########        SOX2       6657        6657
4 ######## ######## ########         MYC       4609        4609

3. Display the records of interest (here, the cellular component GO terms for the first record; you can also get the biological process and molecular function GO terms):

res[1, 'go.CC'][[1]]

Results (again, only cellular component GO terms; change 'CC' to 'BP' or 'MF' for the other GO categories):

                         term   pubmed         id evidence
1    ubiquitin ligase complex 14976165 GO:0000151      NAS
2                     nucleus 17525340 GO:0005634      IDA
3                 nucleoplasm       NA GO:0005654      TAS
4                  chromosome       NA GO:0005694      ISS
5                   cytoplasm       NA GO:0005737      IDA
6             plasma membrane       NA GO:0005886      IDA
7  gamma-tubulin ring complex 12214252 GO:0008274      NAS
8   ribonucleoprotein complex 18809582 GO:0030529      IDA
9         BRCA1-BARD1 complex 12890688 GO:0031436      IDA
10            protein complex  9774970 GO:0043234      IDA
11            BRCA1-A complex 17525340 GO:0070531      IDA
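Because the question asks for keywords for each gene, it can help to collapse each gene's GO table into a single string. This is a minimal sketch under assumptions: go_keywords() is a hypothetical helper (not part of mygene), and it assumes each gene's annotation is a data frame with a term column, like the output above. The example input reuses a few of the BRCA1 terms shown above.

```r
# Hypothetical helper (not part of mygene): collapse each gene's GO table
# into one "; "-separated keyword string, one row per gene.
# Assumes `annot` is a named list (names = gene symbols) whose elements are
# data frames with a `term` column, like res[1, 'go.CC'][[1]] above.
go_keywords <- function(annot, sep = "; ") {
  keywords <- vapply(annot, function(df) {
    # Genes with no annotation returned get NA instead of an empty string
    if (is.null(df) || NROW(df) == 0) return(NA_character_)
    paste(unique(as.character(df$term)), collapse = sep)
  }, character(1))
  data.frame(gene = names(annot), keywords = unname(keywords),
             stringsAsFactors = FALSE)
}

# Example input: a few of the BRCA1 cellular component terms shown above
brca1_cc <- data.frame(
  term = c("nucleus", "cytoplasm", "BRCA1-BARD1 complex"),
  id   = c("GO:0005634", "GO:0005737", "GO:0031436"),
  stringsAsFactors = FALSE
)
kw <- go_keywords(list(BRCA1 = brca1_cc))
print(kw)
```

With real results, the input list could be built with the same indexing as step 3, e.g. annot <- setNames(lapply(seq_len(nrow(res)), function(i) res[i, 'go.BP'][[1]]), res$query). This assumes every element unpacks to a table with a term column like the one above, so check a few entries before relying on it.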

@gtsueng I have a similar question. I already have the GO IDs for my genes, but how do I extract information from the GO IDs for each gene? For example, I want to find the genes annotated with "JAK-STAT cascade" or "cellular protein metabolic process". How do I extract those?
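One way to approach this, sketched under assumptions: once each gene's GO annotations are in a named list of data frames (with term and id columns, as in the answer above), you can search the term names with a regular expression and keep the matching gene/term pairs. genes_with_term() is a hypothetical helper, not a mygene function, and the example reuses some BRCA1 cellular component terms from the output above rather than the JAK-STAT terms in question.

```r
# Hypothetical helper (not part of mygene): return every gene/GO-term pair
# whose term name matches `pattern` (case-insensitive regular expression).
# Assumes `annot` is a named list (names = gene symbols) of data frames
# with `term` and `id` columns, as in the mygene answer above.
genes_with_term <- function(annot, pattern) {
  hits <- lapply(names(annot), function(g) {
    df <- annot[[g]]
    if (is.null(df) || NROW(df) == 0) return(NULL)
    keep <- grepl(pattern, df$term, ignore.case = TRUE)
    if (!any(keep)) return(NULL)
    data.frame(gene = g,
               term = as.character(df$term[keep]),
               id   = as.character(df$id[keep]),
               stringsAsFactors = FALSE)
  })
  hits <- Filter(Negate(is.null), hits)
  # Return an empty table (not NULL) when nothing matches
  if (length(hits) == 0) {
    return(data.frame(gene = character(), term = character(),
                      id = character(), stringsAsFactors = FALSE))
  }
  do.call(rbind, hits)
}

# Example input: some of the BRCA1 cellular component terms shown above
brca1_cc <- data.frame(
  term = c("nucleus", "cytoplasm", "BRCA1-BARD1 complex", "BRCA1-A complex"),
  id   = c("GO:0005634", "GO:0005737", "GO:0031436", "GO:0070531"),
  stringsAsFactors = FALSE
)
hits <- genes_with_term(list(BRCA1 = brca1_cc), "complex")
print(hits)
```

For the pathway case, the same call on biological process annotations, e.g. genes_with_term(annot_bp, "JAK-STAT") where annot_bp is a hypothetical list built from the go.BP column, would list the genes whose BP terms mention JAK-STAT. Matching on term names breaks if GO renames a term, so filtering on the id column is more robust when you know the GO ID.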


How do I know which gene each CC term belongs to? And how can I link each gene name with its CC terms in a new list?
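A minimal sketch of one way to keep each term linked to its gene: stack the per-gene tables into one long data frame with a gene column, then split() it back into a named list if you want one entry per gene. go_long() is a hypothetical helper (not part of mygene); it assumes the same named list of data frames with term and id columns as in the answer above, and the example uses a few BRCA1 terms from that output plus a hypothetical gene with no annotation.

```r
# Hypothetical helper (not part of mygene): stack per-gene GO tables into a
# single long data frame so that every term keeps the gene it came from.
# Assumes `annot` is a named list (names = gene symbols) of data frames with
# `term` and `id` columns; genes with no annotation are dropped.
go_long <- function(annot) {
  rows <- lapply(names(annot), function(g) {
    df <- annot[[g]]
    if (is.null(df) || NROW(df) == 0) return(NULL)
    data.frame(gene = g,
               term = as.character(df$term),
               id   = as.character(df$id),
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) {
    return(data.frame(gene = character(), term = character(),
                      id = character(), stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

# Example input: BRCA1 terms from the output above, plus a hypothetical
# gene (GENE_X) for which no CC annotation was returned
annot_cc <- list(
  BRCA1 = data.frame(
    term = c("nucleus", "cytoplasm", "BRCA1-BARD1 complex"),
    id   = c("GO:0005634", "GO:0005737", "GO:0031436"),
    stringsAsFactors = FALSE
  ),
  GENE_X = NULL
)
long <- go_long(annot_cc)
print(long)

# Named list: gene symbol -> character vector of its CC terms
cc_by_gene <- split(long$term, long$gene)
print(cc_by_gene)
```

The long table also makes it straightforward to merge these annotations with other per-gene results on the gene column.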

10.0 years ago
EagleEye 7.6k

If you are working with human genes on Linux, you can also use Gene Set Clustering based on Functional annotation (GeneSCF).

5.0 years ago

Although I tend to use the web interface, I believe you can do this with the R package for Enrichr (enrichR):

https://cran.r-project.org/web/packages/enrichR/index.html

The only caveat is that you need to choose the gene sets you want to test ahead of time, rather than browsing through them interactively.
