You can use the mygene package (the R client for MyGene.info), available via Bioconductor: http://www.bioconductor.org/packages/release/bioc/html/mygene.html
To install:
source("http://bioconductor.org/biocLite.R")
biocLite("mygene")
# On Bioconductor 3.8 or later, biocLite is replaced by: BiocManager::install("mygene")
Load: library(mygene)
1. Create a list (an R character vector) of your gene symbols, Entrez gene IDs, or other identifiers (various inputs are accepted as long as they are properly scoped):
> xli <- c('BRCA1',
'BRCA2',
'SOX2',
'MYC')
2. Run the query for the items in your list (in this case scoping to gene symbols, returning Entrez gene IDs and Gene Ontology annotations, and restricting to human genes), then display the results:
> res <- queryMany(xli, scopes='symbol', fields=c('entrezgene', 'go'), species='human')
> res
Results:
DataFrame with 4 rows and 6 columns
     go.CC    go.MF    go.BP       query entrezgene         _id
    <List>   <List>   <List> <character>  <integer> <character>
1 ######## ######## ########       BRCA1        672         672
2 ######## ######## ########       BRCA2        675         675
3 ######## ######## ########        SOX2       6657        6657
4 ######## ######## ########         MYC       4609        4609
3. Display the records of interest (in this case the cellular component GO terms for the first record; the biological process and molecular function terms can be pulled out the same way):
> res[1, 'go.CC'][[1]]
Results (again, just the cellular component GO terms; change 'CC' to 'BP' or 'MF' for the other types):
                         term   pubmed         id evidence
1    ubiquitin ligase complex 14976165 GO:0000151      NAS
2                     nucleus 17525340 GO:0005634      IDA
3                 nucleoplasm       NA GO:0005654      TAS
4                  chromosome       NA GO:0005694      ISS
5                   cytoplasm       NA GO:0005737      IDA
6             plasma membrane       NA GO:0005886      IDA
7  gamma-tubulin ring complex 12214252 GO:0008274      NAS
8   ribonucleoprotein complex 18809582 GO:0030529      IDA
9         BRCA1-BARD1 complex 12890688 GO:0031436      IDA
10            protein complex  9774970 GO:0043234      IDA
11            BRCA1-A complex 17525340 GO:0070531      IDA
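If you want the terms for every gene in one table, with the gene symbol next to each term, something along these lines might work. This is only a sketch in base R: `flatten_go` is a hypothetical helper (not part of mygene), and `res_demo` is a small stand-in built from the first two BRCA1 rows above so the example runs without a network call. I am assuming each entry of the `go.CC` column is a data frame like the one printed above (or empty); I have not checked this against every shape the live service can return.

```r
# Stand-in for `res`: a data.frame with a list column, mimicking the shape
# shown above. The BRCA1 terms are copied from the output; the BRCA2 entry
# is deliberately left empty to show how missing annotations are skipped.
cc_brca1 <- data.frame(
  term     = c("ubiquitin ligase complex", "nucleus"),
  id       = c("GO:0000151", "GO:0005634"),
  evidence = c("NAS", "IDA"),
  stringsAsFactors = FALSE
)
res_demo <- data.frame(query = c("BRCA1", "BRCA2"), stringsAsFactors = FALSE)
res_demo$go.CC <- list(cc_brca1, NULL)

# Hypothetical helper: one output row per (gene, GO term) pair.
flatten_go <- function(res, column = "go.CC") {
  pieces <- lapply(seq_len(nrow(res)), function(i) {
    terms <- res[[column]][[i]]
    # Skip genes that have no annotation in this GO category.
    if (is.null(terms) || NROW(terms) == 0) return(NULL)
    data.frame(query = res$query[i], as.data.frame(terms),
               stringsAsFactors = FALSE)
  })
  pieces <- Filter(Negate(is.null), pieces)
  do.call(rbind, pieces)
}

long_cc <- flatten_go(res_demo)

# Keep only the rows whose term matches a name of interest.
nucleus_hits <- long_cc[grepl("nucleus", long_cc$term, fixed = TRUE), ]
```

With a real query result you would call `flatten_go(res, "go.CC")` (or `"go.BP"` / `"go.MF"`) and then filter on `term` or `id` to get the genes annotated with a particular term.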
@gtsueng... I have a similar question. I already have the GO IDs for my genes, but how do I extract information from the GO IDs for each gene? For example, I want to extract 'JAK-STAT cascade' or 'cellular protein metabolic process'; how do I do that?
How do I know which gene corresponds to which CC term? And how can I link the gene name with the CC terms in a new list?