Retrieving all genes associated with a GO term
3.9 years ago

Hi,

I'm looking for an easy way to retrieve all the genes in a list that are associated with a certain GO term, preferably using R/Bioconductor packages. I'm not interested in under/overrepresentation or enrichment.

For instance, I want a list of all genes known to be located in 'presynaptic endosome' (GO:0098830).

I tried the method referred to in an older post (https://www.biostars.org/p/52101/); following is my code:

library(biomaRt)

# Connect to the Ensembl human gene dataset via biomaRt
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# GO IDs of all terms associated with "presynaptic endosome" (inclusive of child terms)
go_1 <- c("GO:0098830", "GO:0098954", "GO:0099007", "GO:0099067", "GO:0099037", "GO:0098955", "GO:0099592", "GO:0099532")

pre.gene.data <- getBM(attributes = c('hgnc_symbol', 'ensembl_gene_id', 'go_id', 'go_linkage_type'),
                       filters = 'go', values = list(go_1), mart = ensembl)

However, my code doesn't filter the genes based on the GO terms. It gives me the following output:

GO:0006886 AP3D1 ENSG00000065000 intracellular protein transport IEA
GO:0016192 AP3D1 ENSG00000065000 vesicle-mediated transport IEA
GO:0030117 AP3D1 ENSG00000065000 membrane coat IEA
... and 49 other entries for the gene "AP3D1"

I'm unable to understand what the problem might be. I replaced the filter "go_id" with "go" because that is the filter name now listed by listFilters(ensembl).

Please help!
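One possible reading of the output above, offered as an assumption rather than something confirmed in this thread: the 'go' filter selects which genes are returned, and getBM then reports every GO annotation of those genes, so the 'go_id' column is not restricted to the queried terms. Under that assumption, subsetting the result on 'go_id' afterwards would keep only the rows of interest. The sketch below uses a small toy data frame in place of a live getBM() call; its first three rows copy the output above, and the fourth row (including its evidence code) is hypothetical and exists only so the filter has something to match.

```r
# Toy stand-in for the data frame returned by getBM(); no network access needed.
# Rows 1-3 copy the output shown above; row 4 is a hypothetical illustration.
pre.gene.data <- data.frame(
  go_id           = c("GO:0006886", "GO:0016192", "GO:0030117", "GO:0098830"),
  hgnc_symbol     = "AP3D1",
  ensembl_gene_id = "ENSG00000065000",
  go_linkage_type = "IEA",
  stringsAsFactors = FALSE
)

# GO terms of interest (same vector as in the question)
go_1 <- c("GO:0098830", "GO:0098954", "GO:0099007", "GO:0099067",
          "GO:0099037", "GO:0098955", "GO:0099592", "GO:0099532")

# Keep only the rows whose GO term is one of the queried terms
hits <- pre.gene.data[pre.gene.data$go_id %in% go_1, ]

# Non-redundant gene symbols among those rows
genes <- unique(hits$hgnc_symbol)
print(hits)
```

The same subsetting line should apply unchanged to a real getBM() result, provided 'go_id' was requested as an attribute.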

R bioconductor gene-ontology • 3.2k views

I think "filters" should match with attributes and headers of the data. So, if you're using "go" as filters, the header and attribute should also be "go" instead of "go_id", or you can use filters="go_id" to match.


Hi, Thanks for your suggestion. "go" is not a valid attribute name, and "go_id" is not a valid filter name. It shows me an error message: Invalid attribute(s): go Please use the function 'listAttributes' to get valid attribute names


Have you tried changing the headers of the files (go_id to go), and then get the list of attributes?

The link to the post in your original question doesn't work. Are you referring to this post?

annotation - biomaRt - getBM - multiple entrez ID

Another good resource:

https://www.stat.berkeley.edu/~sandrine/Teaching/PH292.S10/Durinck.pdf


A possibly simpler way of doing it:

library(org.Hs.eg.db)
library(GO.db)
results <- AnnotationDbi::select(org.Hs.eg.db, keys=c("GO:0098830"), columns = c('SYMBOL'), keytype = "GOALL")
gene_symbols <- unique(results$SYMBOL)

Notice the "GOALL" key type. It also retrieves genes annotated with child terms of the queried term. The results can contain more than one row for the same gene, because a gene can be annotated with several evidence codes. To get a non-redundant list of genes, apply unique to the SYMBOL column.
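To illustrate the duplicate rows, here is a minimal sketch on a toy data frame shaped like a select() result. The column names GOALL, EVIDENCEALL and ONTOLOGYALL are my assumption of what select() returns for this key type, and the gene symbols GENE_A and GENE_B are made up; only the evidence codes are real GO codes.

```r
# Toy stand-in for the output of AnnotationDbi::select(); no packages needed.
# GENE_A and GENE_B are hypothetical symbols used only for illustration.
results <- data.frame(
  GOALL       = "GO:0098830",
  EVIDENCEALL = c("IDA", "IEA", "ISS"),
  ONTOLOGYALL = "CC",
  SYMBOL      = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# GENE_A appears twice because it carries two evidence codes
gene_symbols <- unique(results$SYMBOL)

# Optionally keep track of which evidence codes support each gene
evidence_by_gene <- tapply(results$EVIDENCEALL, results$SYMBOL,
                           paste, collapse = ",")
print(gene_symbols)
print(evidence_by_gene)
```

Collapsing the evidence codes per gene is optional; it can help if you later want to drop genes supported only by electronic annotation (IEA).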


Thank you! This works. It seems there aren't any other genes annotated to GO:0098830.
