This should help you on your way, my friend:
The two best ways to convert between gene annotations is via biomaRt or the org.Db packages. Here, I provide an example via org.Hs.eg.db:
gene_list <- c('AAK1', 'ABCB9', 'ABHD12', 'ACER2', 'ACSL4', 'ACTR1A',
'ADAM19', 'ADAM22', 'ADARB1', 'ADIPOR2', 'ADORA2A', 'ADPRH')
require(org.Hs.eg.db)
keytypes(org.Hs.eg.db)
select(org.Hs.eg.db,
keys = gene_list,
columns = c('ENTREZID','SYMBOL','GENENAME'),
keytype = 'SYMBOL')
'select()' returned 1:1 mapping between keys and columns
SYMBOL ENTREZID GENENAME
1 AAK1 22848 AP2 associated kinase 1
2 ABCB9 23457 ATP binding cassette subfamily B member 9
3 ABHD12 26090 abhydrolase domain containing 12
4 ACER2 340485 alkaline ceramidase 2
5 ACSL4 2182 acyl-CoA synthetase long chain family member 4
6 ACTR1A 10121 actin related protein 1A
7 ADAM19 8728 ADAM metallopeptidase domain 19
...
You can then take the Entrez IDs and use them for topGO and enrichGO. topGO should accept gene symbols too, though. For topGO and enrichGO, you will require some other information to rank your genes, e.g., fold change or p-value. If ranking of your genes is not important, then you can generate rapid enrichment results via enrichr:
library(enrichR)
listEnrichrDbs()
dbs <- c("Allen_Brain_Atlas_down", "Allen_Brain_Atlas_up",
"BioCarta_2016",
"Cancer_Cell_Line_Encyclopedia",
"Genes_Associated_with_NIH_Grants",
"GeneSigDB",
"GO_Biological_Process_2018", "GO_Molecular_Function_2018",
"GWAS_Catalog_2019", "HMDB_Metabolites",
"KEGG_2019_Human",
"UK_Biobank_GWAS_v1", "WikiPathways_2019_Human")
That's only a small fraction of the number of databases with which enrichr is programmed to interact.
enriched <- enrichr(gene_list, dbs)
That basically stores the results in a list. We can summarise them into a single table and filter by nominal and adjusted p-values as follows:
head(do.call(rbind,
lapply(enriched,
function(x) subset(x, P.value <= 0.05))))
Term
Allen_Brain_Atlas_down.1 Cortical amygdalar area, anterior part, layer 1
Allen_Brain_Atlas_down.2 Cortical amygdalar area, posterior part, medial zone, layer 2
Allen_Brain_Atlas_down.3 Medial amygdalar nucleus, anteroventral part
Overlap P.value Adjusted.P.value Old.P.value
Allen_Brain_Atlas_down.1 3/300 0.0006648712 1.0000000 0
Allen_Brain_Atlas_down.2 3/300 0.0006648712 0.7286989 0
Allen_Brain_Atlas_down.3 3/300 0.0006648712 0.4857993 0
Old.Adjusted.P.value Odds.Ratio Combined.Score
Allen_Brain_Atlas_down.1 0 16.66667 121.93195
Allen_Brain_Atlas_down.2 0 16.66667 121.93195
Allen_Brain_Atlas_down.3 0 16.66667 121.93195
Genes
Allen_Brain_Atlas_down.1 ACTR1A;ADPRH;ADIPOR2
Allen_Brain_Atlas_down.2 ACTR1A;ADPRH;ADIPOR2
Allen_Brain_Atlas_down.3 ACTR1A;ADPRH;ADIPOR2
do.call(rbind,
lapply(enriched,
function(x) subset(x, Adjusted.P.value <= 0.2)))
Term Overlap
GeneSigDB.2 12734205-TableS5 3/167
Kinase_Perturbations_from_GEO_down SYK druginhibition 152 GSE34176 3/300
P.value Adjusted.P.value Old.P.value
GeneSigDB.2 0.0001190170 0.1272887 0
Kinase_Perturbations_from_GEO_down 0.0006648712 0.1894883 0
Old.Adjusted.P.value Odds.Ratio
GeneSigDB.2 0 29.94012
Kinase_Perturbations_from_GEO_down 0 16.66667
Combined.Score Genes
GeneSigDB.2 270.5462 ADAM19;ACSL4;ADARB1
Kinase_Perturbations_from_GEO_down 121.9320 ADAM19;ADORA2A;ACSL4
We are only looking at a handful of genes here, so, obviously, nothing passes FDR adjustment.
Kevin
Thank you, Mr Blighe for your time and your excelent reproducible code. It works perfectly! Thanks to you now i can expand my analysis to my complete list.
@Kevin Blighe ¿Is there a way i could obtain the Ontology for every gene on the list instead of evaluating the complete set to see the enrichment? Cause i dont see a GO term anywhere with the reproducible example. When i run it i dont get the "Kinase_perturbations" for example.
Alright, i just got to select the GO and GOALL in the keytypes. Nice.
Yes, you just need GO and GOALL. However, that greatly increases the amount of output, as you can see. Good luck! / ¡Buena suerte!