Question

How to obtain GO of a gene symbol list in R?

0

Entering edit mode

4.9 years ago

galapacheco • 0

Its been a while since i work with gene-list and the R software, and sadly i lost a lost of info of my courses due to a corrupted hard-drive ;C.

I have managed to obtain a list of the genes (human genes) i want to test for their ontology. ¿Could you explain to me how i could manage to do this with R? I've been trying to use topGO and enrichGO, but i cant even manage to "translate" my gene symbols into ENTREZID keys.

¿How could i obtain the GO terms from my list (vector) of just the Approved Symbols in R?

As an example: gene_list = ("AAK1","ABCB9","ABHD12","ACER2","ACSL4","ACTR1A","ADAM19","ADAM22","ADARB1","ADIPOR2","ADORA2A","ADPRH")

R RNA-Seq • 3.2k views

ADD COMMENT • link updated 4.9 years ago by Kevin Blighe 89k • written 4.9 years ago by galapacheco • 0

score 4 · Accepted Answer · 2020-05-07

This should help you on your way, my friend:

The two best ways to convert between gene annotations is via biomaRt or the org.Db packages. Here, I provide an example via org.Hs.eg.db:

gene_list <- c('AAK1', 'ABCB9', 'ABHD12', 'ACER2', 'ACSL4', 'ACTR1A',
  'ADAM19', 'ADAM22', 'ADARB1', 'ADIPOR2', 'ADORA2A', 'ADPRH')

require(org.Hs.eg.db)
keytypes(org.Hs.eg.db)

select(org.Hs.eg.db,
  keys = gene_list,
  columns = c('ENTREZID','SYMBOL','GENENAME'),
  keytype = 'SYMBOL')

'select()' returned 1:1 mapping between keys and columns
    SYMBOL ENTREZID                                       GENENAME
1     AAK1    22848                        AP2 associated kinase 1
2    ABCB9    23457      ATP binding cassette subfamily B member 9
3   ABHD12    26090               abhydrolase domain containing 12
4    ACER2   340485                          alkaline ceramidase 2
5    ACSL4     2182 acyl-CoA synthetase long chain family member 4
6   ACTR1A    10121                       actin related protein 1A
7   ADAM19     8728                ADAM metallopeptidase domain 19
...

You can then take the Entrez IDs and use them for topGO and enrichGO. topGO should accept gene symbols too, though. For topGO and enrichGO, you will require some other information to rank your genes, e.g., fold change or p-value. If ranking of your genes is not important, then you can generate rapid enrichment results via enrichr:

library(enrichR)
listEnrichrDbs()
dbs <- c("Allen_Brain_Atlas_down", "Allen_Brain_Atlas_up",
  "BioCarta_2016",
  "Cancer_Cell_Line_Encyclopedia",
  "Genes_Associated_with_NIH_Grants",
  "GeneSigDB",
  "GO_Biological_Process_2018", "GO_Molecular_Function_2018",
  "GWAS_Catalog_2019", "HMDB_Metabolites",
  "KEGG_2019_Human",
  "UK_Biobank_GWAS_v1", "WikiPathways_2019_Human")

That's only a small fraction of the number of databases with which enrichr is programmed to interact.

enriched <- enrichr(gene_list, dbs)

That basically stores the results in a list. We can summarise them into a single table and filter by nominal and adjusted p-values as follows:

head(do.call(rbind,
  lapply(enriched,
    function(x) subset(x, P.value <= 0.05))))

                                                                                  Term
Allen_Brain_Atlas_down.1               Cortical amygdalar area, anterior part, layer 1
Allen_Brain_Atlas_down.2 Cortical amygdalar area, posterior part, medial zone, layer 2
Allen_Brain_Atlas_down.3                  Medial amygdalar nucleus, anteroventral part
                         Overlap      P.value Adjusted.P.value Old.P.value
Allen_Brain_Atlas_down.1   3/300 0.0006648712        1.0000000           0
Allen_Brain_Atlas_down.2   3/300 0.0006648712        0.7286989           0
Allen_Brain_Atlas_down.3   3/300 0.0006648712        0.4857993           0
                         Old.Adjusted.P.value Odds.Ratio Combined.Score
Allen_Brain_Atlas_down.1                    0   16.66667      121.93195
Allen_Brain_Atlas_down.2                    0   16.66667      121.93195
Allen_Brain_Atlas_down.3                    0   16.66667      121.93195
                                        Genes
Allen_Brain_Atlas_down.1 ACTR1A;ADPRH;ADIPOR2
Allen_Brain_Atlas_down.2 ACTR1A;ADPRH;ADIPOR2
Allen_Brain_Atlas_down.3 ACTR1A;ADPRH;ADIPOR2



do.call(rbind,
  lapply(enriched,
    function(x) subset(x, Adjusted.P.value <= 0.2)))

                                                              Term Overlap
GeneSigDB.2                                       12734205-TableS5   3/167
Kinase_Perturbations_from_GEO_down SYK druginhibition 152 GSE34176   3/300
                                        P.value Adjusted.P.value Old.P.value
GeneSigDB.2                        0.0001190170        0.1272887           0
Kinase_Perturbations_from_GEO_down 0.0006648712        0.1894883           0
                                   Old.Adjusted.P.value Odds.Ratio
GeneSigDB.2                                           0   29.94012
Kinase_Perturbations_from_GEO_down                    0   16.66667
                                   Combined.Score                Genes
GeneSigDB.2                              270.5462  ADAM19;ACSL4;ADARB1
Kinase_Perturbations_from_GEO_down       121.9320 ADAM19;ADORA2A;ACSL4

We are only looking at a handful of genes here, so, obviously, nothing passes FDR adjustment.

Kevin