How to obtain GO of a gene symbol list in R?
1
0
Entering edit mode
4.6 years ago

Its been a while since i work with gene-list and the R software, and sadly i lost a lost of info of my courses due to a corrupted hard-drive ;C.

I have managed to obtain a list of the genes (human genes) i want to test for their ontology. ¿Could you explain to me how i could manage to do this with R? I've been trying to use topGO and enrichGO, but i cant even manage to "translate" my gene symbols into ENTREZID keys.

¿How could i obtain the GO terms from my list (vector) of just the Approved Symbols in R?

As an example: gene_list = ("AAK1","ABCB9","ABHD12","ACER2","ACSL4","ACTR1A","ADAM19","ADAM22","ADARB1","ADIPOR2","ADORA2A","ADPRH")

R RNA-Seq • 2.9k views
ADD COMMENT
4
Entering edit mode
4.6 years ago

This should help you on your way, my friend:

The two best ways to convert between gene annotations is via biomaRt or the org.Db packages. Here, I provide an example via org.Hs.eg.db:

gene_list <- c('AAK1', 'ABCB9', 'ABHD12', 'ACER2', 'ACSL4', 'ACTR1A',
  'ADAM19', 'ADAM22', 'ADARB1', 'ADIPOR2', 'ADORA2A', 'ADPRH')

require(org.Hs.eg.db)
keytypes(org.Hs.eg.db)

select(org.Hs.eg.db,
  keys = gene_list,
  columns = c('ENTREZID','SYMBOL','GENENAME'),
  keytype = 'SYMBOL')

'select()' returned 1:1 mapping between keys and columns
    SYMBOL ENTREZID                                       GENENAME
1     AAK1    22848                        AP2 associated kinase 1
2    ABCB9    23457      ATP binding cassette subfamily B member 9
3   ABHD12    26090               abhydrolase domain containing 12
4    ACER2   340485                          alkaline ceramidase 2
5    ACSL4     2182 acyl-CoA synthetase long chain family member 4
6   ACTR1A    10121                       actin related protein 1A
7   ADAM19     8728                ADAM metallopeptidase domain 19
...

You can then take the Entrez IDs and use them for topGO and enrichGO. topGO should accept gene symbols too, though. For topGO and enrichGO, you will require some other information to rank your genes, e.g., fold change or p-value. If ranking of your genes is not important, then you can generate rapid enrichment results via enrichr:

library(enrichR)
listEnrichrDbs()
dbs <- c("Allen_Brain_Atlas_down", "Allen_Brain_Atlas_up",
  "BioCarta_2016",
  "Cancer_Cell_Line_Encyclopedia",
  "Genes_Associated_with_NIH_Grants",
  "GeneSigDB",
  "GO_Biological_Process_2018", "GO_Molecular_Function_2018",
  "GWAS_Catalog_2019", "HMDB_Metabolites",
  "KEGG_2019_Human",
  "UK_Biobank_GWAS_v1", "WikiPathways_2019_Human")

That's only a small fraction of the number of databases with which enrichr is programmed to interact.

enriched <- enrichr(gene_list, dbs)

That basically stores the results in a list. We can summarise them into a single table and filter by nominal and adjusted p-values as follows:

head(do.call(rbind,
  lapply(enriched,
    function(x) subset(x, P.value <= 0.05))))

                                                                                  Term
Allen_Brain_Atlas_down.1               Cortical amygdalar area, anterior part, layer 1
Allen_Brain_Atlas_down.2 Cortical amygdalar area, posterior part, medial zone, layer 2
Allen_Brain_Atlas_down.3                  Medial amygdalar nucleus, anteroventral part
                         Overlap      P.value Adjusted.P.value Old.P.value
Allen_Brain_Atlas_down.1   3/300 0.0006648712        1.0000000           0
Allen_Brain_Atlas_down.2   3/300 0.0006648712        0.7286989           0
Allen_Brain_Atlas_down.3   3/300 0.0006648712        0.4857993           0
                         Old.Adjusted.P.value Odds.Ratio Combined.Score
Allen_Brain_Atlas_down.1                    0   16.66667      121.93195
Allen_Brain_Atlas_down.2                    0   16.66667      121.93195
Allen_Brain_Atlas_down.3                    0   16.66667      121.93195
                                        Genes
Allen_Brain_Atlas_down.1 ACTR1A;ADPRH;ADIPOR2
Allen_Brain_Atlas_down.2 ACTR1A;ADPRH;ADIPOR2
Allen_Brain_Atlas_down.3 ACTR1A;ADPRH;ADIPOR2



do.call(rbind,
  lapply(enriched,
    function(x) subset(x, Adjusted.P.value <= 0.2)))

                                                              Term Overlap
GeneSigDB.2                                       12734205-TableS5   3/167
Kinase_Perturbations_from_GEO_down SYK druginhibition 152 GSE34176   3/300
                                        P.value Adjusted.P.value Old.P.value
GeneSigDB.2                        0.0001190170        0.1272887           0
Kinase_Perturbations_from_GEO_down 0.0006648712        0.1894883           0
                                   Old.Adjusted.P.value Odds.Ratio
GeneSigDB.2                                           0   29.94012
Kinase_Perturbations_from_GEO_down                    0   16.66667
                                   Combined.Score                Genes
GeneSigDB.2                              270.5462  ADAM19;ACSL4;ADARB1
Kinase_Perturbations_from_GEO_down       121.9320 ADAM19;ADORA2A;ACSL4

We are only looking at a handful of genes here, so, obviously, nothing passes FDR adjustment.

Kevin

ADD COMMENT
0
Entering edit mode

Thank you, Mr Blighe for your time and your excelent reproducible code. It works perfectly! Thanks to you now i can expand my analysis to my complete list.

ADD REPLY
0
Entering edit mode

@Kevin Blighe ¿Is there a way i could obtain the Ontology for every gene on the list instead of evaluating the complete set to see the enrichment? Cause i dont see a GO term anywhere with the reproducible example. When i run it i dont get the "Kinase_perturbations" for example.

Alright, i just got to select the GO and GOALL in the keytypes. Nice.

ADD REPLY
0
Entering edit mode

Yes, you just need GO and GOALL. However, that greatly increases the amount of output, as you can see. Good luck! / ¡Buena suerte!

ADD REPLY

Login before adding your answer.

Traffic: 2861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6