Is there a way to do GO terms enrichment analysis and keep track of gene symbols?
3.4 years ago
DN99 ▴ 20

I have two lists of HGNC gene symbols (a target gene list and a background gene list) that I run through the go_enrich() function of the GOfuncR package to test gene sets for enrichment in GO categories.

The output gives the GO term IDs and their p-values. Is there a way, or another tool, to also get a third column listing the genes from my input list that are annotated to each GO term ID? So far I can't figure out how to get this from go_enrich().

Separately, I've tried using biomaRt to look up the gene symbols for my enriched GO IDs. This works, but it doesn't recover all of my input genes: some GO IDs are significant even though, according to the biomaRt conversion, none of their genes were in my input list.

genetics genes GO enrichment
3.4 years ago
Pratik ★ 1.1k

Hi DN99,

I think this may be what you need.

First, download this file. It is the current human GO annotation file (GAF format): http://current.geneontology.org/annotations/goa_human.gaf.gz

Then extract the gzip file onto your Desktop. The following script gives you a master data frame of all gene symbols and their associated GO IDs. You can then use the merge() function to join the GO IDs onto your gene list data frame.

# Read the GAF file and drop its header lines (they start with "!")
# instead of assuming a fixed number of header lines
gaf <- readLines("~/Desktop/goa_human.gaf")
GO <- read.delim(text = gaf[!startsWith(gaf, "!")], header = FALSE, sep = "\t",
                 quote = "", stringsAsFactors = FALSE)

# Keep column 3 (gene symbol) and column 5 (GO ID); drop duplicate pairs
GO <- unique(GO[, c("V3", "V5")])
colnames(GO) <- c("GENEID", "GOID")
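
As a rough sketch of the merge() step, here is how the gene column could be attached to an enrichment table. Everything in it is toy data: go_map stands in for the GO data frame built above, and my_genes, enriched and the p-values are made-up placeholders for your own input list and go_enrich() results.

```r
# Toy stand-in for the GENEID/GOID table built from the GAF file
go_map <- data.frame(
  GENEID = c("AURKA", "AURKB", "CDK1", "TP53", "AURKA"),
  GOID   = c("GO:0005819", "GO:0005819", "GO:0005876", "GO:0005634", "GO:0005876"),
  stringsAsFactors = FALSE
)

# Hypothetical input gene list and enrichment results (p-values are invented)
my_genes <- data.frame(GENEID = c("AURKA", "AURKB", "CDK1"),
                       stringsAsFactors = FALSE)
enriched <- data.frame(GOID = c("GO:0005819", "GO:0005876"),
                       pvalue = c(0.001, 0.01),
                       stringsAsFactors = FALSE)

# Keep only the annotations that belong to genes in the input list
hits <- merge(go_map, my_genes, by = "GENEID")

# Collapse to one comma-separated string of symbols per GO ID
genes_by_go <- aggregate(GENEID ~ GOID, data = hits,
                         FUN = function(x) paste(sort(unique(x)), collapse = ","))

# Attach the gene column; all.x = TRUE keeps enriched terms without direct hits
result <- merge(enriched, genes_by_go, by = "GOID", all.x = TRUE)
print(result)
```

One caveat: the GAF file only lists direct annotations, while enrichment tools generally also count genes annotated to a term's descendants. A significant parent term can therefore end up with NA in the gene column here, which is probably the same effect you saw with the biomaRt lookup.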

If you want more information, such as the GO terms, in the data frame as well, you can use the following script:

# Read the GAF file and drop its header lines (they start with "!")
gaf <- readLines("~/Desktop/goa_human.gaf")
GO <- read.delim(text = gaf[!startsWith(gaf, "!")], header = FALSE, sep = "\t",
                 quote = "", stringsAsFactors = FALSE)

# Keep column 3 (gene symbol) and column 5 (GO ID); drop duplicate pairs
GO <- unique(GO[, c("V3", "V5")])
colnames(GO) <- c("GENEID", "GOID")

# GO.db supplies the term name, ontology and definition for each GO ID
if (!requireNamespace("GO.db", quietly = TRUE)) BiocManager::install("GO.db")
library(GO.db)
GOdb <- as.data.frame(GOTERM)
colnames(GOdb)[1] <- "GOID"   # rename the first ID column to match GO
GOdb <- head(GOdb, -1)        # drop the final row of the table

# Join gene symbols to term details by GO ID, then tidy up
GENESwithGO <- merge(GO, GOdb, by = "GOID")
rm(GOdb, GO, gaf)
GENESwithGO$go_id <- NULL     # drop the leftover go_id column

This should give you a master data frame of gene symbols, GO IDs and their GO terms. It might be overkill for your purposes, but I'm including it in case it's useful.

Hope this helps!

3.4 years ago
Zhilong Jia ★ 2.2k

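You can do the GO analysis with clusterProfiler. The geneID column of its result lists the input genes annotated to each enriched term: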

head(summary(ego2), n=3)

##                    ID          Description GeneRatio   BgRatio
## GO:0005819 GO:0005819              spindle    24/197 222/11632
## GO:0005876 GO:0005876  spindle microtubule    11/197  45/11632
## GO:0000793 GO:0000793 condensed chromosome    17/197 150/11632
##                  pvalue     p.adjust       qvalue
## GO:0005819 3.810608e-13 1.276554e-10 1.139171e-10
## GO:0005876 1.527089e-10 2.557874e-08 2.282596e-08
## GO:0000793 5.838332e-10 6.519471e-08 5.817847e-08
##                                                                                                                                                   geneID
## GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/NEK2/CDK1/MAD2L1/KIF18A/BIRC5/KIF11/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
## GO:0005876                                                                             SKA1/NUSAP1/CDK1/KIF18A/BIRC5/KIF11/AURKB/PRC1/KIF18B/AURKA/KIF4A
## GO:0000793                                         CENPE/NDC80/TOP2A/NCAPH/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/BIRC5/NCAPG/AURKB/CHEK1/AURKA/CCNB1
##            Count
## GO:0005819    24
## GO:0005876    11
## GO:0000793    17
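
The geneID column packs the symbols into one "/"-separated string per term. If you would rather have one row per GO term and gene, a minimal base-R sketch could look like the following. Here res is a toy data frame copied from two rows of the output above, not a real clusterProfiler result object, and the names res and long are just for illustration.

```r
# Toy copy of the ID and geneID columns from the output above
res <- data.frame(
  ID = c("GO:0005876", "GO:0000793"),
  geneID = c(
    "SKA1/NUSAP1/CDK1/KIF18A/BIRC5/KIF11/AURKB/PRC1/KIF18B/AURKA/KIF4A",
    "CENPE/NDC80/TOP2A/NCAPH/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/BIRC5/NCAPG/AURKB/CHEK1/AURKA/CCNB1"
  ),
  stringsAsFactors = FALSE
)

# Split each string on "/" to get one character vector of symbols per term
genes <- strsplit(res$geneID, "/", fixed = TRUE)

# Build a long table: repeat each GO ID once per gene it contains
long <- data.frame(
  ID   = rep(res$ID, lengths(genes)),
  gene = unlist(genes),
  stringsAsFactors = FALSE
)

# Number of genes per term; should agree with the Count column
print(table(long$ID))
```

With a real result you would start from the data frame form of ego2 instead of the toy res.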
2.9 years ago
Steffi • 0

This is an old post, but GOfuncR does have a function for getting the genes annotated to GO terms: get_anno_genes()
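
A minimal sketch of how that might be used, assuming GOfuncR and its default Homo.sapiens annotation package are installed. The gene symbols and GO IDs are placeholders, collapse_genes() is a helper made up for this example, and I am assuming the result has go_id and gene columns, so check the output on your own data.

```r
# Hypothetical inputs: substitute your own gene symbols and enriched GO IDs
input_genes  <- c("AURKA", "AURKB", "KIF11", "CDK1")
enriched_gos <- c("GO:0005819", "GO:0005876")

# Illustrative helper: collapse a go_id/gene table to one row per GO ID
collapse_genes <- function(anno) {
  aggregate(gene ~ go_id, data = anno,
            FUN = function(x) paste(sort(unique(x)), collapse = "/"))
}

# Only runs when GOfuncR and the Homo.sapiens annotation package are available
if (requireNamespace("GOfuncR", quietly = TRUE) &&
    requireNamespace("Homo.sapiens", quietly = TRUE)) {
  library(GOfuncR)
  # Passing genes restricts the lookup to the input list
  anno <- get_anno_genes(go_ids = enriched_gos, genes = input_genes)
  # Guard against an empty lookup before collapsing
  if (!is.null(anno) && nrow(anno) > 0) {
    genes_per_go <- collapse_genes(anno)
    print(genes_per_go)
  }
}
```

The collapsed table can then be merged onto the go_enrich() results by GO ID to get the extra gene column asked for in the question.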
