Hi!
I'm trying to use GOATOOLS to perform Gene Ontology Enrichment analyses on a list of gene from a non-model organism (Anopheles gambiae, main vector of human malaria).
I managed to run the tool, but I'm having issue with the statistical analyses since I'm not getting any output. I'm using a gaf file to build my reference dataset.
This is my script, it might have a lot of missing gap.
import numpy as np
from goatools.base import download_go_basic_obo
from goatools.base import download_ncbi_associations
from goatools.obo_parser import GODag
from goatools.associations import read_gaf
from goatools.anno.genetogo_reader import Gene2GoReader
from goatools.goea.go_enrichment_ns import GOEnrichmentStudy
#obo_fname = download_go_basic_obo()
obodag = GODag('GO_annotation/go-basic.obo')
gaf_file = ('GO_annotation/VectorBase-CURRENT_AgambiaePEST_GO.gaf') # Path to your GAF file
geneid2gos = read_gaf(gaf_file, namespace='BP', godag=obodag)
geneid2gos
goeaobj = GOEnrichmentStudy(
AGAP_YSHR_ZGA, # my list of genes
geneid2gos, # List of anopheles gene
obodag, #Ontologies
propagate_counts = False,
alpha = 0.05,
methods= ['fdr_tsbh'])
#goeaobj
goea_results_all = goeaobj.run_study(AGAP_YSHR_ZGA)
goea_results_all
This is the output that I'm getting:
GO_annotation/go-basic.obo: fmt(1.2) rel(2023-05-10) 46,490 Terms
HMS:0:00:03.154457 120,533 annotations READ: GO_annotation/VectorBase-CURRENT_AgambiaePEST_GO.gaf
7755 IDs in loaded association branch, BP
Load Ontology Enrichment Analysis ...
64% 5,229 of 8,162 population items found in association
Runing Ontology Analysis: current study set of 8162 IDs.
64% 5,229 of 8,162 study items found in association
100% 8,162 of 8,162 study items found in population(8162)
Calculating 2,439 uncorrected p-values using fisher_scipy_stats
2,439 terms are associated with 5,229 of 8,162 population items
2,439 terms are associated with 5,229 of 8,162 study items
METHOD fdr_tsbh:
0 GO terms found significant (< 0.05=alpha) ( 0 enriched + 0 purified): statsmodels fdr_tsbh
0 study items associated with significant GO IDs (enriched)
0 study items associated with significant GO IDs (purified)
It's seems that is able to find my list fo gene in the dataset, but is not performing any statistical test. I know that some of the genes are enriched since I used PANTHER that gave me the right enrichment.
Can someone help me?
Thanks in advance!
I don't know how goatools works, just looking at your error message, it seems like your population and study are the same data.
I find it weird that the population and the study messages report the same number of terms and items.