GOATOOLS to perform Gene ontology enrichment analyses in Python in a non model organism.
0
0
Entering edit mode
17 months ago
Sbrillo ▴ 10

Hi!

I'm trying to use GOATOOLS to perform Gene Ontology Enrichment analyses on a list of gene from a non-model organism (Anopheles gambiae, main vector of human malaria).

I managed to run the tool, but I'm having issue with the statistical analyses since I'm not getting any output. I'm using a gaf file to build my reference dataset.

This is my script, it might have a lot of missing gap.

import numpy as np
from goatools.base import download_go_basic_obo
from goatools.base import download_ncbi_associations
from goatools.obo_parser import GODag
from goatools.associations import read_gaf
from goatools.anno.genetogo_reader import Gene2GoReader
from goatools.goea.go_enrichment_ns import GOEnrichmentStudy


#obo_fname = download_go_basic_obo()
obodag = GODag('GO_annotation/go-basic.obo')

gaf_file = ('GO_annotation/VectorBase-CURRENT_AgambiaePEST_GO.gaf')  # Path to your GAF file

geneid2gos = read_gaf(gaf_file, namespace='BP', godag=obodag)

geneid2gos

goeaobj = GOEnrichmentStudy(
    AGAP_YSHR_ZGA, # my list of genes
    geneid2gos, # List of anopheles gene
    obodag, #Ontologies
    propagate_counts = False,
    alpha = 0.05,
    methods= ['fdr_tsbh'])

#goeaobj
goea_results_all = goeaobj.run_study(AGAP_YSHR_ZGA)

goea_results_all

This is the output that I'm getting:

GO_annotation/go-basic.obo: fmt(1.2) rel(2023-05-10) 46,490 Terms
HMS:0:00:03.154457 120,533 annotations READ: GO_annotation/VectorBase-CURRENT_AgambiaePEST_GO.gaf 
7755 IDs in loaded association branch, BP

Load  Ontology Enrichment Analysis ...
 64%  5,229 of  8,162 population items found in association

Runing  Ontology Analysis: current study set of 8162 IDs.
 64%  5,229 of  8,162 study items found in association
100%  8,162 of  8,162 study items found in population(8162)
Calculating 2,439 uncorrected p-values using fisher_scipy_stats
   2,439 terms are associated with  5,229 of  8,162 population items
   2,439 terms are associated with  5,229 of  8,162 study items
  METHOD fdr_tsbh:
       0 GO terms found significant (< 0.05=alpha) (  0 enriched +   0 purified): statsmodels fdr_tsbh
       0 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)

It's seems that is able to find my list fo gene in the dataset, but is not performing any statistical test. I know that some of the genes are enriched since I used PANTHER that gave me the right enrichment.

Can someone help me?

Thanks in advance!

Python RNA-seq Trascriptomics Gene-Ontology • 1.1k views
ADD COMMENT
0
Entering edit mode

I don't know how goatools works, just looking at your error message, it seems like your population and study are the same data.

I find it weird that the population and the study messages report the same number of terms and items.

ADD REPLY

Login before adding your answer.

Traffic: 1844 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6