(Sorry if this was bumped; needed to rename go_db to godb since the underscore created downstream problems for python's setuptools.)
godb
godb is a Gene Ontology library for Python that contains a set of annotation maps describing most of the Gene Ontology.
It downloads, parses and exposes the Gene Ontology data in dataframes.
Note that the GitHub version might not be stable; install the released version with pip install godb.
Usage
Get annotations
You'll get the annotation table with godb.get_annotations()
import godb
anno = godb.get_annotations()
anno.head(3)
GO id | Ontology | Term | Synonym | Definition
GO:0000001 | BP | mitochondrion inheritance | mitochondrial inheritance | The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
GO:0000002 | BP | mitochondrial genome maintenance | | The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
GO:0000003 | BP | reproduction | reproductive physiological process | The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
len(anno)
# 41688
If a term has multiple synonyms, they are separated by a semicolon (;). While this makes the data untidy, it avoids having to include an arbitrary number of synonym columns, many of which would be empty for most rows.
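If you need one synonym per row, the semicolon-separated strings can be split after the fact. The following is a minimal sketch with pandas on a small hand-made frame standing in for the real table. It assumes the column is called Synonym as in the output above and that a term without synonyms holds an empty string (the real table may use a missing value instead); the last row is made up to show the multi-synonym case.

```python
import pandas as pd

# Hand-made stand-in for the table returned by godb.get_annotations().
# The third row (GO:9999999) is invented purely for illustration.
anno = pd.DataFrame({
    "GO id": ["GO:0000001", "GO:0000002", "GO:9999999"],
    "Synonym": ["mitochondrial inheritance", "", "first synonym; second synonym"],
})

# Split on ";" into lists, then give every list element its own row.
tidy = (
    anno.assign(Synonym=anno["Synonym"].str.split(";"))
        .explode("Synonym")
        .reset_index(drop=True)
)

# Remove any whitespace left around the separator.
tidy["Synonym"] = tidy["Synonym"].str.strip()

print(tidy)
```

Terms with several synonyms now span several rows, so the GO id column is no longer unique in the result.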
Get maps of parents and children
The functions get_children and get_offspring return maps of the ontology graph: get_children pairs each child with its direct parents (along with the relation between them), while get_offspring pairs each term with all of its ancestors.
cc_children = godb.get_children("CC")
cc_children.head(3)
# Child Parent Relation
# 0 GO:0000015 GO:0044445 is_a
# 1 GO:0000015 GO:1902494 is_a
# 2 GO:0000109 GO:0044428 is_a
len(cc_children)
# 5511
cc_offspring = godb.get_offspring("CC")
cc_offspring.head(3)
# Offspring Parent
# 0 GO:0000015 GO:0044445
# 0 GO:0000110 GO:0044428
# 1 GO:0000111 GO:0044428
len(cc_offspring)
# 30658
Both get_offspring and get_children take the argument relations, which is ["is_a", "part_of", "has_part"] by default. If you want to ignore certain relations when computing children or offspring, change this argument. R's GO.db uses the relationships ["is_a", "part_of"] to compute ancestors, so use these to get identical behavior.
godb.get_offspring("CC", ["is_a", "part_of"]).head(3)
# Offspring Parent
# 0 GO:0000015 GO:0044445
# 1 GO:0000110 GO:0044428
# 2 GO:0000111 GO:0044428
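To make the difference between the two maps concrete: the children map lists direct edges, while the offspring map pairs each term with every ancestor reachable through those edges. The sketch below is not godb's implementation. It is a pure-Python illustration on a made-up four-term graph (the ids A to D and the helper offspring_pairs are hypothetical) of how restricting the relations changes the result.

```python
from collections import defaultdict

# Made-up (child, parent, relation) edges; these are not real GO terms.
edges = [
    ("B", "A", "is_a"),
    ("C", "B", "part_of"),
    ("D", "C", "has_part"),
]


def offspring_pairs(edges, relations=("is_a", "part_of", "has_part")):
    """Return sorted (offspring, ancestor) pairs reachable via the given relations."""
    # Keep only edges whose relation is allowed.
    parents = defaultdict(set)
    for child, parent, relation in edges:
        if relation in relations:
            parents[child].add(parent)

    # Walk upwards from every term and record each ancestor once.
    pairs = set()
    for term in list(parents):
        stack = list(parents[term])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            pairs.add((term, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return sorted(pairs)


all_relations = offspring_pairs(edges)
restricted = offspring_pairs(edges, relations=("is_a", "part_of"))

print(len(all_relations), "pairs with all three relations")
print(len(restricted), "pairs with is_a and part_of only")
```

Dropping has_part removes the edge from D to C, so D loses all of its ancestors in the restricted result. The same effect explains why the number of rows can differ between the default relations and the GO.db-compatible ones.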
Note that the first time a godb function is used, the Gene Ontology data file is downloaded, which may take some time. If you want a message to be displayed when this happens, set the logging level to INFO.
import logging
logging.basicConfig(level=logging.INFO)
Install
pip install godb
Requirements
joblib and pandas, both of which are automatically installed when using pip to install godb.
TODOs
- (Possibly) Expose a command line interface similar to that of kg and biomartian. I do not use GO enough to warrant it yet, though.
Contribute
Report bugs, ask questions or request features at the issues page.
FAQ
How do I get the genes associated with a term?
biomartian -d rnorvegicus_gene_ensembl -i external_gene_name -o go_id | shuf -n 10
Lpcat1 GO:0005509
Klb GO:0005975
LOC498555 GO:0003735
Map3k12 GO:0046777
Hoxb1 GO:0045944
Cir1 GO:0006397
Rhoc GO:0005525
Casr GO:0060613
Cib1 GO:1900026
Onecut1 GO:0002064
See biomartian for more info.
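Once you have such a gene-to-GO map, picking out the genes for one term is a filter on the second column. Here is a minimal sketch in Python, assuming the output consists of a gene name and a GO id separated by whitespace, as shown above. The helper genes_for_term is hypothetical, and only direct annotations are matched, not annotations to descendant terms.

```python
from collections import defaultdict

# A few lines copied from the example output above.
output = """\
Lpcat1 GO:0005509
Klb GO:0005975
Rhoc GO:0005525
Cib1 GO:1900026
"""


def genes_for_term(text, go_id):
    """Return the sorted gene names annotated directly to go_id."""
    term_to_genes = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or malformed lines.
            continue
        gene, term = fields
        term_to_genes[term].add(gene)
    return sorted(term_to_genes.get(go_id, ()))


print(genes_for_term(output, "GO:0005525"))
```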
Inspiration
R's GO.db package.