I've been using ToppFun for gene ontologies and it works well and is fast, but it's a black box as to how it gets its results. I'm looking for an open-source R solution and found biomaRt. I have a few qualms with it, mainly that it isn't very intuitive how to find the information you need. I have a list of genes I'd like to use as a query, and as output I would like the gene ontologies that contain these genes, the number of genes from the input list that are in each ontology, and the p-value. Below is how ToppGene looks and the information it gives, which is great. Having the p-value is key, since I can filter on significance. I also want to re-create the sparse matrix showing which genes from the input are present in each ontology.
Currently I get a huge list in which each entry matches a gene to an ontology, so each gene has multiple entries, one for each ontology of which it is a member. Is there a way to collapse this output or to write a better query? I would like to create a matrix with each gene as a column and each ontology as a row, with values of 0/1 indicating whether the gene is a member of that ontology, so I can do counts and cluster comparisons.
The query can also take a while. I'm sure it's possible, but is it easy to download a mart or Ensembl to use locally?
library(biomaRt)

# Connect to the Ensembl gene mart for mouse
mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")

# One row per probe/GO-term pair for the listed Illumina probes
result <- getBM(attributes = c("illumina_mousewg_6_v2", "go_id", "name_1006"),
                filters = "illumina_mousewg_6_v2",
                values = c("ILMN_2651144", "ILMN_1251419", "ILMN_1214841", "ILMN_1214071",
                           "ILMN_2930552", "ILMN_1377919", "ILMN_2618176", "ILMN_2526739",
                           "ILMN_1253182"),
                mart = mart)
Ensembl now has a virtual machine available for download; otherwise, the databases are available for download from the FTP site.
You could post-process the output to get the format you like.
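As a rough sketch of that post-processing, base R's table() can turn the long probe/term listing into a 0/1 membership matrix with ontologies as rows and probes as columns. The data frame below is a toy stand-in for the getBM() result, with made-up probe IDs and GO terms; only the column names are taken from the query above.

```r
# Toy stand-in for the data frame returned by getBM(); the probe IDs and
# GO terms here are made up purely for illustration.
result <- data.frame(
  illumina_mousewg_6_v2 = c("probe_A", "probe_A", "probe_B", "probe_C", "probe_C"),
  go_id                 = c("GO:x1",   "GO:x2",   "GO:x1",   "GO:x2",   ""),
  stringsAsFactors = FALSE
)

# Remember every queried probe so that probes without annotation still
# appear as (all-zero) columns.
all_probes <- unique(result$illumina_mousewg_6_v2)

# Drop rows with a missing or empty GO ID and any duplicated probe/term pairs.
has_term  <- !is.na(result$go_id) & result$go_id != ""
annotated <- unique(result[has_term, c("illumina_mousewg_6_v2", "go_id")])

# Cross-tabulate: ontologies as rows, probes as columns, 1 = member, 0 = not.
membership <- unclass(table(
  go_id = annotated$go_id,
  probe = factor(annotated$illumina_mousewg_6_v2, levels = all_probes)
))

# Number of input probes annotated to each ontology.
hits_per_term <- rowSums(membership)

# Probes belonging to a given ontology.
probes_in_x1 <- colnames(membership)[membership["GO:x1", ] == 1]
```

Note that this only gives membership and counts. biomaRt returns annotations, not enrichment statistics, so a p-value would need a separate test against a background gene set (for example a hypergeometric test with phyper(), which is an assumption about the kind of test wanted, not something the query provides).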
The VM is for the Perl API and contains a working instance of that API, which still accesses the main database. It contains neither a local instance of the database nor a configured instance of biomaRt. The tables that form the gene Mart database can be found here, so you could install these locally and query them.