I am working with human and rat gene sets. Starting from a list of Ensembl gene IDs, I want to retrieve several columns of attributes via biomaRt. With about 4000 genes, the query runs very slowly (30 minutes). I can save the resulting R object and reuse it in later sessions, but is there any way to download a complete package of gene annotation, including Gene Ontology, Rfam, Pfam, InterPro, etc.? In particular, I am interested in downloading the attributes requested in the snippet below.
Here is a snippet of what I am trying to do:
library(biomaRt)
#Example of 20 gene ids.
ensids <- c(
'ENSRNOG00000000001',
'ENSRNOG00000000009',
'ENSRNOG00000000040',
'ENSRNOG00000000055',
'ENSRNOG00000000082',
'ENSRNOG00000000091',
'ENSRNOG00000000129',
'ENSRNOG00000000137',
'ENSRNOG00000000138',
'ENSRNOG00000000142',
'ENSRNOG00000000156',
'ENSRNOG00000000187',
'ENSRNOG00000000196',
'ENSRNOG00000000231',
'ENSRNOG00000000233',
'ENSRNOG00000000239',
'ENSRNOG00000000277',
'ENSRNOG00000000288',
'ENSRNOG00000000307',
'ENSRNOG00000000321')
m <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "rnorvegicus_gene_ensembl")
enstable <- getBM(mart = m,
                  attributes = c('ensembl_gene_id', 'gene_biotype',
                                 'external_gene_name', 'superfamily',
                                 'family', 'go_id', 'goslim_goa_accession',
                                 'rfam', 'pirsf', 'interpro', 'tigrfam'),
                  filters = 'ensembl_gene_id',
                  values = ensids)
Even though the first download may take longer, I see much greater benefits for subsequent uses: the Ensembl server is spared repeated queries, the runtime is shorter, and no internet connection is needed.
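For reference, the "save the R object and reuse it" workaround I mentioned could be sketched roughly as follows. This is only a minimal sketch in base R: the helper name `cached` and the file name in the usage comment are made up for illustration and are not part of biomaRt.

```r
# Hypothetical helper (not a biomaRt function): run `fetch` once, store the
# result in `path`, and read it back from disk on every later call.
cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: no query is sent
  }
  result <- fetch()        # cache miss: run the slow query once
  saveRDS(result, path)
  result
}

# Intended use with the query above (the first call needs biomaRt and a
# network connection; the file name and `attrs` are assumptions, where
# `attrs` stands for the attribute vector from the snippet):
# enstable <- cached("enstable_rnorvegicus.rds", function() {
#   getBM(mart = m, attributes = attrs,
#         filters = "ensembl_gene_id", values = ensids)
# })
```

This only avoids repeating the same query; it does not give me the full annotation set offline, which is what I am really asking about.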