Hey,
Cache relates to this parameter of getBM():
useCache: Boolean indicating whether the results cache should be used.
Setting to ‘FALSE’ will disable reading and writing of the
cache. This argument is likely to disappear after the cache
functionality has been tested more thoroughly.
It's basically data that is stored on your local drive from when you previously ran biomaRt. It goes without saying that you should restart your R session for every new analysis that you perform, in order to clear memory and avoid re-using old variables that lurk in your workspace. Note that restarting R does not by itself delete the cache files on disk.
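Because the cache lives on disk, it can be inspected or cleared explicitly. A minimal sketch, assuming a biomaRt version recent enough to export the cache helpers biomartCacheInfo() and biomartCacheClear(); the guard just keeps the snippet harmless if the package is not installed:

```r
# Only touch the cache if biomaRt is actually available
if (requireNamespace('biomaRt', quietly = TRUE)) {
  # Report where the cache is stored and how much it currently holds
  biomaRt::biomartCacheInfo()

  # Uncomment to delete the cached results so the next query hits the server
  # biomaRt::biomartCacheClear()
}
```

Alternatively, passing useCache = FALSE to getBM() skips the cache for a single query, as described in the documentation quoted above.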
The problem in this case is that you have mouse gene symbols but are submitting them as if they were HGNC symbols. HGNC is specific to Homo sapiens (human... us); for mouse you will want MGI (mgi_symbol):
library(biomaRt)
# connect to Ensembl and select the mouse gene dataset
mart <- useMart('ENSEMBL_MART_ENSEMBL', host = 'useast.ensembl.org')
mart <- useDataset('mmusculus_gene_ensembl', mart)
# mouse gene symbols to look up
data <- c('Ank2','Scg2','Nefh','Sgip1','Amph','Srcin1')
mapping <- getBM(
  attributes = c('mgi_symbol', 'ensembl_gene_id', 'entrezgene_id'),
  filters = 'mgi_symbol',
  mart = mart,
  values = data,
  uniqueRows = TRUE,
  bmHeader = TRUE)
mapping
  MGI symbol     Gene stable ID NCBI gene (formerly Entrezgene) ID
1       Amph ENSMUSG00000021314                             218038
2       Ank2 ENSMUSG00000032826                             109676
3       Nefh ENSMUSG00000020396                             380684
4       Scg2 ENSMUSG00000050711                              20254
5      Sgip1 ENSMUSG00000028524                              73094
6     Srcin1 ENSMUSG00000038453                              56013
Kevin
Thanks Kevin for pointing out the species error. It works fine now. My input file has 7289 genes, with some duplicates. After conversion, getBM removed the duplicate IDs and returned 4731 IDs. I do not want it to get rid of the duplicates, as I will be combining the output with my original dataset for further downstream analysis. Is there any way to get around that with getBM?
Did you try uniqueRows = FALSE? Generally, with biomaRt, extra work is required after you perform the initial mapping. You will note that biomaRt does not even return the genes in the same order in which they were submitted.
For 1-to-1 mapping, org.Mm.eg.db may be a better option. See step 3, here: https://support.bioconductor.org/p/130727/#130733
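One way to do that extra work is to look each original gene up in the mapping table with base R's match(), which keeps the input order and its duplicates. This is only a sketch: the `genes` vector is a made-up input with repeats, and the small `mapping` data frame stands in for a getBM() result (IDs copied from the output above, column names as they would be without bmHeader).

```r
# Hypothetical original input: order matters and duplicates must be kept
genes <- c('Scg2', 'Ank2', 'Scg2', 'Nefh', 'Ank2')

# Stand-in for the getBM() result: de-duplicated and in a different order
mapping <- data.frame(
  mgi_symbol = c('Ank2', 'Nefh', 'Scg2'),
  ensembl_gene_id = c('ENSMUSG00000032826',
                      'ENSMUSG00000020396',
                      'ENSMUSG00000050711'),
  stringsAsFactors = FALSE)

# For each input gene, find the row of its first hit in the mapping
# (genes with no hit get NA)
idx <- match(genes, mapping$mgi_symbol)

# One output row per input row, in the original order
annotated <- data.frame(
  mgi_symbol = genes,
  ensembl_gene_id = mapping$ensembl_gene_id[idx],
  stringsAsFactors = FALSE)
annotated
```

Note that match() keeps only the first hit per symbol, so a symbol that maps to several Ensembl IDs is reduced to one of them; merge() on the symbol column would keep all hits, but it does not preserve the input order.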
uniqueRows = FALSE doesn't do it either. But yes, the AnnotationDbi package provides the output the way I want it. Thank you.
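For anyone landing here later, the AnnotationDbi route might look roughly like the following. It is a sketch rather than the exact code from the linked post: the `symbols` vector is a made-up input with a repeated gene, and the choice of multiVals = 'first' (keep one Ensembl ID per symbol) is an assumption. The guard skips the lookup if the annotation packages are not installed.

```r
# Hypothetical input with a repeated symbol
symbols <- c('Ank2', 'Scg2', 'Ank2', 'Nefh')

if (requireNamespace('AnnotationDbi', quietly = TRUE) &&
    requireNamespace('org.Mm.eg.db', quietly = TRUE)) {
  # Look up one Ensembl gene ID per input symbol
  ensembl <- AnnotationDbi::mapIds(
    org.Mm.eg.db::org.Mm.eg.db,
    keys = symbols,
    column = 'ENSEMBL',
    keytype = 'SYMBOL',
    multiVals = 'first')

  # Put the IDs next to the original symbols
  annotated <- data.frame(
    symbol = symbols,
    ensembl_gene_id = unname(ensembl),
    stringsAsFactors = FALSE)
  print(annotated)
}
```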