Hello everybody, I am pretty new to the bioinformatics world and I would really appreciate any advice regarding this issue (and on how to properly look for help). I have a list of genes from an RNA-Seq experiment identified by Entrez ID that I need to convert to Ensembl IDs. I am using AnnotationDbi with both EnsDb.Hsapiens.v86 and org.Hs.eg.db, but I get NA values for pseudogenes and snoRNAs whenever I run the code below. Is there a better way of doing this? From what I can find online this seems to be a frequent issue, but it should be solvable: I checked several of the unmapped genes and they do have Ensembl IDs assigned to them. Thanks in advance!
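library(AnnotationDbi)
library(EnsDb.Hsapiens.v86)
library(org.Hs.eg.db)
# Data$gene_id holds the Entrez IDs from the RNA-Seq experiment
# Map Entrez ID -> Ensembl gene ID via the EnsDb (Ensembl release 86) annotation
EnsDb2 <- AnnotationDbi::mapIds(EnsDb.Hsapiens.v86,
                                keys = as.character(Data$gene_id),
                                column = "GENEID",
                                keytype = "ENTREZID",
                                multiVals = "first")
# Same mapping via the org.Hs.eg.db annotation
orgDb_mapID <- AnnotationDbi::mapIds(org.Hs.eg.db,
                                     keys = as.character(Data$gene_id),
                                     column = "ENSEMBL",
                                     keytype = "ENTREZID")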
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.14.0 AnnotationFilter_1.14.0 GenomicFeatures_1.42.3
[5] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 xlsx_0.6.5 org.Hs.eg.db_3.12.0
[9] AnnotationDbi_1.52.0 IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0
[13] BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lattice_0.20-41 prettyunits_1.1.1
[4] Rsamtools_2.6.0 xlsxjars_0.6.1 Biostrings_2.58.0
[7] assertthat_0.2.1 utf8_1.2.1 BiocFileCache_1.14.0
[10] R6_2.5.0 RSQLite_2.2.6 httr_1.4.2
[13] pillar_1.6.0 zlibbioc_1.36.0 rlang_0.4.10
[16] progress_1.2.2 lazyeval_0.2.2 curl_4.3
[19] rstudioapi_0.13 blob_1.2.1 Matrix_1.3-2
[22] BiocParallel_1.24.1 stringr_1.4.0 ProtGenerics_1.22.0
[25] RCurl_1.98-1.3 bit_4.0.4 biomaRt_2.46.3
[28] DelayedArray_0.16.3 compiler_4.0.3 rtracklayer_1.50.0
[31] pkgconfig_2.0.3 askpass_1.1 openssl_1.4.3
[34] tidyselect_1.1.0 SummarizedExperiment_1.20.0 tibble_3.1.1
[37] GenomeInfoDbData_1.2.4 matrixStats_0.58.0 XML_3.99-0.6
[40] fansi_0.4.2 crayon_1.4.1 dplyr_1.0.5
[43] dbplyr_2.1.1 GenomicAlignments_1.26.0 bitops_1.0-6
[46] rappdirs_0.3.3 grid_4.0.3 lifecycle_1.0.0
[49] DBI_1.1.1 magrittr_2.0.1 stringi_1.5.3
[52] cachem_1.0.4 XVector_0.30.0 xml2_1.3.2
[55] ellipsis_0.3.1 generics_0.1.0 vctrs_0.3.7
[58] tools_4.0.3 bit64_4.0.5 glue_1.4.2
[61] purrr_0.3.4 hms_1.0.0 MatrixGenerics_1.2.1
[64] fastmap_1.1.0 BiocManager_1.30.12 memoise_2.0.0
[67] rJava_0.9-13
Hi Dante,
In this setting I suggest using the biomaRt package to solve your issue. Its vignette gives a nice explanation of how to use the package. Make sure that you use the latest version of the Ensembl annotation.
Best regards!
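As a rough illustration of the biomaRt route, here is a minimal sketch, not a definitive solution. The helper names `fetch_entrez_to_ensembl` and `first_match` are made up for this example, and `Data$gene_id` is assumed to be the Entrez ID column from the question. The query goes to the Ensembl servers, so it needs internet access, and even the current release may not have an Ensembl ID for every Entrez ID.

```r
# Hypothetical helper: query Ensembl BioMart for Entrez -> Ensembl gene IDs.
# Needs the biomaRt package and internet access.
fetch_entrez_to_ensembl <- function(entrez_ids) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("entrezgene_id", "ensembl_gene_id", "gene_biotype"),
    filters    = "entrezgene_id",
    values     = unique(as.character(entrez_ids)),
    mart       = mart
  )
}

# Hypothetical helper: keep the first Ensembl ID found for each Entrez ID,
# returned in the same order as the input; unmapped IDs become NA.
first_match <- function(entrez_ids, mapping) {
  idx <- match(as.character(entrez_ids), as.character(mapping$entrezgene_id))
  setNames(mapping$ensembl_gene_id[idx], entrez_ids)
}

# Usage (not run here, it contacts the Ensembl servers):
# mapping <- fetch_entrez_to_ensembl(Data$gene_id)
# Data$ensembl_id <- first_match(Data$gene_id, mapping)
# table(is.na(Data$ensembl_id))  # how many IDs are still unmapped
```

Taking only the first match mirrors `multiVals = "first"` in the `mapIds()` calls from the question. If one-to-many mappings matter for your analysis, inspect the full `mapping` table instead of collapsing it.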
Cross-posted: https://support.bioconductor.org/p/9136520/