2.6 years ago
biology_inform
Hi all, I am using clusterProfiler for GSEA and I am stuck on the OrgDb part. For my RNA-seq and DGE analysis I used a GTF file from GENCODE (GRCh38.p13), so my gene IDs look like "ENSG00000000003.15". The tutorials load the organism database like this:
organism = "org.Hs.eg.db"
BiocManager::install(organism, character.only = TRUE)
library(organism, character.only = TRUE)
That is how I saw the organism info being passed in. But I think this database is for hg19 (a different source), and I cannot run GSEA with it. How and where can I find an OrgDb that is built from GENCODE? Thanks in advance.
Your gene IDs are formatted as versioned ENSEMBL IDs. If you remove the trailing period and version number (.15 in your example), you'll have the regular ENSEMBL IDs, which should be present in the OrgDb. If you post your code and the current error, we can give more specific advice.

I don't think it's hg19, as the package is built from current NCBI/Ensembl information (GRCh38). It would have helped if there were a way to print the sources directly from within the package; the closest I see is "org.Hs.eg_dbInfo". In general, it takes its information from https://ftp.ncbi.nlm.nih.gov/gene/DATA, and if you look at https://ftp.ncbi.nlm.nih.gov/gene/DATA/README_ensembl, the NCBI assembly version is GRCh38.p14 and the Ensembl assembly version is GRCh38.p13.
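Stripping the version suffix can be sketched in base R as follows. This is a minimal sketch, not the poster's actual pipeline: the helper name strip_version and the example IDs are illustrative assumptions, and the handling of the "_PAR_Y" suffix assumes a GENCODE release that tags pseudoautosomal genes that way. The optional second part only runs if org.Hs.eg.db and AnnotationDbi are installed; it maps the cleaned IDs to Entrez IDs and prints the package's source information.

```r
# Illustrative GENCODE-style IDs (assumed for this example, not from the poster's data).
ids <- c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000182378.15_PAR_Y")

# Hypothetical helper: drop the trailing ".<version>" and, if present,
# the "_PAR_Y" tag that some GENCODE releases append after the version.
strip_version <- function(x) sub("\\.[0-9]+(_PAR_Y)?$", "", x)

gene_ids <- strip_version(ids)
print(gene_ids)

# Optional: look the cleaned IDs up in the OrgDb, only if the packages are available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  entrez <- AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = unique(gene_ids),   # stripping can create duplicates, so deduplicate
    keytype = "ENSEMBL",
    column = "ENTREZID",
    multiVals = "first"        # keep one Entrez ID per Ensembl ID
  )
  print(entrez)
  # Shows the data sources and dates recorded in the installed package.
  print(org.Hs.eg.db::org.Hs.eg_dbInfo())
}
```

If the ranked gene list for GSEA is a named vector, apply the same function to its names, and check for duplicated names afterwards, since different versioned IDs can collapse to the same unversioned ID.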