Hi,
I have obtained gene set files (GMT) from both the KEGG and MSigDB databases. Each of these files lists gene IDs, but I need gene names (symbols). Is there an R function or package that I can use to convert the IDs to gene names? At the end, I still need the file in GMT format to use in another package.
Thanks.
MSigDB provides GMT files for both IDs and symbols.
Just out of curiosity, which package requires a file in GMT format?
Thanks igor. I want to use CEMiTool. I downloaded the MSigDB GMT file for my organism with the EnrichmentBrowser package, but it only gives Entrez IDs.
EnrichmentBrowser uses msigdbr and KEGGREST to get MSigDB and KEGG pathways, respectively. You should be able to use those packages directly to get the gene sets with gene symbols. Keep in mind, MSigDB pathways are based on human, mouse, or rat studies.
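If the gene sets are already on disk as a GMT file with IDs, a minimal base R sketch for swapping the IDs for symbols while keeping the GMT layout could look like the one below. The function name convert_gmt_ids() and the id2symbol lookup vector are made up for illustration. In practice the lookup would have to come from an annotation source for your organism, which is assumed here and not tested on your files.

```r
# Hypothetical helper: rewrite a GMT file, replacing gene IDs with symbols.
# id2symbol is a named character vector: names are IDs, values are symbols.
convert_gmt_ids <- function(gmt_in, gmt_out, id2symbol) {
  lines <- readLines(gmt_in)
  lines <- lines[nzchar(lines)]  # skip empty lines
  converted <- vapply(lines, function(line) {
    # GMT layout: set name, description, then one gene per tab-separated field
    fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
    set_name <- fields[1]
    set_desc <- fields[2]
    ids <- fields[-(1:2)]
    symbols <- unname(id2symbol[ids])
    # Keep the original ID when no symbol is known, then drop duplicates
    symbols[is.na(symbols)] <- ids[is.na(symbols)]
    paste(c(set_name, set_desc, unique(symbols)), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
  writeLines(converted, gmt_out)
  invisible(converted)
}

# Toy example with made-up IDs and symbols
id2symbol <- c("101" = "GENE_A", "102" = "GENE_B", "103" = "GENE_C")
gmt_in <- tempfile(fileext = ".gmt")
gmt_out <- tempfile(fileext = ".gmt")
writeLines(c("SET_1\tdesc one\t101\t102",
             "SET_2\tdesc two\t102\t103\t999"), gmt_in)
converted <- convert_gmt_ids(gmt_in, gmt_out, id2symbol)
```

IDs without a match are kept as they are, so no gene silently disappears from a set. Whether to keep or drop them is a choice you may want to change.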
CEMiTool does not require a GMT file. It has the read_gmt() function to convert a GMT file to a list. You just need to have your gene sets in the same format as what read_gmt() returns.
You can download the signatures manually with Entrez IDs or HGNC symbols: https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp#C2
I have not used EnrichmentBrowser.
Thanks Kevin. Initially, I downloaded the gene sets from the website, but after ORA analysis I saw many pathways that are irrelevant to my organism of interest (S. cerevisiae), e.g. oncogenic signatures. I couldn't find organism-specific gene sets to download from the database website and found it very human-focused, so I used that package. Please correct me if I am wrong. Thank you.