Question

Appropriate gene IDs for enrichment analysis (with clusterProfiler)

5.4 years ago

ayatrience • 0

Hi, I am trying to do GO and KEGG enrichment analysis with the R package clusterProfiler. For the KEGG enrichment analysis I converted my gene IDs (ENSEMBL→UniProt) with the "bitr" function, and converted to ENTREZ IDs at the same time. However, "bitr" sometimes returns multiple IDs for a single gene. I assumed I should pick one ID per gene before running the enrichment analysis. So my question is: how do people select the appropriate ID when several are returned? Do I need to do it manually, checking each returned ID on the UniProt website (e.g. judging by the annotation score)? That would be a lot of work. How does everyone deal with this problem? (Or is there no need to pick a single ID per gene in the first place?)

rna-seq R gene gene IDs clusterProfiler • 7.0k views

updated 2.8 years ago by ccfpwll ▴ 10 • written 5.4 years ago by ayatrience • 0

score 4 · Accepted Answer · 2019-08-10

5.4 years ago

Barry Digby ★ 1.3k

Follow the detailed workflow at https://github.com/twbattaglia/RNAseq-workflow, which adds Entrez IDs like this:

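# Load the annotation interface and the mouse annotation database
library(AnnotationDbi)
library(org.Mm.eg.db)

# Add ENTREZ ID
# (`results` is the results table from the workflow; its row names are gene symbols)
results$entrez <- mapIds(x = org.Mm.eg.db,
                     keys = row.names(results),
                     column = "ENTREZID",
                     keytype = "SYMBOL",
                     multiVals = "first")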

For starters, don't bother converting ENSEMBL IDs to UniProt. In the guide, the author has set

multiVals = "first"

The documentation describes it like this: "This value means that when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior." I have seen this used in quite a lot of workflows, so I assume it is OK. If you want to set it to something else, check out the multiVals argument here: https://www.rdocumentation.org/packages/AnnotationDbi/versions/1.30.1/topics/AnnotationDb-objects
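To make the effect of that choice concrete, here is a minimal base R sketch on a toy one-to-many table shaped like the output of "bitr" (one column per ID type). The IDs are made up for illustration, and the two filters only imitate what multiVals = "first" and multiVals = "filter" do; they are not the AnnotationDbi implementation.

```r
# Toy one-to-many mapping; these IDs are hypothetical, not real genes.
mapping <- data.frame(
  ENSEMBL  = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C", "ENSG_C", "ENSG_C"),
  ENTREZID = c("101", "102", "201", "301", "302", "303"),
  stringsAsFactors = FALSE
)

# In the spirit of multiVals = "first": keep the first row seen for each key.
first_hit <- mapping[!duplicated(mapping$ENSEMBL), ]

# In the spirit of multiVals = "filter": drop every key with more than one match.
n_hits <- table(mapping$ENSEMBL)
unique_only <- mapping[mapping$ENSEMBL %in% names(n_hits)[n_hits == 1], ]

print(first_hit)
print(unique_only)
```

Keeping the first hit retains every gene but picks an arbitrary ID among the matches, while filtering keeps only unambiguous genes at the cost of losing the rest. Which trade-off is acceptable depends on how many genes map to multiple IDs in your data.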

(EDIT): Once you have a handle on that workflow, move on to this one: https://yulab-smu.github.io/clusterProfiler-book/chapter12.html

Picking one seems like a good approach. If I want to take the UniProt annotation score into account, I could also write my own function and pass it to "multiVals". I will follow your script and workflow. Thank you very much!

5.4 years ago by ayatrience • 0

Hi, thank you for the answer! Do you by any chance know how to find out which version of the Ensembl annotation org.Mm.eg.db uses? This could be an issue if my upstream analysis uses a newer genome build (mm39). The snapshot date doesn't tell me which Ensembl annotation version it contains.

Sorry, just realized that there's an answer for this in this post.

2.8 years ago by ccfpwll ▴ 10