Dear all,
I have a list of S. cerevisiae genes and I want to do GO enrichment analysis using clusterProfiler. I already obtained some information using enrichGO() and groupGO(), and I want to see what I can obtain with gseGO().
I use the package org.Sc.sgd.db as my organism database. Here is my script:
library(clusterProfiler)
library(DOSE)
library(org.Sc.sgd.db)
genes <- readxl::read_excel("~/pathway/to/my/list.xlsx")  # read the gene table (needs the readxl package)
gene_ids <- mapIds(org.Sc.sgd.db, keys = genes$GeneID, column = "ENTREZID", keytype = "COMMON")
geneList <- gene_ids[order(gene_ids, decreasing = T)]
data(geneList, package = "DOSE")
gseGO(geneList= geneList, ont = "BP", OrgDb = org.Sc.sgd.db, keyType = "ENTREZID")
And I get this error message:
preparing geneSet collections...
--> Expected input gene ID: 855471,855490,850303,855645,851691,854818
Error in check_gene_id(geneList, geneSets) :
--> No gene can be mapped....
I tried converting the ENTREZID values to numeric, but it didn't change the result. I also tried GENENAME instead of ENTREZID, with the same outcome. Since I have some NA values, I also tried removing them, but again nothing changed.
From the error message, I understand that the function doesn't recognise my IDs as being in the right format, but as far as I can tell they are.
Here is an example of my vector, converted to numeric and with the NA values removed:
gene_ids
[1] 851236 851289 852194 852218 852292 852305 852318 852366 852410 852445 852447 852567 850312
[14] 850377 850398 851359 851404 851430 851667 851697 851698 851713 851727 851746 851762 851781
[27] 851788 851857 851911 851917 851918 852030 852077 852106 852123 856640 856711 856830 856862
[40] 850559 850608 852637 852667 852703 852803 852905 852987 853083 853116 853145 856339 856364
If anyone has advice on what I could try, it would be really useful for my work.
Thank you, Juliette
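To my knowledge, the geneList from the DOSE package contains human genes and cannot be used with S. cerevisiae GO. Because data(geneList, package = "DOSE") is run after you build your own geneList, it overwrites it, so gseGO() receives human Entrez IDs that cannot be mapped to yeast GO terms. This is why you get the error. Create a geneList in the same format from your own genes.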
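The geneList shipped with DOSE is a numeric vector whose names are Entrez IDs and whose values are a ranking statistic, sorted in decreasing order. A minimal base-R sketch of building the same structure from your own data, assuming a hypothetical data frame `res` with columns `EntrezID` and `score` (for example a log2 fold change); the IDs are taken from the vector printed above, but the scores are made-up placeholders, not real results:

```r
# Hypothetical results table: one row per gene, with an Entrez ID and a
# ranking statistic (e.g. log2 fold change). Scores are placeholders.
res <- data.frame(
  EntrezID = c("851236", "851289", "852194", "852218"),
  score    = c(-1.2, 2.5, 0.3, 1.1),
  stringsAsFactors = FALSE
)

# Values = ranking statistic, names = Entrez IDs
geneList <- res$score
names(geneList) <- res$EntrezID

# GSEA expects the list ordered from highest to lowest score
geneList <- sort(geneList, decreasing = TRUE)
geneList
```

sort() keeps the names, so the Entrez IDs stay attached to their scores after ordering.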
Yes, indeed, that was a silly mistake on my part. I have now removed the line data(geneList, package = "DOSE").
I added more detail in another comment. Thank you for your help!
geneList should be a named vector (see the example from the DOSE package):
The names of the vector should be the Entrez IDs of your genes. In the code you posted, it is the reverse: the Entrez IDs are the values, while the gene symbols are the names.
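To illustrate the inversion: mapIds() returns a vector whose names are the input keys (here gene symbols) and whose values are the Entrez IDs. A minimal base-R sketch of swapping them, using a hand-made stand-in for the mapIds() output and a hypothetical per-symbol `scores` vector (symbols, IDs and scores are all placeholders). It also drops unmapped (NA) genes and shows that as.numeric() discards names, so converting a named vector to numeric loses the IDs:

```r
# Stand-in for mapIds(..., column = "ENTREZID") output:
# names = input gene symbols, values = Entrez IDs (NA if unmapped).
# All symbols, IDs and scores here are placeholders.
gene_ids <- c(GENE1 = "851236", GENE2 = NA, GENE3 = "852194")

# Hypothetical ranking statistic per gene symbol
scores <- c(GENE1 = 0.8, GENE2 = 1.5, GENE3 = -0.4)

# Drop symbols that did not map to an Entrez ID
gene_ids <- gene_ids[!is.na(gene_ids)]

# Values = scores (looked up by symbol), names = Entrez IDs
geneList <- setNames(scores[names(gene_ids)], gene_ids)
geneList <- sort(geneList, decreasing = TRUE)
geneList

# Caution: as.numeric() strips the names, so the IDs are lost
as.numeric(geneList)
```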
names(gene_ids) = gene_ids
should solve the mapping error (assuming your list is already ranked), but make sure the list is ranked appropriately for GSEA: gseGO() uses the vector's values as the ranking statistic, so they should be scores (e.g. fold changes) sorted in decreasing order, not the IDs themselves.
I didn't know the vector had to be named, thank you so much! Now I have other error messages, but I will try to fix them myself. I still have a lot to learn about this package.
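Before calling gseGO() again, it can help to check the input against the points raised in this thread. A minimal base-R sketch of such a check; the function name `inspect_gene_list` is hypothetical and not part of clusterProfiler or DOSE, and it only tests structure (numeric, named, no duplicated names, no NA, sorted decreasingly), not whether the IDs actually exist in org.Sc.sgd.db:

```r
# Hypothetical helper (not part of clusterProfiler/DOSE): returns a character
# vector describing structural problems with a GSEA-style gene list.
inspect_gene_list <- function(x) {
  problems <- character(0)
  if (!is.numeric(x)) problems <- c(problems, "values are not numeric")
  if (is.null(names(x))) {
    problems <- c(problems, "vector has no names (names should be Entrez IDs)")
  } else if (anyDuplicated(names(x)) > 0) {
    problems <- c(problems, "duplicated names")
  }
  if (anyNA(x)) problems <- c(problems, "contains NA values")
  # Non-strict check, so tied scores are allowed
  if (is.numeric(x) && is.unsorted(rev(x), na.rm = TRUE)) {
    problems <- c(problems, "not sorted in decreasing order")
  }
  problems
}

# An unnamed vector of IDs, like the one printed in the question
inspect_gene_list(c(851236, 851289, 852194))

# A correctly shaped list (placeholder scores)
inspect_gene_list(c("851289" = 2.5, "852218" = 1.1, "851236" = -1.2))
```

The first call reports the missing names and the ordering; the second returns an empty character vector.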