Hi All,
I have metatranscriptomic data (from fecal samples) and want to perform gene enrichment analysis. The package I am using is clusterProfiler.
I have been stuck on a few questions for many days. I searched Google but haven't found an answer.
After performing de novo assembly with Trinity, I got around 850,000 unigenes with IDs like "TRINITY_DN542091_c0_g2". I then mapped these Trinity unigenes to NCBI protein IDs and got around 800,000 of them. When I used the "mygene" package to convert the NCBI protein IDs to Entrez gene IDs, I only got around 20,000. As a result, although there are 3,000 differentially expressed genes, only 60 of them have an Entrez ID that I can use for enrichment analysis. Why were so few NCBI protein IDs converted?
When I used the converted Entrez gene IDs for enrichment analysis with the "clusterProfiler" package, I passed the Entrez IDs as character strings, but it still reported "Expected input gene ID: 284541,5213,29925,25796,3938,10449":
head(geneList)
[1] "5328557" "851620" "31798232" "856371" "854405" "854229"
ekk <- enrichKEGG(gene=geneList,organism = "hsa",pAdjustMethod = "BH",pvalueCutoff=0.01)
--> No gene can be mapped....
--> Expected input gene ID: 284541,5213,29925,25796,3938,10449
--> return NULL...
- I also tried running the enrichment analysis with the NCBI protein IDs, and it reported "Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617". A few of my protein IDs have the form "NP_xxxxxx", but most do not (for example "CBK82693.1", "WP_026649001.1").
head(prolist)
[1] "CBK82693.1" "CBL23100.1" "WP_025579028.1" "WP_022786881.1" "WP_026649001.1" "CDA70808.1"
ekk <- enrichKEGG(gene=prolist,organism = "hsa",keyType = "ncbi-proteinid", pAdjustMethod = "BH",pvalueCutoff=0.01)
No gene can be mapped....
Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617
return NULL...
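To show how mixed my accession types are, here is a minimal base R sketch that tallies protein IDs by prefix. `classify_accession` is a hypothetical helper written only for illustration (it is not part of clusterProfiler or mygene), and it assumes the usual NCBI RefSeq convention that "NP_" marks curated RefSeq proteins (the form enrichKEGG expects for "hsa") while "WP_" marks non-redundant prokaryotic proteins; everything else is lumped into "other".

```r
# Hypothetical helper (illustration only): label each accession by its prefix.
classify_accession <- function(ids) {
  ifelse(grepl("^NP_", ids), "NP_",
         ifelse(grepl("^WP_", ids), "WP_", "other"))
}

# The six IDs shown by head(prolist) above
prolist <- c("CBK82693.1", "CBL23100.1", "WP_025579028.1",
             "WP_022786881.1", "WP_026649001.1", "CDA70808.1")

acc_class <- classify_accession(prolist)

# Count how many IDs fall into each class
print(table(acc_class))
```

On these six IDs none fall into the "NP_" class, which seems consistent with enrichKEGG finding nothing to map when organism = "hsa".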
- Since the metatranscriptomic data come from a microbial community (feces) rather than a single model organism, which OrgDb should I choose?
Thanks