Running gene enrichment analysis on metatranscriptomic data
7.0 years ago

Hi All,

I have metatranscriptomic data (from fecal samples) and want to perform gene enrichment analysis. I am using the clusterProfiler package.

I have been stuck on a few questions for several days. I searched Google but haven't found an answer.

  1. After de novo assembly with Trinity, I got around 850,000 unigenes with IDs like "TRINITY_DN542091_c0_g2". I then mapped these Trinity unigenes to NCBI protein IDs and got around 800,000 of them. Next, I used the "mygene" package to convert the NCBI protein IDs to Entrez Gene IDs, but only around 20,000 were converted. As a result, although there are 3,000 differentially expressed genes, only 60 of them have an Entrez ID that can be used in the enrichment analysis. Why were so few NCBI protein IDs converted?

  2. When I used these converted Entrez Gene IDs for enrichment analysis with the "clusterProfiler" package, I passed them as character strings, but it still said "Expected input gene ID: 284541,5213,29925,25796,3938,10449":

```
head(geneList)
[1] "5328557" "851620" "31798232" "856371" "854405" "854229"

ekk <- enrichKEGG(gene=geneList,organism = "hsa",pAdjustMethod = "BH",pvalueCutoff=0.01)
--> No gene can be mapped....
--> Expected input gene ID: 284541,5213,29925,25796,3938,10449
--> return NULL...
```
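As far as I understand clusterProfiler, this message is not about the IDs being character versus numeric. With `organism = "hsa"`, `enrichKEGG()` looks the input up among human KEGG genes (which use Entrez Gene IDs), and the "Expected input gene ID" line only prints a few examples from that set. IDs of microbial genes will almost never appear there. A quick pre-flight check is to measure how many input IDs occur in the target universe before calling the function. The helper `mapping_rate()` below is hypothetical, and the six "expected" IDs copied from the message are only a stand-in for the full human gene list, which you would have to obtain separately.

```r
# Hypothetical pre-flight check: what fraction of the input IDs occur in the
# reference gene universe of the organism passed to enrichKEGG()?
mapping_rate <- function(ids, universe) {
  ids <- unique(as.character(ids))
  hits <- intersect(ids, as.character(universe))
  list(
    n_input  = length(ids),
    n_mapped = length(hits),
    rate     = if (length(ids) > 0) length(hits) / length(ids) else NA_real_
  )
}

geneList <- c("5328557", "851620", "31798232", "856371", "854405", "854229")

# Stand-in universe: the example IDs clusterProfiler printed for "hsa".
# A real check would use the complete list of human KEGG gene IDs.
hsa_example <- c("284541", "5213", "29925", "25796", "3938", "10449")

mapping_rate(geneList, hsa_example)
```

A rate near zero means the organism code (or ID type) does not match the data, which is the situation here.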

  3. I also tried running the enrichment analysis with the NCBI protein IDs, and it also said "Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617". A few of my protein IDs are "NP_xxxxxx", but most are not (e.g. "CBK82693.1", "WP_026649001.1"):

```
head(prolist)
[1] "CBK82693.1" "CBL23100.1" "WP_025579028.1" "WP_022786881.1" "WP_026649001.1" "CDA70808.1"

ekk <- enrichKEGG(gene=prolist,organism = "hsa",keyType = "ncbi-proteinid", pAdjustMethod = "BH",pvalueCutoff=0.01)
No gene can be mapped....
Expected input gene ID: NP_002617,NP_000498,NP_787082,NP_061948,NP_055056,NP_002617
return NULL...
```
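A similar mismatch affects the protein IDs, and it may also explain the low conversion rate in question 1. As far as I know, Entrez Gene records are linked to RefSeq proteins annotated on specific genomes (typically NP_, XP_, YP_), whereas WP_ accessions are RefSeq non-redundant proteins that can be shared by many bacterial genomes, and GenBank accessions such as "CBK82693.1" are submitter-annotated proteins; these often have no one-to-one Entrez Gene ID. Tallying prefixes before converting shows how much of the list could map at all. The function `classify_accession()` and its labels are hypothetical, and the patterns are a simplification, not a complete description of NCBI accession formats.

```r
# Hypothetical helper: label protein accessions by prefix to estimate how many
# could plausibly be converted to Entrez Gene IDs. Patterns are simplified.
classify_accession <- function(acc) {
  acc <- sub("\\.[0-9]+$", "", acc)  # drop the version suffix, e.g. ".1"
  ifelse(grepl("^(NP|XP|YP)_", acc), "RefSeq NP/XP/YP",
  ifelse(grepl("^WP_", acc), "RefSeq WP",
  # GenBank protein: three letters followed by five (or seven) digits
  ifelse(grepl("^[A-Z]{3}[0-9]{5}([0-9]{2})?$", acc), "GenBank",
         "other")))
}

prolist <- c("CBK82693.1", "CBL23100.1", "WP_025579028.1",
             "WP_022786881.1", "WP_026649001.1", "CDA70808.1")

# Count accessions per category
table(classify_accession(prolist))
```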

  4. Since metatranscriptomic data come from a mixed microbial environment (feces) rather than a single model organism, which OrgDb should I choose?
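Because no single OrgDb covers a fecal community, one common workaround (not something confirmed in this thread) is to annotate the assembled genes with organism-independent terms such as KEGG Orthology (KO) numbers and test them against a custom term-to-gene table; clusterProfiler's `enricher()` accepts such a table through its `TERM2GENE` argument. The base-R sketch below shows the one-sided hypergeometric test that this kind of over-representation analysis is built on. The function name `ora_test`, the pathway names and the KO sets are all hypothetical toy data.

```r
# Hypothetical over-representation test against a custom term-to-gene table,
# i.e. the kind of table enricher() takes as TERM2GENE.
# term2gene: data.frame with character columns `term` and `gene`.
ora_test <- function(de_genes, universe, term2gene) {
  universe <- unique(universe)
  de_genes <- intersect(unique(de_genes), universe)  # keep annotated DE genes
  term2gene <- term2gene[term2gene$gene %in% universe, ]
  N <- length(universe)  # background size
  n <- length(de_genes)  # DE genes drawn from the background
  res <- lapply(split(term2gene$gene, term2gene$term), function(members) {
    members <- unique(members)
    K <- length(members)             # background genes annotated to the term
    k <- sum(de_genes %in% members)  # DE genes annotated to the term
    # One-sided test: P(X >= k), X ~ Hypergeometric(K, N - K, n)
    p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
    c(k = k, K = K, pvalue = p)
  })
  out <- data.frame(term = names(res), do.call(rbind, res), row.names = NULL)
  out$p.adjust <- p.adjust(out$pvalue, method = "BH")  # Benjamini-Hochberg
  out[order(out$pvalue), ]
}

# Toy data: 20 annotated KOs and two made-up pathways of 5 KOs each
universe <- sprintf("K%05d", 1:20)
t2g <- data.frame(
  term = rep(c("pathA", "pathB"), each = 5),
  gene = sprintf("K%05d", c(1:5, 11:15)),
  stringsAsFactors = FALSE
)
de <- sprintf("K%05d", c(1:4, 16L))

ora_test(de, universe, t2g)
```

Here four of the five DE KOs fall in "pathA", so it gets a small p-value while "pathB" gets 1. With real data the same kind of table would be passed as `TERM2GENE` to `enricher()`; since `enricher()` also filters terms by gene-set size, its results will not match this toy sketch exactly.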

Thanks

6.8 years ago
cvu

Did you resolve this problem? Which OrgDb can be used?

6.4 years ago

@flying dutchman I am also facing similar issues when working with GO terms in a metatranscriptome. One thing I have been wondering is whether it makes sense to look for gene set enrichment when the genes come from many different organisms. Are there tools that account for community-level biases when doing GSEA? I am working with metatranscriptomes from microorganisms found in insect guts. Please let me know if you have found a solution to your question.

If anyone else in the community can give us input on these questions, please let us know.
