Question

Further analysis on the output of the clusterProfiler library in R

5.8 years ago

peter.berry5 ▴ 60

I have performed a differential protein expression analysis using Progenesis QI and Proteome Discoverer. The differentially expressed proteins identified were used as the input for the Ensembl BioMart and clusterProfiler packages in R to identify enriched pathways in my dataset. The final output is a file listing the enriched pathways and the genes in those pathways that are present in my original dataset. An example of the data for 2 of the pathways is below.

    ID        Description           GeneRatio  BgRatio   pvalue    p.adjust  qvalue    geneID
    cge01200  Carbon metabolism     42/633     125/9112  8.83E-19  2.58E-16  2.16E-16  100761944/100767893/100763194/100767060/100736557/100758970/100767929/100765199/103158535/100751753/100689437/100771169/100765605/100771246/100689344/100760749/100774853/100755202/100760751/100762838/100771311/100773774/100750732/100689468/100772205/100774058/100773493/100757947/100764352/100770936/100771661/100754714/100689467/100762127/100762297/100765530/100765413/100774097/100759948/100770052/100754996/100764862
    cge00100  Steroid biosynthesis  7/633      20/9112   0.0002    0.003     0.0029    100767580/100771700/100769920/100754784/100764429/100689192/100753498

What I would like to be able to do is list the genes individually, one per row, so I can link each geneID with its gene symbol and protein.

The code for the clusterProfiler analysis is

> library(clusterProfiler)
> enrichedpaths_v7 <- enrichKEGG(gene  = v7$NCBI_gene_ID,
>                organism     = 'cge',
>               pvalueCutoff = 0.05)

I tried using the dplyr package and separate as follows

> library ("tidyverse")
> library ("dplyr")
> as.character(enrichedpaths_v7$geneID)
> result_KEGG_analysis <- separate(enrichedpaths_v7, 9, into = paste("geneID", 1:50, sep = "/"))

I also tried

> result_KEGG_analysis<- separate (enrichedpaths_v7, 9, into = c("geneID", 50), sep = "9")

In both cases I get the following error

Error in UseMethod("separate_") : 
  no applicable method for 'separate_' applied to an object of class "enrichResult"

All suggestions/solutions gratefully accepted.

Peter

R gene • 2.5k views

ADD COMMENT • link updated 5.8 years ago by GenoMax 151k • written 5.8 years ago by peter.berry5 ▴ 60

Can you try explicitly naming the separate to use, in the style tidyr::separate(...)? Also, there is no need to load dplyr after loading tidyverse; the latter attaches the former automatically. Plus, there is no separate() in dplyr.

I think you'd benefit from using tidyr::separate_rows

ADD REPLY • link 5.8 years ago by Ram 45k

Hi, thanks for that suggestion. It didn't work, but it got me thinking about the error message in a different way. I modified the code as follows

enrichedpaths_v7 <- as.data.frame(enrichedpaths_v7)
as.character(enrichedpaths_v7$geneID)
result_KEGG_analysis <- tidyr::separate_rows(enrichedpaths_v7, 9, into = paste("geneID", sep = "/"))

and now I get

Error: All nested columns must have the same number of elements.

My problem is that I can't change the number of elements in each column as this is a direct result of the analysis. Any suggestions?

ADD REPLY • link 5.8 years ago by peter.berry5 ▴ 60

What does line 2 of your code do?
You're not using tidyr::separate_rows() properly. Merely copy-pasting code from one function to another is not how functionality is achieved.

ADD REPLY • link 5.8 years ago by Ram 45k

Hi

Line two changes the geneID column from a numeric format to a character format.
How would you suggest using tidyr::separate_rows()?

I am a beginner with R and bioinformatics and am trying to teach myself. However, it seems to me that I need to overcome the fact that one row lists 42 genes and a second row lists only 7.

ADD REPLY • link 5.8 years ago by peter.berry5 ▴ 60

as.character(enrichedpaths_v7$geneID) converts the geneID vector (column) to character type and returns it to the caller (the R process, which then prints it on the screen because it has not been asked to do anything else with it). The data frame itself is left unchanged. To replace the column with the new vector, you need to assign it back to the column like so: enrichedpaths_v7$geneID <- as.character(enrichedpaths_v7$geneID). This is common to pass-by-value semantics, in which a function call does not modify the variable passed in but works on a copy of it. R follows this paradigm as far as I know.
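
A minimal sketch of the difference between calling as.character() and assigning its result, using a toy data frame. The name df and its values are made up for illustration and only stand in for as.data.frame(enrichedpaths_v7).

```r
# Toy data frame (hypothetical values), standing in for the real result table
df <- data.frame(geneID = c(101, 202))

# Calling as.character() without assignment returns a new vector;
# df itself is not modified
converted <- as.character(df$geneID)
class_before <- class(df$geneID)   # the column is still numeric

# Assigning the result back replaces the column in df
df$geneID <- as.character(df$geneID)
class_after <- class(df$geneID)    # the column is now character
```

Only the second form changes what later functions see when they are handed df.
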
Please read the documentation on tidyr::separate_rows and look at examples to understand how to use it.
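
For reference, a hedged sketch of how tidyr::separate_rows() could be applied here. It assumes the enrichResult has already been converted with as.data.frame(). The data frame kegg below is a toy stand-in with shortened gene lists taken from the example in the question, not the real output.

```r
library(tidyr)

# Toy stand-in for as.data.frame(enrichedpaths_v7); gene lists are shortened
kegg <- data.frame(
  ID          = c("cge01200", "cge00100"),
  Description = c("Carbon metabolism", "Steroid biosynthesis"),
  geneID      = c("100761944/100767893/100763194", "100767580/100771700"),
  stringsAsFactors = FALSE
)

# Split the "/"-delimited geneID column into one row per gene.
# The column is selected by name, and the other columns are repeated as needed.
kegg_long <- separate_rows(kegg, geneID, sep = "/")

print(kegg_long)
```

Because the result is in long format (one gene per row), it does not matter that one pathway lists 42 genes and another only 7; each pathway simply contributes as many rows as it has genes. The long table can then be joined to an annotation table on geneID to attach gene symbols.
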