Question

Input file for compareCluster in clusterProfiler

5.0 years ago

Genomics ▴ 20

Hello everyone. My question might be naive, but I would really like to know how I should prepare gcSample for compareCluster. I have 3 analysis results, each with around 2000 genes, and I would like to show the KEGG enrichment results in one plot. I tried reading a text file with

x <- read.table("C:/Users/test.txt", header = TRUE)

The file looks like this:

$1
"91156" "728929" "162699" "84467" "340990" "729355" "653550" "723961" "223075" "283310" "285556"...
$2
"1" "29974" "94160" "150000" "140701" "80167" "51099" "10152" "25890" "25" "84448"...
$3
"91156" "283463" "100133005" "50509" "340990" "84467" "643834" "7455" "140453" "554225" "643847"...

and I get this error:

Error in read.table("C:/Users/test.txt", : more columns than column names

comparecluster clusterProfiler • 4.7k views

updated 11 months ago by b8177 ▴ 10 • written 5.0 years ago by Genomics ▴ 20

Thank you for the answer. It worked. Thanks a lot.

5.0 years ago by Genomics ▴ 20

Genomics : If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

5.0 years ago by GenoMax 153k

Great to know that it helped :)

5.0 years ago by tpm ▴ 30

score 0 · Answer 1 · 2020-08-16

Hi. Maybe this will help in some way. I will explain using my own code, and the idea should come together at the end. If it helps, feel free to use it.

In this example, A is my Excel sheet, read in from a tab-delimited text file. It has a gene identifier column (gene symbols in my case, converted to ENTREZID further down) and the corresponding logFC (log fold change). The first rows of my input look like this:

      X          FC
1  galF  0.03933541
2  adhE -0.40788718
3  ribE  0.03910477
4  mlaC -0.18337625
5  cspE  0.02026372
6 insC1  1.27528523

I then run the following commands:

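# Read the tab-delimited input (columns: X = gene symbol, FC = log fold change)
filename <- "Lesson-02/x1.txt"
A <- read.csv(filename, sep = "\t", header = TRUE)
head(A)
tail(A)

# Named vector of fold changes, sorted from highest to lowest
geneLisA <- A[, 2]
names(geneLisA) <- as.character(A[, 1])
geneLisA <- sort(geneLisA, decreasing = TRUE)
head(geneLisA)
tail(geneLisA)

# Map gene symbols to ENTREZ IDs (bitr() comes from clusterProfiler)
dad1 <- names(geneLisA)
asara1 <- bitr(dad1, fromType = "SYMBOL", toType = c("ENTREZID", "REFSEQ"), OrgDb = "org.EcK12.eg.db")
dds1 <- merge(asara1, A, by.x = "SYMBOL", by.y = "X", all.x = TRUE)
# dplyr::select is spelled out because AnnotationDbi also exports a select() that can mask it
lalala1 <- dds1 %>% dplyr::select(ENTREZID, FC)
geneQ1 <- lalala1[, 2]
names(geneQ1) <- as.character(lalala1[, 1])
geneQ11 <- sort(geneQ1, decreasing = TRUE)

# ENTREZ IDs only: this vector is what goes into compareCluster() later
geneQ111 <- names(geneQ1)
head(geneQ111)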

Note that I was working with E. coli. If you are working with human data, you would have to change this part of the code (for human, the annotation package is org.Hs.eg.db):

OrgDb="org.EcK12.eg.db"

The way I did it, you run the same steps for each input file separately, changing the file name and variable names each time. Suppose you have 4 samples; you will then end up with, for example:

geneQ111 
geneQ222
geneQ333
geneQ444
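Rather than copying the code once per file, the repeated part could be wrapped in a small function. This is only a minimal sketch in base R: `make_ranked_vector` is a made-up name, the toy data frame stands in for one input file (values borrowed from the example input earlier in this answer), and the SYMBOL to ENTREZID conversion with bitr() is left out so that the sketch runs without any Bioconductor packages.

```r
# Hypothetical helper (not part of clusterProfiler): build a named numeric
# vector from a two-column data frame (identifier, fold change), sorted from
# highest to lowest fold change.
make_ranked_vector <- function(df, id_col = 1, value_col = 2) {
  values <- df[[value_col]]
  names(values) <- as.character(df[[id_col]])
  sort(values, decreasing = TRUE)
}

# Toy data standing in for one input file
toy <- data.frame(
  X  = c("galF", "adhE", "insC1"),
  FC = c(0.03933541, -0.40788718, 1.27528523)
)

ranked <- make_ranked_vector(toy)
gene_ids <- names(ranked)  # identifiers only, ordered by fold change
print(gene_ids)
```

Calling the helper once per file (for example inside lapply() over a vector of file names) would give one identifier vector per sample.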

After that, combine them into a named list with list(), like so:

thisstep <- list(TP1 = geneQ111, TP2 = geneQ222, TP3 = geneQ333, TP4 = geneQ444)
lapply(thisstep, head)
lapply(thisstep, tail)
head(thisstep)
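If the gene IDs are already sitting in a text file, as in the question, the same kind of named list can be built directly without going through fold changes. The sketch below assumes a hypothetical file layout that is easy to parse: one cluster per line, tab-separated, with the cluster name first and the ENTREZ IDs after it. That is not the layout shown in the question (which looks like a printed R list), so the file would need to be saved in this form first. `read_gene_clusters` is a made-up name, and the IDs in the demo are the first few from the question.

```r
# Assumed (hypothetical) layout: one cluster per line, tab-separated,
# cluster name first, then the ENTREZ IDs.
read_gene_clusters <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]        # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  clusters <- lapply(fields, function(x) unique(x[-1]))
  names(clusters) <- vapply(fields, function(x) x[1], character(1))
  clusters
}

# Demo with a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "A\t91156\t728929\t162699",
  "B\t1\t29974\t94160",
  "C\t91156\t283463\t100133005"
), tmp)

gene_clusters <- read_gene_clusters(tmp)
str(gene_clusters)
```

The result is a named list of character vectors, which is the same shape as the gcSample example data and as the list built above.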

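Once you have this list, you can pass it to compareCluster() together with the enrichment function of your choice; compareCluster() runs that function on each element of the list. The names of the list elements (TP1 to TP4 here) become the cluster labels in the plots. For example: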

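# Run enrichGO on every element of the list and collect the results
ckk4 <- compareCluster(geneCluster = thisstep, fun = "enrichGO", OrgDb = "org.EcK12.eg.db", ont = "BP")
as.data.frame(ckk4)  # summary() is deprecated for this object
dotplot(ckk4) + ggtitle("Enrichment Analysis - Biological Processes Between Gene Sets")
# Enrichment map variants; recent enrichplot releases need
# ckk4 <- pairwise_termsim(ckk4) first and have renamed some of these arguments
p4 <- emapplot(ckk4)
p44 <- emapplot(ckk4, legend_n = 0.5)
p444 <- emapplot(ckk4, pie = "count")
p4444 <- emapplot(ckk4, pie = "count", pie_scale = 0.59, layout = "kk")
cowplot::plot_grid(p4444)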
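Since the original question was about KEGG rather than GO, the same call can be pointed at enrichKEGG instead. The following is a hedged sketch, not something I ran on the asker's data: `compare_kegg` is a made-up wrapper name, the default `organism = "hsa"` assumes human genes (it has to be a KEGG organism code), and the cutoff is just the value used in the clusterProfiler examples. The function is only defined here; calling it needs clusterProfiler installed and internet access so KEGG can be queried.

```r
# Hypothetical wrapper around clusterProfiler::compareCluster() for KEGG.
# Extra arguments (organism, pvalueCutoff) are passed on to enrichKEGG.
compare_kegg <- function(gene_clusters, organism = "hsa", pvalue_cutoff = 0.05) {
  if (!requireNamespace("clusterProfiler", quietly = TRUE)) {
    stop("clusterProfiler must be installed to run this function")
  }
  clusterProfiler::compareCluster(
    geneCluster  = gene_clusters,   # named list of ENTREZ ID vectors
    fun          = "enrichKEGG",
    organism     = organism,
    pvalueCutoff = pvalue_cutoff
  )
}

# Usage (requires clusterProfiler, enrichplot and a network connection):
# data(gcSample, package = "clusterProfiler")
# kegg_cmp <- compare_kegg(gcSample)
# enrichplot::dotplot(kegg_cmp)
```

With three gene lists in a named list, the dotplot should show the three clusters side by side in one plot, which is what the question was after.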

You can also change the ontology from

ont="BP"

to either of

ont="CC"
ont="MF"
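If you want all three ontologies, one option is to loop over them instead of editing the call by hand. A minimal sketch, assuming the same named list of ENTREZ ID vectors as above: `compare_go_by_ontology` is a made-up name, and the function is only defined here because running it needs clusterProfiler and the matching OrgDb package.

```r
# Hypothetical helper: run compareCluster() with enrichGO once per ontology
# and return the results as a list named by ontology.
compare_go_by_ontology <- function(gene_clusters, org_db,
                                   ontologies = c("BP", "CC", "MF")) {
  results <- lapply(ontologies, function(ont) {
    clusterProfiler::compareCluster(
      geneCluster = gene_clusters,
      fun         = "enrichGO",
      OrgDb       = org_db,
      ont         = ont
    )
  })
  names(results) <- ontologies
  results
}

# Usage (requires clusterProfiler and org.EcK12.eg.db):
# go_results <- compare_go_by_ontology(thisstep, "org.EcK12.eg.db")
# enrichplot::dotplot(go_results$CC)
```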

It may help to load the following packages (some are probably redundant; remove the ones you do not need):

library(clusterProfiler)
library(AnnotationHub)
library(GO.db)
library(GOSemSim)
library(ggplot2)
library(org.EcK12.eg.db)
library(devtools)
library(dplyr)
library(DOSE)
library(enrichplot)
keytypes(org.EcK12.eg.db)  # lists the ID types that bitr() can convert between

Good luck!