Input file for CompareCluster in ClusterProfiler
4.3 years ago
Genomics ▴ 20

Hello everyone. My question might be naive, but I would really like to know how I should prepare gcSample for compareCluster. I have 3 analysis results, each with around 2000 genes, and I would like to plot the KEGG enrichment for all of them in one plot. I tried reading a text file with x <- read.table("C:/Users/test.txt", header = TRUE), but it fails with the error shown below. The file looks like this:

$1
"91156" "728929" "162699" "84467" "340990" "729355" "653550" "723961" "223075" "283310" "285556"...
$2
"1" "29974" "94160" "150000" "140701" "80167" "51099" "10152" "25890" "25" "84448"...
$3
"91156" "283463" "100133005" "50509" "340990" "84467" "643834" "7455" "140453" "554225" "643847"....

Error in read.table("C:/Users/test.txt", : more columns than column names

comparecluster clusterProfiler • 3.8k views

Thank you for the answer. It worked. Thanks a lot.


Genomics : If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.


Great to know that it helped :)

4.3 years ago
tpm ▴ 30

Hi. Maybe this will help in some way. I will explain using my own code, and the idea should come together at the end. If it helps, feel free to use it.

In this example, A is my Excel sheet with gene symbols (converted to ENTREZID later in the code) and their corresponding logFC (fold change). The first rows of my input look like this:

 X          FC
1  galF  0.03933541
2  adhE -0.40788718
3  ribE  0.03910477
4  mlaC -0.18337625
5  cspE  0.02026372
6 insC1  1.27528523

Then I run the following commands to get from this table to a vector of Entrez IDs:

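library(clusterProfiler)   # provides bitr()
library(org.EcK12.eg.db)   # E. coli K-12 annotation
library(dplyr)             # provides %>% and select()

# Read the tab-delimited table of gene symbols (column X) and fold changes (column FC)
filename <- "Lesson-02/x1.txt"
A <- read.csv(filename, sep = "\t", header = TRUE)
head(A)
tail(A)

# Named numeric vector: fold changes named by gene symbol, sorted in decreasing order
geneLisA <- A[, 2]
names(geneLisA) <- as.character(A[, 1])
geneLisA <- sort(geneLisA, decreasing = TRUE)
head(geneLisA)
tail(geneLisA)

# Map gene symbols to Entrez IDs (and RefSeq IDs)
dad1 <- names(geneLisA)
asara1 <- bitr(dad1, fromType = "SYMBOL", toType = c("ENTREZID", "REFSEQ"), OrgDb = "org.EcK12.eg.db")
# Attach the fold changes to the mapped IDs
dds1 <- merge(asara1, A, by.x = "SYMBOL", by.y = "X", all.x = TRUE)
# dplyr::select is spelled out because AnnotationDbi also exports select()
lalala1 <- dds1 %>% dplyr::select(ENTREZID, FC)
geneQ1 <- lalala1[, 2]
names(geneQ1) <- as.character(lalala1[, 1])
geneQ11 <- sort(geneQ1, decreasing = TRUE)

# Entrez IDs only: this character vector is what goes into compareCluster
geneQ111 <- names(geneQ11)
head(geneQ111)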
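
To make the shape of these objects concrete, here is a minimal base-R sketch using the six example rows shown above, typed in by hand as a toy data frame (the names toy, gene_list and gene_ids are made up for illustration). It only covers the named-vector and sorting steps, not the bitr() ID conversion:

```r
# Toy copy of the example input: gene symbols (X) and fold changes (FC)
toy <- data.frame(
  X  = c("galF", "adhE", "ribE", "mlaC", "cspE", "insC1"),
  FC = c(0.03933541, -0.40788718, 0.03910477, -0.18337625, 0.02026372, 1.27528523),
  stringsAsFactors = FALSE
)

# Fold changes named by gene, largest first
gene_list <- toy$FC
names(gene_list) <- toy$X
gene_list <- sort(gene_list, decreasing = TRUE)

# The identifiers alone, in the same order
gene_ids <- names(gene_list)
print(gene_ids)
```

The numeric vector (gene_list) keeps the fold changes, while gene_ids is just the character vector of identifiers.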

Note that I was working with E. coli, so if you are working with human data you will have to change this part of the code (the human annotation package is org.Hs.eg.db):

OrgDb="org.EcK12.eg.db"

The way I did it, you run the analysis separately for each of your files, adjusting the file name and the variable names each time. Suppose you have 4 samples processed with this code; you will end up with, for example:

geneQ111 
geneQ222
geneQ333
geneQ444

After that, combine them into a single named list with the list() function, like so:

thisstep <- list(TP1=geneQ111,TP2=geneQ222,TP3=geneQ333, TP4=geneQ444)
lapply(thisstep , head)
lapply(thisstep , tail)
head(thisstep)
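
Instead of copying the code once per file, the per-file steps could also be wrapped in a function and applied to every file. The sketch below is only an illustration under assumptions: read_gene_ids is a made-up helper, the two input files are temporary toy files written by the example itself, and the first column is assumed to already hold the IDs you want (so the bitr() conversion is left out):

```r
# Hypothetical helper: read a tab-delimited file with columns X (gene ID) and FC,
# and return the IDs ordered by decreasing fold change
read_gene_ids <- function(path) {
  tab <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)
  fc <- tab$FC
  names(fc) <- as.character(tab$X)
  names(sort(fc, decreasing = TRUE))
}

# Two toy input files so that the example runs on its own
files <- c(TP1 = tempfile(fileext = ".txt"), TP2 = tempfile(fileext = ".txt"))
write.table(data.frame(X = c("g1", "g2", "g3"), FC = c(0.5, 2.0, -1.0)),
            files[["TP1"]], sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(X = c("g2", "g4"), FC = c(-0.3, 1.2)),
            files[["TP2"]], sep = "\t", quote = FALSE, row.names = FALSE)

# One character vector of IDs per file; the list keeps the names TP1 and TP2
gene_sets <- lapply(files, read_gene_ids)
str(gene_sets)
```

The resulting gene_sets has the same structure as the list built by hand above: a named list with one character vector of gene IDs per sample.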

Once the list is built, you can run the enrichment analysis of your choice on all gene sets at once with the compareCluster function, like so:

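# Run a GO Biological Process enrichment on every gene set in the list
ckk4 <- compareCluster(geneCluster = thisstep, fun = "enrichGO", OrgDb = "org.EcK12.eg.db", ont = "BP")
# Inspect the result table (summary() is deprecated in recent clusterProfiler versions)
head(as.data.frame(ckk4))
dotplot(ckk4) + ggtitle("Enrichment Analysis - Biological Processes Between Gene Sets")
# Enrichment map variants; the accepted arguments depend on the installed enrichplot version,
# and recent versions need ckk4 <- pairwise_termsim(ckk4) before emapplot()
p4 <- emapplot(ckk4)
p44 <- emapplot(ckk4, legend_n = 0.5)
p444 <- emapplot(ckk4, pie = "count")
p4444 <- emapplot(ckk4, pie = "count", pie_scale = 0.59, layout = "kk")
cowplot::plot_grid(p4444)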
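
Since the original question asked about KEGG, the same call can in principle be pointed at enrichKEGG instead of enrichGO. The sketch below is an illustration under assumptions rather than tested advice for your data: run_kegg_comparison is a made-up wrapper, organism = "hsa" assumes human Entrez IDs (the IDs in the question look like that), and enrichKEGG needs an internet connection to fetch KEGG data. For E. coli the organism code and ID type would differ, so check the enrichKEGG documentation first.

```r
# Hypothetical wrapper around compareCluster for KEGG enrichment.
# gene_sets: named list of character vectors of Entrez gene IDs
# organism:  KEGG organism code; "hsa" (human) is an assumption here
run_kegg_comparison <- function(gene_sets, organism = "hsa") {
  stopifnot(is.list(gene_sets), !is.null(names(gene_sets)))
  clusterProfiler::compareCluster(
    geneCluster = gene_sets,
    fun = "enrichKEGG",
    organism = organism
  )
}

# Usage (not run here: requires clusterProfiler and network access):
# ck_kegg <- run_kegg_comparison(thisstep)
# dotplot(ck_kegg)
```

Defining the function does not need clusterProfiler to be loaded; the package is only required when the function is actually called.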

To look at a different GO ontology, change:

ont="BP"

to one of:

ont="CC"
ont="MF"

These are the packages I had loaded (some are probably redundant, so feel free to drop the ones you do not need):

library(clusterProfiler)
library(AnnotationHub)
library(GO.db)
library(GOSemSim)
library(ggplot2)
library(org.EcK12.eg.db)
keytypes(org.EcK12.eg.db)   # lists the ID types that bitr() can convert between
library(devtools)
library(dplyr)
library(DOSE)
library(enrichplot)

Good luck!


Hello! I am also new to using compareCluster and just want to double check - the input for compareCluster can be DESeq2 results with both the log2fc and Entrez id ... is that correct? And you basically just order the log2fc and Entrez id in descending order (from greatest to lowest log2fc)?

So, say, I have 4 DESeq results/comparisons: Treatment A vs. control, Treatment B vs. control, Treatment C vs. control, Treatment D vs. control

For compareCluster, I would: 1) take the DESeq results for each comparison and convert gene id/symbol to Entrez id, 2) extract entrez id and log2fc values and order them in descending order and 3) save as a list (I would then have, for example: GeneListA, GeneListB, GeneListC and GeneListD). Then, I would save these 4 gene lists into a single list (ex: myGeneList) and use it as input for compareCluster?
