How to transform the deg gene list from seurat to a gene list input to clusterProfiler compareCluster ?
1
0
Entering edit mode
4.6 years ago

How to transform the deg gene list from seurat to a gene list input to clusterProfiler compareCluster ?

the deg gene list from seurat

cluster gene
z1  x1  Id2
z2  x1  Fabp3
z3  x1  Id1
z4  x1  Bex1
z5  x1  Bex3
z6  x1  Tceal9
z14 x2  Id2
z15 x2  Fabp3
z16 x2  Id1
z17 x2  Bex1
z18 x2  Bex3
z19 x2  Tceal9
z20 x2  Ddx39
z21 x2  Prdx2
z29 x2  Sdhb
z30 x3  Mybbp1a
z31 x3  Nsmce1
z32 x3  Ubl4a
z33 x3  Clint1

gene list input to clusterProfiler compareCluster

x1       x2 x3
Id2      Id2    Mybbp1a
Fabp3   Fabp3   Nsmce1
Id1       Id1   Ubl4a
Bex1      Bex1  Clint1
Bex3      Bex3  
Tceal9   Tceal9 
         Ddx39  
         Prdx2  
         Sdhb

Thank you very much.

gene RNA-Seq • 8.2k views
ADD COMMENT
0
Entering edit mode

the deg gene list from seurat

 cluster    gene

z1  x1  Id2

z2  x1  Fabp3

z3  x1  Id1

z4  x1  Bex1

z5  x1  Bex3

z6  x1  Tceal9

z14 x2  Id2

z15 x2  Fabp3

z16 x2  Id1

z17 x2  Bex1

z18 x2  Bex3

z19 x2  Tceal9

z20 x2  Ddx39

z21 x2  Prdx2

z29 x2  Sdhb

z30 x3  Mybbp1a

z31 x3  Nsmce1

z32 x3  Ubl4a

z33 x3  Clint1
ADD REPLY
0
Entering edit mode

gene list input to clusterProfiler compareCluster

x1 x2 x3

Id2 Id2 Mybbp1a

Fabp3 Fabp3 Nsmce1

Id1 Id1 Ubl4a

Bex1 Bex1 Clint1

Bex3 Bex3

Tceal9 Tceal9

         Ddx39

        Prdx2   

        Sdhb
ADD REPLY
3
Entering edit mode
3.9 years ago
Pratik ★ 1.1k

Sorry for lateness,

I wanted to do something similar. This is what I did for reference:

Using a Seurat generated gene list for input into ClusterProfiler to see the GO or KEGG terms per cluster.

I'll keep the meat and potatoes of the Seurat vignette in this tutorial:

library(dplyr)
library(Seurat)
library(patchwork)

# Load the PBMC dataset
SeuratData::InstallData("pbmc3k")
pbmc <- SeuratData::LoadData("pbmc3k")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
#Without graph.name argument
pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 1, resolution = 0.5)
#pbmc <- FindNeighbors(pbmc, dims = 1:10)
#pbmc <- FindClusters(pbmc, algorithm = 1, resolution = 0.5)
pbmc <- RunUMAP(pbmc, dims = 1:10)
DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)

enter image description here

pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.1, logfc.threshold = 0.25)
pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_log2FC)

##################################################################
#Subsetting top 100 markers with adjusted p values lower than .05#
##################################################################
top100 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 100, wt = avg_log2FC)
top100pval <- subset(top100, rowSums(top100[5] < 0.05) > 0)

Now for that you have your dataframe with top ~100 genes per cluster with a pvalue adjusted of lower than .05. We can bring this to ClusterProfiler.

#ClusterProfiler
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")


BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("AnnotationHub")
library("clusterProfiler")
library("org.Hs.eg.db")
library("AnnotationHub")

df <- top100pval[,7:6]
dfsample <- split(df$gene,df$cluster)
length(dfsample)

#The output of length(dfsample) returns how many clusters you have
#Here there at 9 clusters (0, 1, 2, 3, 4, 5, 6, 7 and 8)
#I'm sure there's a better way but you have to make a line like below for each cluster

dfsample$`0` = bitr(dfsample$`0`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`1` = bitr(dfsample$`1`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`2` = bitr(dfsample$`2`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`3` = bitr(dfsample$`3`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`4` = bitr(dfsample$`4`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`5` = bitr(dfsample$`5`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`6` = bitr(dfsample$`6`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`7` = bitr(dfsample$`7`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
dfsample$`8` = bitr(dfsample$`8`, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")


#do the same here, a line like below for each cluster
genelist <- list("0" = dfsample$`0`$ENTREZID, 
                 "1" = dfsample$`1`$ENTREZID,
                 "2" = dfsample$`2`$ENTREZID,
                 "3" = dfsample$`3`$ENTREZID,
                 "4" = dfsample$`4`$ENTREZID,
                 "5" = dfsample$`5`$ENTREZID,
                 "6" = dfsample$`6`$ENTREZID,
                 "7" = dfsample$`7`$ENTREZID,
                 "8" = dfsample$`8`$ENTREZID

)
GOclusterplot <- compareCluster(geneCluster = genelist, fun = "enrichGO", OrgDb = "org.Hs.eg.db")
dotplot(GOclusterplot)

go

KEGGclusterplot <- compareCluster(geneCluster = genelist, fun = "enrichKEGG")
dotplot(KEGGclusterplot)

kegg

Note: the KEGG plot doesn't show the cluster 8 for some reason. I'm thinking because it wasn't enriched for KEGG terms? Cluster 8 was also a tiny cluster... (See UMAP) Regardless, a quick and dirty way to visualize the enriched GO & KEGG terms per cluster.

Good luck!

ADD COMMENT
0
Entering edit mode

If I am running this on a DEG FindMarkers between two different clusters, but the clusters are not defined from FindMarkers I am just provided with pval, avglog2FC, pct.1&2, and pval_adj, would I just disregard the cluster line of the code? And in addition, the pathways that would appear, these would be pathways that show differential expression between the two different clusters, correct?

Thank you!

ADD REPLY
1
Entering edit mode

would I just disregard the cluster line of the code?

I would have to play around with it to figure out what lines of code you would exactly cut out/modify, but generally yes, and then...

And in addition, the pathways that would appear, these would be pathways that show differential expression between the two different clusters, correct?

Yes to this question as well. If you want, I guess, more resolution into this keeping the following from the Seurat vignette in mind:

avg_log2FC : log fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
pct.1 : The percentage of cells where the feature is detected in the first group
pct.2 : The percentage of cells where the feature is detected in the second group

You could subset/filter the log2FC that are positive and extract the features that are more highly expressed in the first group (and more lowly expressed in second group), and then also do this with negative log2fc values in that same data frame to get lowly expressed in first group (and higher expressed in second group)... the end goal could be two colums in your clusterProflier figure, where one column is pathways that are "up"(more highly expressed in the first group (and more lowly expressed in second group))... and a second column which is pathways that are "down" (lowly expressed in first group (and higher expressed in second group)).


Just an idea. If you are interested in pathways and seeing the interaction between all clusters of a dataset or maybe even as you are doing here two clusters. Check out this package:

https://github.com/saschajung/Intercom

It was used in this paper: https://www.nature.com/articles/s41467-021-23295-6

Here is a sample figure from the paper:

enter image description here

Lastly, I was having trouble with making the matrix...someone helped out here:

How do I make a ComplexHeatmap from this dataframe?

ADD REPLY
0
Entering edit mode

Dear Pratik

Thank you so much for this code! I was wondering if with this there was a way of extracting a list of the top differentially expressed pathways. And then consecutively plot only the desired pathways in a dotplot.

Best Marvin

ADD REPLY

Login before adding your answer.

Traffic: 1969 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6