Hi everyone,
I have a question concerning ComplexHeatmap. When I try to display the row names, I always get the Ensembl gene IDs. I also want to create a heatmap displaying only specific genes. I tried building a vector of gene names, but of course that only works if I type in the Ensembl IDs. Here is the code so far; maybe it makes things a bit clearer:
# This is the data.frame of the DEGs, generated with lfcShrink, annotated using biomaRt, with one-to-many relationships cleaned up
> head(annotT)
GeneID baseMean logFC lfcSE pvalue FDR ENS_ID entrezgene_id external_gene_name entrezgene_accession
1 ENSG00000223972 2.2537507 -0.92184193 1.3249556 0.33362181 0.4261799 ENSG00000223972 100287102 DDX11L1 DDX11L1
2 ENSG00000227232 17.4375811 -0.01509812 0.6232927 0.97844298 0.9848900 ENSG00000227232 NA WASH7P
3 ENSG00000240361 1.0720929 -1.30384232 2.1066946 0.22622144 NA ENSG00000240361 NA OR4G11P
4 ENSG00000238009 0.8069346 -0.61353957 2.1413329 0.52429054 NA ENSG00000238009 NA
5 ENSG00000233750 2.1753908 -2.37493988 1.9108980 0.04824216 NA ENSG00000233750 NA CICP27
6 ENSG00000268903 51.8635638 0.83205006 1.0986041 0.32723714 0.4195377 ENSG00000268903 NA
Okay, this is how I start the heatmap:
sigGenesT <- annotT %>%
  top_n(300, wt=-FDR) %>%
  pull("GeneID")
plotDat_heat_T <- vst(ddsObj2)[sigGenesT,] %>%
  assay()
z.mat_T <- t(scale(t(plotDat_heat_T), center=TRUE, scale=TRUE))
myPalette <- c("blue3", "ivory", "red3")
myRamp = colorRamp2(c(-2, 0, 2), myPalette)
hcDatT <- hclust(dist(z.mat_T))
cutGroups <- cutree(hcDatT, h=4)
ha1 = HeatmapAnnotation(df = colData(ddsObj2)[,c("cell_group")])
htT <- Heatmap(z.mat_T, name = "z-score",
               col = myRamp,
               show_row_names = FALSE,
               cluster_columns = TRUE,
               split = cutGroups,
               rect_gp = gpar(col = "darkgrey", lwd = 0.5),
               top_annotation = ha1)
htT[,9:14]
Since my DESeq2 object (ddsObj2) was built from a count matrix whose rows are Ensembl gene IDs (I think that is normal, as annotation comes afterwards), I had to use pull() on the GeneID column (= Ensembl IDs). So I guess the heatmap only receives the Ensembl ID information? If I use show_row_names=TRUE, it only shows the Ensembl IDs.
I also tried to build a heatmap with only specific genes.
# Heatmap with specific genes
spGenes<- c("IL1B", "NLRP3","GSDMD","CASP1","CASP4","CASP5","NLRC4","NLRP1")
plotDat_heat_TS <- vst(ddsObj2)[spGenes,] %>%
assay()
z.mat_TS <- t(scale(t(plotDat_heat_TS), center=TRUE, scale=TRUE))
hcDatTS <- hclust(dist(z.mat_TS))
cutGroupsS <- cutree(hcDatTS, h=4)
htTS<-Heatmap(z.mat_TS, name = "z-score",
col = myRamp,
show_row_names = TRUE,
cluster_columns = TRUE,
split=cutGroupsS,
rect_gp = gpar(col = "darkgrey", lwd=0.5),
top_annotation = ha1)
htTS[,9:14]
Of course, it gives this error message: ... assay': <DESeqTransform>[i,] index out of bounds: IL1B NLRP3 ... NLRC4 NLRP1
Is there a smart way to build a heatmap that shows the gene names instead of the Ensembl IDs?
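One possible approach, sketched here in base R with toy stand-ins for annotT and z.mat_T (the values and the names annot, z, row_lab, wanted and z_sub are made up for illustration): keep the Ensembl IDs as the matrix row names, look up the symbol for each row with match(), and fall back to the Ensembl ID where no symbol exists. The same lookup in the other direction turns a vector of symbols into the Ensembl IDs needed for subsetting.

```r
# Toy stand-ins (hypothetical values) for annotT and the z-score matrix
annot <- data.frame(
  GeneID = c("ENSG00000223972", "ENSG00000227232", "ENSG00000238009"),
  external_gene_name = c("DDX11L1", "WASH7P", ""),
  stringsAsFactors = FALSE
)
z <- matrix(c(-1, 0, 1, 1, 0, -1, 0, 1, -1), nrow = 3, byrow = TRUE,
            dimnames = list(annot$GeneID, c("s1", "s2", "s3")))

# Look up the symbol for every matrix row, in row order
sym <- annot$external_gene_name[match(rownames(z), annot$GeneID)]

# Fall back to the Ensembl ID where the symbol is missing or empty
row_lab <- ifelse(is.na(sym) | sym == "", rownames(z), sym)

# Reverse lookup: select rows by symbol, dropping symbols that are not found
wanted <- c("WASH7P", "DDX11L1")
wanted_ids <- annot$GeneID[match(wanted, annot$external_gene_name)]
wanted_ids <- wanted_ids[!is.na(wanted_ids)]
z_sub <- z[wanted_ids, , drop = FALSE]

# With ComplexHeatmap loaded, the labels can then be passed separately from
# the row names (not run here):
# Heatmap(z, row_labels = row_lab, show_row_names = TRUE)
```

Using row_labels rather than overwriting rownames() avoids trouble with duplicated or empty symbols, since the row names stay unique Ensembl IDs. In the real data, wanted_ids would be used to subset vst(ddsObj2) in place of spGenes.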
Edit: Another problem I encountered is that I get a legend called "df" for my annotation, although I specified it as "cell_group".
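A likely cause of the "df" legend, stated as an assumption rather than a confirmed diagnosis: selecting a single column with [, c("cell_group")] drops the result to a plain vector, so the column name is lost before HeatmapAnnotation() sees it. The sketch below shows the dropping behaviour on a toy base-R data.frame (cd, dropped and kept are hypothetical names); the commented lines show two ways to pass the name explicitly.

```r
# Toy stand-in (hypothetical values) for colData(ddsObj2)
cd <- data.frame(cell_group = c("A", "A", "B"), batch = c(1, 2, 1))

dropped <- cd[, c("cell_group")]           # plain vector, column name is gone
kept <- cd[, "cell_group", drop = FALSE]   # still a one-column data frame

is.data.frame(dropped)
colnames(kept)

# With ComplexHeatmap and DESeq2 loaded (not run here), naming the
# annotation explicitly avoids relying on the df argument:
# ha1 <- HeatmapAnnotation(cell_group = colData(ddsObj2)$cell_group)
# or keep the data frame structure:
# ha1 <- HeatmapAnnotation(
#   df = as.data.frame(colData(ddsObj2)[, "cell_group", drop = FALSE]))
```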
You can try pheatmap.
And pheatmap auto-magically maps ENSG IDs to HGNC symbols? Please stop recommending software tools that don't do what needs to be done.