How to cluster genes in a heatmap
7.0 years ago
Mehmet ▴ 820

Dear All,

I have a data matrix with 17 samples and over 800 genes that belong to ten different gene families. I want to show these gene families in a heatmap by marking them.

I generated a heatmap, but I do not know how to show the gene families on it.

Does anyone know how to do that?

RNA-Seq R gene • 18k views

try annotation_row in pheatmap.
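A minimal sketch of `annotation_row`, using the small example table posted further down in this thread as toy data. The assumption it illustrates is that the annotation data frame's row names must match the matrix row names exactly; pheatmap is only called if it is installed.

```r
# Toy data copied from the example table in this thread
df <- data.frame(
  Family = c("ncRNA", "ncRNA", "pseudogene", "ncRNA", "ncRNA", "pseudogene"),
  Gene   = paste0("Gene", 1:6),
  Sam1 = c(10, 7, 6, 10, 7, 6),
  Sam2 = c(11, 6, 65, 11, 6, 65),
  Sam3 = c(6, 7, 3, 6, 7, 3),
  Sam4 = c(1, 33, 3, 1, 33, 3)
)

# Numeric matrix with gene IDs as row names
mat <- as.matrix(df[, c("Sam1", "Sam2", "Sam3", "Sam4")])
rownames(mat) <- df$Gene

# One-column data frame; its row names link each gene to its family
ann <- data.frame(Family = factor(df$Family), row.names = df$Gene)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat, scale = "row", annotation_row = ann)
}
```

pheatmap draws a coloured side bar from `ann`, so each gene is marked with its family while the rows are still clustered as a whole.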


a very good example of a clustered heatmap

7.0 years ago

Yes, as per Sean, in ComplexHeatmap, you can segregate your heatmap into 'blocks' of different genes using the 'split' parameter. You can end up with nice heatmaps like this: A: How to plot a heatmap with two different distance matrices for X and Y

Edit April 16, 2019: skip to the working example, here: C: how to cluster genes in heatmap


Hi Kevin,

Here is my data:

EffectorName    GeneID  Sample1   Sample2   Sample3   ...   Sample17
GH45            Gene1   25.7847   19.710    22.6148   ...   ...
Expansin        Gene2   29.2436   29.2168   963.745   ...   ...
...

What I want to do is to show EffectorNames in the heatmap.


Sorry, it's not clear what you want to do...

If I have this data:

Family      Gene  Sam1 Sam2 Sam3 Sam4
ncRNA       Gene1 10   11   6    1
ncRNA       Gene2 7    6    7    33
pseudogene  Gene3 6    65   3    3
ncRNA       Gene4 10   11   6    1
ncRNA       Gene5 7    6    7    33
pseudogene  Gene6 6    65   3    3

For ComplexHeatmap, if I want to break the heatmap up by gene family, I would supply the 'Family' column to the split parameter of the Heatmap function. This splits the heatmap into one block per family and clusters the genes independently within each block (here, ncRNA and pseudogene).
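To make "clustering independently within each family" concrete, here is a base-R sketch of the idea: each family's rows are clustered on their own and the resulting blocks are stacked. It illustrates the concept only and is not ComplexHeatmap's internal code; the Ward linkage simply mirrors the settings used later in this thread.

```r
# Toy matrix from the example table above (rows = genes, columns = samples)
mat <- rbind(Gene1 = c(10, 11, 6, 1), Gene2 = c(7, 6, 7, 33),
             Gene3 = c(6, 65, 3, 3),  Gene4 = c(10, 11, 6, 1),
             Gene5 = c(7, 6, 7, 33),  Gene6 = c(6, 65, 3, 3))
colnames(mat) <- paste0("Sam", 1:4)
family <- c("ncRNA", "ncRNA", "pseudogene", "ncRNA", "ncRNA", "pseudogene")

# Cluster the genes of each family separately, then stack the blocks
rowOrder <- unlist(lapply(split(rownames(mat), family), function(genes) {
  if (length(genes) < 2) return(genes)          # hclust needs at least 2 rows
  hc <- hclust(dist(mat[genes, , drop = FALSE]), method = "ward.D2")
  genes[hc$order]                               # dendrogram order within the family
}), use.names = FALSE)

print(rowOrder)
```

Genes never move across family boundaries; only their order inside each block comes from clustering.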


R code:

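library(pheatmap)

test <- read.csv("file.txt", sep = "\t", header = TRUE)  # tab-delimited input
rownames(test) <- test[, 2]        # use the Gene column as row names
chars <- test[, c(1, 2)]           # annotation columns (Family, Gene)
test1 <- test[, c(3:6)]            # numeric expression columns (Sam1-Sam4)
pheatmap(as.matrix(test1), scale = "row", clustering_distance_rows = "correlation",
         clustering_method = "complete", color = rainbow(2), main = "Significant genes",
         fontsize_col = 24, fontsize_row = 24, annotation_row = chars[1])  # chars[1] = Family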

input:

$ cat file.txt 
Family  Gene    Sam1    Sam2    Sam3    Sam4
ncRNA   Gene1   10  11  6   1
ncRNA   Gene2   7   6   7   33
pseudogene  Gene3   6   65  3   3
ncRNA   Gene4   10  11  6   1
ncRNA   Gene5   7   6   7   33
pseudogene  Gene6   6   65  3   3

[Figure: heatmap produced by the pheatmap code above]


Hi,

I tried to run it, but I got these errors:

Error in seq.default(-m, m, length.out = n + 1) : 
  'from' must be a finite number
In addition: Warning messages:
1: In min(x, na.rm = T) : no non-missing arguments to min; returning Inf
2: In max(x, na.rm = T) : no non-missing arguments to max; returning -Inf
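One plausible cause of this error (an assumption; the thread does not confirm it) is that some rows become all-NaN after row scaling, which happens for constant rows or when the expression columns were not read as numbers. A base-R sketch for checking and filtering; `test1` here is a toy stand-in for the data frame in the code above, and the constant row `Gene7` is hypothetical.

```r
# Toy stand-in for test1; Gene7 is constant across samples (hypothetical)
test1 <- data.frame(Sam1 = c(10, 7, 5), Sam2 = c(11, 6, 5),
                    Sam3 = c(6, 7, 5),  Sam4 = c(1, 33, 5),
                    row.names = c("Gene1", "Gene2", "Gene7"))

# 1. Every expression column should be numeric
print(sapply(test1, is.numeric))

# 2. Rows with zero (or undefined) variance cannot be scaled to Z-scores
rowSd    <- apply(as.matrix(test1), 1, sd, na.rm = TRUE)
keep     <- is.finite(rowSd) & rowSd > 0
test1_ok <- test1[keep, , drop = FALSE]
print(rownames(test1)[!keep])   # rows that would have produced NaN
```

Passing `test1_ok` instead of `test1` to pheatmap avoids all-NaN rows when `scale = "row"`.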

Hi! For ComplexHeatmap, try this code (note the split parameter):

require(ComplexHeatmap)
require(circlize)
require(cluster)

df <- read.table("test", header=TRUE)
df
#       Family  Gene Sam1 Sam2 Sam3 Sam4
# 1      ncRNA Gene1   10   11    6    1
# 2      ncRNA Gene2    7    6    7   33
# 3 pseudogene Gene3    6   65    3    3
# 4      ncRNA Gene4   10   11    6    1
# 5      ncRNA Gene5    7    6    7   33
# 6 pseudogene Gene6    6   65    3    3
# 7 pseudogene Gene7    5   45    2    1
7 pseudogene Gene7    5   45    2    1

heat <- t(scale(t(df[,3:ncol(df)])))

hmap <- Heatmap(heat,
        name="Transcript Z-score",
        #col=colorRamp2(myBreaks, myCol),
        heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),
        split=df$Family,
        row_title="Transcript class",
        row_title_side="left",
        row_title_gp=gpar(fontsize=15, fontface="bold"),
        show_row_names=TRUE,
        column_title="",
        column_title_side="top",
        column_title_gp=gpar(fontsize=15, fontface="bold"),
        column_title_rot=0,
        show_column_names=TRUE,
        clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
        clustering_method_columns="ward.D2",
        clustering_distance_rows="euclidean",
        clustering_method_rows="ward.D2",
        row_dend_width=unit(30,"mm"),
        column_dend_height=unit(30,"mm"))

draw(hmap, heatmap_legend_side="left")

[Figure: screenshot of the resulting heatmap]


Hi Kevin,

Thank you. I followed your code, but I received this error:

Error in colorRamp2(myBreaks, myCol) : 
  Length of `breaks` should be equal to `colors`.

Yes, I commented out that part of the code. If you would like to use it, then execute the following prior to generating the heatmap:

#myCol <- colorRampPalette(c("violet", "black", "springgreen"))(100)
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-3, 3, length.out=100)

You can choose any colours that you want here.
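The `colorRamp2()` error reported above ("Length of `breaks` should be equal to `colors`") means the two vectors have different lengths. A small sketch that builds both from the same `n`, so they always agree; the circlize calls only run if the package is installed.

```r
n <- 100
myCol    <- grDevices::colorRampPalette(c("dodgerblue", "black", "yellow"))(n)
myBreaks <- seq(-3, 3, length.out = n)
stopifnot(length(myBreaks) == length(myCol))  # one colour per break

if (requireNamespace("circlize", quietly = TRUE)) {
  colFun <- circlize::colorRamp2(myBreaks, myCol)
  print(colFun(c(-3, 0, 3)))   # colours for Z-scores of -3, 0 and 3

  # A three-point mapping (low, mid, high) also satisfies the length rule
  colFun3 <- circlize::colorRamp2(c(-2, 0, 2), c("dodgerblue", "black", "yellow"))
}
```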

Also note that t(scale(t(x))) scales each row (gene) to Z-scores: scale() works on columns, so the data are transposed before and after scaling.
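A quick base-R check of that statement, using toy values from the example table in this thread: the transposed `scale()` call gives the same result as computing (value minus row mean) divided by row standard deviation by hand.

```r
x <- rbind(Gene1 = c(10, 11, 6, 1), Gene2 = c(7, 6, 7, 33), Gene3 = c(6, 65, 3, 3))

# scale() standardises columns, so transpose, scale, and transpose back
z <- t(scale(t(x)))

# The same Z-scores computed by hand, row by row
zManual <- (x - rowMeans(x)) / apply(x, 1, sd)

print(round(zManual, 2))
```

Every row of `z` then has mean 0 and standard deviation 1.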


Hi Kevin,

Thank you very much for your help. I was able to generate a heatmap as I wanted.


Great. You should devote a full working day to looking over ComplexHeatmap. Once you learn it, you will never go back to heatmap.2 or pheatmap.


Yes, I will. Your answers in this thread work as a tutorial for other people, so anyone can follow these steps to make a complex heatmap.


One more question: how can I change the orientation of the family names in the heatmap? They are displayed vertically, but I want to show them horizontally, because some of them overlap and cannot be read.


Take a look at the code below. I have now added all sorts of annotations, just to give you an idea. Also note the following:

  • I set the orientation/rotation of the family names with row_title_rot=0 (note that when you use the split parameter, it overrides the row title)
  • I now set gene names as rownames for heat, with rownames(heat) <- df$Gene
  • I use different distance metrics for rows and columns, with clustering_distance_columns and clustering_distance_rows

This is a 'simple' ComplexHeatmap though, if that makes sense. There is much more to ComplexHeatmap, and you don't even want to see the complexity of one of the recent ones that I made. The code for it runs into the hundreds of lines. I commend the author of the package, who did a really great job.


require(ComplexHeatmap)
require(circlize)
require(cluster)

df <- read.table("test", header=TRUE)
df
#       Family  Gene Sam1 Sam2 Sam3 Sam4
# 1      ncRNA Gene1   10   11    6    1
# 2      ncRNA Gene2    7    6    7   33
# 3 pseudogene Gene3    6   65    3    3
# 4      ncRNA Gene4   10   11    6    1
# 5      ncRNA Gene5    7    6    7   33
# 6 pseudogene Gene6    6   65    3    3
# 7 pseudogene Gene7    5   45    2    1

heat <- t(scale(t(df[,3:ncol(df)])))

rownames(heat) <- df$Gene

#Set annotation
  ColAnn <- data.frame(colnames(heat))
  colnames(ColAnn) <- c("Sample")
  ColAnn <- HeatmapAnnotation(df=ColAnn, which="col")

  RowAnn <- data.frame(df$Family)
  colnames(RowAnn) <- c("Gene family")
  colours <- list("Gene family"=c("ncRNA"="royalblue","pseudogene"="red3"))
  RowAnn <- HeatmapAnnotation(df=RowAnn, col=colours, which="row")

  boxAnnCol <- HeatmapAnnotation(boxplot=anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), lim=NULL, pch=".", size=unit(2, "mm"), axis=FALSE, axis_side=NULL, axis_gp=gpar(fontsize=12)), annotation_width=unit(c(1, 7.5), "cm"))

  boxAnnRow <- rowAnnotation(boxplot=row_anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), lim=NULL, pch=".", size=unit(3, "cm"), axis=FALSE, axis_side="top", axis_gp=gpar(fontsize=12)), annotation_width=unit(c(3), "cm"))


hmap <- Heatmap(heat,
        name="Transcript Z-score",
        col=colorRamp2(myBreaks, myCol),
        heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),

       #Split heatmap rows by gene family
        split=df$Family,

        #Row annotation configurations
        cluster_rows=TRUE,
        show_row_dend=TRUE,
        #row_title="Transcript", #overridden by 'split' it seems
        row_title_side="left",
        row_title_gp=gpar(fontsize=15, fontface="bold"),
        show_row_names=TRUE,
        row_names_side="left",
        row_title_rot=0,

        #Column annotation configurations
        cluster_columns=TRUE,
        show_column_dend=TRUE,
        column_title="Samples",
        column_title_side="top",
        column_title_gp=gpar(fontsize=15, fontface="bold"),
        column_title_rot=0,
        show_column_names=TRUE,

        #Dendrogram configurations: columns
        clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
        clustering_method_columns="ward.D2",
        column_dend_height=unit(30,"mm"),

        #Dendrogram configurations: rows
        clustering_distance_rows="euclidean",
        clustering_method_rows="ward.D2",
        row_dend_width=unit(30,"mm"),

        #Annotations (row annotation must be added with 'draw' function, below)
        top_annotation_height=unit(0.5,"cm"),
        top_annotation=ColAnn,

        bottom_annotation_height=unit(3, "cm"),
        bottom_annotation=boxAnnCol)

draw(hmap + RowAnn + boxAnnRow, heatmap_legend_side="left", annotation_legend_side="right")

[Figure: screenshot of the annotated heatmap]


Hi Kevin, I'm using your code. Have a look: I'm getting an error.

require(ComplexHeatmap)
require(circlize)
require(cluster)
df <- read.csv('PATHWAY_gene.txt', header=TRUE,sep = "\t")
df
dim(df)
names(df)
heat <- t(scale(t(df[,3:ncol(df)])))

#################################################



##############################################


rownames(heat) <- df$Gene

myCol <- colorRampPalette(c("navyblue", "white", "red"))(100)
myBreaks <- seq(-2,2, length.out=100)
#Set annotation
ColAnn <- data.frame(colnames(heat))
colnames(ColAnn) <- c("Sample")
ColAnn <- HeatmapAnnotation(df=ColAnn, which="col")

RowAnn <- data.frame(df$Family)
colnames(RowAnn) <- c("Gene family")
colours <- list("Gene family"=
                  c("Interferon Signaling"="red","Communication between Innate and Adaptive Immune Cells"="red1","Atherosclerosis Signaling "="red2",
                   "Activation of IRF by Cytosolic Pattern Recognition Receptors
"="azure","Neuroinflammation Signaling Pathway
"="royalblue","Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses
"="royalblue1","Role of Macrophages"="royalblue2","Death Receptor Signaling"="royalblue3","TREM1 Signaling
"="royalblue4","Toll-like Receptor Signaling
"="cyan1","NF-κB Signaling"="cyan2","HMGB1 Signaling
"="cyan3","PKCθ Signaling in T Lymphocytes"="cyan4","PPARα/RXRα Activation"="green4" ))
RowAnn <- HeatmapAnnotation(df=RowAnn, col=colours, which="row")

boxAnnCol <- HeatmapAnnotation(boxplot=anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), pch=".", size=unit(2, "mm"), axis=FALSE, axis_side=NULL, axis_gp=gpar(fontsize=12)), annotation_width=unit(c(1, 7.5), "cm"))

boxAnnRow <- rowAnnotation(boxplot=row_anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"),pch=".", size=unit(3, "cm"), axis=FALSE, axis_side="top", axis_gp=gpar(fontsize=12)), annotation_width=unit(c(3), "cm"))


hmap <- Heatmap(heat,
                name="Transcript Z-score",
                col=colorRamp2(myBreaks, myCol),
                heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),

                #Split heatmap rows by gene family
                split=df$Family,

                #Row annotation configurations
                cluster_rows=FALSE,
                show_row_dend=FALSE,
                #row_title="Transcript", #overridden by 'split' it seems
                row_title_side="left",
                row_title_gp=gpar(fontsize=30, fontface="bold"),
                show_row_names=TRUE,
                row_names_side="left",
                row_title_rot=0,

                #Column annotation configurations
                cluster_columns=TRUE,
                show_column_dend=TRUE,
                column_title="Samples",
                column_title_side="top",
                column_title_gp=gpar(fontsize=15, fontface="bold"),
                column_title_rot=0,
                show_column_names=TRUE,

                #Dendrogram configurations: columns
                #clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
                clustering_method_columns="complete",
                column_dend_height=unit(10,"mm"),

                #Dendrogram configurations: rows
                clustering_distance_rows="euclidean",
                clustering_method_rows="ward.D2",
                row_dend_width=unit(30,"mm"))

                #Annotations (row annotation must be added with 'draw' function, below)
                #top_annotation_height=unit(0.5,"cm"),
                #top_annotation=ColAnn)

                #bottom_annotation_height=unit(3, "cm"),
                #bottom_annotation=boxAnnCol)

draw(hmap + RowAnn , heatmap_legend_side="left", annotation_legend_side="right")






 Error in .local(object, ...) : 
      Gene family: cannot map colors to some of the levels:
    Activation of IRF by Cytosolic Pattern Recognition Receptor

Error when drawing annotation 'Gene family'
Error in .local(object, ...) : Error in .local(object, ...) : 
  Gene family: cannot map colors to some of the levels:
Activation of IRF by Cytosolic Pattern Recognition Receptors

I'm not sure what is wrong with the colours I defined.

colours <- list("Gene family"=
                  c("Interferon Signaling"="red","Communication between Innate and Adaptive Immune Cells"="red1","Atherosclerosis Signaling "="red2",
                   "Activation of IRF by Cytosolic Pattern Recognition Receptors
"="azure","Neuroinflammation Signaling Pathway
"="royalblue","Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses
"="royalblue1","Role of Macrophages"="royalblue2","Death Receptor Signaling"="royalblue3","TREM1 Signaling
"="royalblue4","Toll-like Receptor Signaling
"="cyan1","NF-κB Signaling"="cyan2","HMGB1 Signaling
"="cyan3","PKCθ Signaling in T Lymphocytes"="cyan4","PPARα/RXRα Activation"="green4" ))

Hello friend, the problem is most likely in the lines above. Can you double-check that all gene family names are correct, including upper- and lower-case?
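One way to check this systematically (a sketch, not the thread's confirmed fix): compare the family labels against the names of the colour vector with `setdiff()`. In the code above, several names contain a line break inside the quotes (for example the "Activation of IRF ..." entry) and "Atherosclerosis Signaling " has a trailing space, so they can never match the labels in the data. The labels below are hypothetical stand-ins for `df$Family`.

```r
# Hypothetical family labels as they would appear in the data
family <- c("Interferon Signaling", "Atherosclerosis Signaling", "TREM1 Signaling")

# Colour names typed by hand: one trailing space, one embedded newline
famCols <- c("Interferon Signaling"       = "red",
             "Atherosclerosis Signaling " = "red2",
             "TREM1 Signaling\n"          = "royalblue4")

# Labels with no matching colour name (these trigger "cannot map colors")
unmatched <- setdiff(unique(family), names(famCols))
print(unmatched)

# Strip leading/trailing whitespace, including newlines, on both sides
names(famCols) <- trimws(names(famCols))
family <- trimws(family)
stopifnot(length(setdiff(unique(family), names(famCols))) == 0)
```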


Okay, let me check that again.


I'm getting something like this.

Since the gene names were cluttering the plot, I removed them, but it still looks messy. Any suggestions?


My next educated guess is that the problem is with your hyphens. For example, "NF-κB Signaling" will have to be changed to "NF κB Signaling".
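An alternative that avoids typing the family names by hand (a suggestion, not something tested in this thread) is to build the colour vector directly from the family column, so names and labels cannot disagree. The labels below are a hypothetical stand-in for `df$Family`; `hcl.colors()` needs R 3.6 or later.

```r
# Hypothetical family labels; in practice use df$Family
familyCol <- c("Interferon Signaling", "Toll-like Receptor Signaling",
               "TREM1 Signaling\n", "Interferon Signaling")
familyCol <- trimws(familyCol)                 # drop stray spaces/newlines

families <- sort(unique(familyCol))
famCols  <- setNames(grDevices::hcl.colors(length(families), palette = "Dark 3"),
                     families)
colours  <- list("Gene family" = famCols)      # same structure as in the code above
print(famCols)
```

With around 15 families, a qualitative palette like this gives distinct colours without maintaining a long hand-written list. Use the trimmed labels (not the raw column) when building the row annotation.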


And how do you decide on the sequence of breaks? Maybe I am doing something wrong: there are around 15 families, and I also can't see the Z-score colour bar in my figure.


It is trial and error. You can try myBreaks <- seq(-2,2, length.out=100) or myBreaks <- seq(-1,1, length.out=100), or something else. Your data does look strange (very flat).

Keep in mind that you do not necessarily have to scale the data and set break-points. Also remember that the colouring is purely for visualisation and does not change the actual clustering.
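As a starting point for that trial and error, one option (an assumption on my part, not the method used in this thread) is to derive symmetric breaks from the data, capping the scale at a high quantile of the absolute Z-scores so a few extreme values do not wash out the rest. The random matrix is only a stand-in for `heat`.

```r
set.seed(1)
# Random toy matrix standing in for the row-scaled data ('heat' in the thread)
heat <- t(scale(t(matrix(rnorm(200), nrow = 20))))

# Cap the colour scale at the 99th percentile of |Z|
lim      <- unname(quantile(abs(heat), probs = 0.99, na.rm = TRUE))
myBreaks <- seq(-lim, lim, length.out = 100)
myCol    <- grDevices::colorRampPalette(c("navyblue", "white", "red"))(100)
```

With `circlize::colorRamp2(myBreaks, myCol)`, values beyond the outer breaks receive the end colours, so capping only affects the display, not the clustering.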


Yes, the colouring is just to distinguish them. Is the Z-score colour bar not visible because the text is taking up all of the space on the left side?


Hi Kevin,

I would like to ask you something. How can I add the FPKM value of each gene in each sample to the heatmap?


Hello again. Do you mean to literally add the numerical FPKM values to the heatmap?


Hi Kevin,

I figured out how to add the FPKM values to the cells of the heatmap. But I also need to add a box plot of the FPKM values, in addition to the Z-score box plot. I tried, but I could not see an option for another one.


To do that, I think that you just create a HeatmapAnnotation and specify 2 boxplots in it, like this:

annotBoxplots <- HeatmapAnnotation(zscore = anno_boxplot(zscores, which = "row"), fpkm = anno_boxplot(fpkm, which = "row"), which = "row", ...)
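A self-contained sketch of the same idea, assuming ComplexHeatmap 2.x (where `rowAnnotation()` and the `right_annotation` argument exist); the thread's own code targets an older release. The toy values come from this thread's example table and only stand in for real FPKM data.

```r
fpkm <- rbind(Gene1 = c(10, 11, 6, 1), Gene2 = c(7, 6, 7, 33), Gene3 = c(6, 65, 3, 3))
colnames(fpkm) <- paste0("Sam", 1:4)
zscores <- t(scale(t(fpkm)))

if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  # Each annotation needs a name; the names become the annotation labels
  boxAnn <- ComplexHeatmap::rowAnnotation(
    zscore = ComplexHeatmap::anno_boxplot(zscores, which = "row"),
    fpkm   = ComplexHeatmap::anno_boxplot(fpkm, which = "row")
  )
  ht <- ComplexHeatmap::Heatmap(zscores, name = "Z-score", right_annotation = boxAnn)
  ComplexHeatmap::draw(ht)
}
```

Dropping the `zscore = ...` entry leaves only the FPKM box plot.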

Hi Kevin,

I ran the command below:

annotBoxplots <- HeatmapAnnotation(as.matrix(shorteffec.fpkm.txt), which="row")

I was wondering how to show only the FPKM box plot (without annotation), not the Z-score box plot.


Oh, maybe try this:

annotBoxplots <- HeatmapAnnotation(fpkm = anno_boxplot(as.matrix(shorteffec.fpkm.txt), which = "row"), which = "row")

Hi Kevin,

I was able to show FPKM box plot of rows (genes) and columns (samples/conditions) in the heatmap without annotation.


Hi Kevin,

I was able to produce annotated box plot based on FPKM values in the heatmap.

To be clear, this is the argument that I added to the Heatmap function; it puts the FPKM values into the heatmap cells:

cell_fun = function(j, i, x, y, width, height, fill) {grid.text(sprintf("%.1f", shorteffec.fpkm.txt[i, j]), x, y, gp = gpar(fontsize = 15, col= "black"))}
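In `cell_fun`, `i` and `j` index the rows and columns of the original (unclustered) matrix, so the matrix used for the labels must have exactly the same row and column order as the matrix passed to `Heatmap()`. A sketch with toy values, where `fpkm` stands in for `shorteffec.fpkm.txt` and `labelMat` is a hypothetical name:

```r
fpkm <- rbind(Gene1 = c(10, 11, 6, 1), Gene2 = c(7, 6, 7, 33), Gene3 = c(6, 65, 3, 3))
colnames(fpkm) <- paste0("Sam", 1:4)
heat <- t(scale(t(fpkm)))

# Re-order the label matrix to match the heatmap matrix before using it
labelMat <- fpkm[rownames(heat), colnames(heat), drop = FALSE]

cellLabel <- function(j, i, x, y, width, height, fill) {
  grid::grid.text(sprintf("%.1f", labelMat[i, j]), x, y,
                  gp = grid::gpar(fontsize = 10, col = "black"))
}

if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(heat, name = "Z-score", cell_fun = cellLabel)
  ComplexHeatmap::draw(ht)
}
```

The cells are coloured by Z-score but labelled with the untransformed values.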

What I need is to show a box plot of the FPKM values without any annotation, and not to use the Z-score legend in the heatmap.


Sorry Kevin: by combining your code with the ComplexHeatmap option that keeps genes in the same order across two heatmaps, I produced this heatmap. How can I make the colouring of the right heatmap smoother? The left heatmap is darker and the right one is brighter.

library(ComplexHeatmap)
library(circlize)
mycol <- colorRamp2(c(-2,0,2), c("dodgerblue", "black", "yellow"))
# Drop constant rows before scaling (they would become NaN after scaling)
norm_h0_t_r <- norm_h0_t_r[apply(norm_h0_t_r, MARGIN = 1, FUN = function(x) sd(x) != 0),]
heat <- t(scale(t(norm_h0_t_r)))
View(heat)
t=heat[,1:2]
r=heat[,3:4]
dim(t)
# [1] 8587    2
dim(r)
# [1] 8587    2
Heatmap(t, col=mycol, cluster_columns = FALSE) + Heatmap(r, col=mycol, cluster_columns = FALSE)

Maybe by using the same scale on both heatmaps?



Please do not post your question in multiple places.


I have replied back in the other thread in order to maintain consistency: C: Why can't I reproduce the same heat map


How to deal with these?


Hi Kevin,

I am trying to generate a heatmap, but I am having difficulties. As you may remember from my previous post about a heatmap based on FPKM values, I need to use those data: load them into R, scale them, and plot a heatmap.

Could you please send me an R code to do those steps?
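A minimal sketch of the three steps (load, scale, plot), under stated assumptions: the input is a tab-delimited table whose first column holds gene IDs, and a log2(FPKM + 1) transform is applied before scaling (a common choice, not one prescribed in this thread). A tiny toy file, including a hypothetical constant row `Gene7`, is written first so the sketch runs end to end.

```r
# Toy tab-delimited file; in practice, point 'path' at your own FPKM table
path <- tempfile(fileext = ".txt")
writeLines(c("Gene\tSam1\tSam2\tSam3\tSam4",
             "Gene1\t10\t11\t6\t1",
             "Gene2\t7\t6\t7\t33",
             "Gene3\t6\t65\t3\t3",
             "Gene7\t5\t5\t5\t5"), path)

# 1. Load: the first column becomes the row names
fpkm <- as.matrix(read.delim(path, header = TRUE, row.names = 1))

# 2. Transform, drop constant rows, then convert each row to Z-scores
logFpkm <- log2(fpkm + 1)
logFpkm <- logFpkm[apply(logFpkm, 1, sd) > 0, , drop = FALSE]
heat    <- t(scale(t(logFpkm)))

# 3. Plot (only if ComplexHeatmap is installed)
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  ComplexHeatmap::draw(ComplexHeatmap::Heatmap(heat, name = "Z-score"))
}
```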

7.0 years ago

Take a look at the heatmap.3 and ComplexHeatmap packages to mark your genes.
