Question

heatmap in R

0

Entering edit mode

4.1 years ago

Rob ▴ 170

Hi I have HTseq data and want to plot heatmap for significant expressed genes. I followed Kevin Blighe tutorial :https://github.com/kevinblighe/E-MTAB-6141 I still do not have any pattern. These genes are significantly differentially expressed between two groups (tretment and control) and should have pattern in heatmap. Any body can help me with this? Also, default color range for this code is -4 to 4. How can I change it to for example -3 to 3? this is my code:

require(RColorBrewer)
require(ComplexHeatmap)
require(circlize)
require(digest)
require(cluster)
library(DESeq2)

##obtain the data
data <- read.table("mydata.txt", sep = '\t',
                  header = TRUE, stringsAsFactors = FALSE,check.names=FALSE, row.names = 1)
mat<- as.matrix (data)
####################normalize and log2transform
mat1 <- vst(mat) ##normalization and logtransformation
mat2<- t(scale(t(mat1))) ##zscore of normalized log2 transformed data

metadata <- read.table("samples.txt", sep = '\t', row.names = 1,
                       header = TRUE,check.names=FALSE,stringsAsFactors = FALSE)


sig_genes <- read.table("SigGenes_FDR0.05_FC2.txt", sep = '\t',
                        header = FALSE, stringsAsFactors = FALSE,check.names=FALSE)[,1]


##Check the md5 checksums to ensure data integrity / security. The checksums should be:

digest::digest(mat3, algo = 'md5')
digest::digest(metadata, algo = 'md5')
digest::digest(sig_genes, algo = 'md5')

# --check that both objects are aligned by name
all(rownames(metadata) == colnames(mat2))
# Subset the expression matrix for the statistically significant genes
mat <- mat2[sig_genes, ]

#set colour scheme and choose breaks
myCol <- colorRampPalette(c('dodgerblue', 'black', 'yellow'))(100)
myBreaks <- seq(-3, 4, length.out = 100)
##create annotation: treatment status and sex
#First, we will just generate some colour mappings for the metadata sex
# Sex
cd3 <- metadata$Sex
cd3 <- cd3[!is.na(cd3)] # remove missing values - we don't want to include these in the mapping
pick.col <- brewer.pal(9, 'Greens')
col.cd3 <- colorRampPalette(pick.col)(length(unique(cd3)))


unique(col.cd3)
ann <- data.frame(
  treatmentStatus = metadata$treatmentStatus,
  Sex = metadata$Sex,
  stringsAsFactors = FALSE)

# create the colour mapping
# create the colour mapping
colours <- list(treatmentstatus = c('0' = 'blue', '1' = 'red'),
             Sex = c('M' = "#F7FCF5", 'F' = '#C7E9C0'))

# now create the ComplexHeatmap annotation object
# as most of these parameters are self-explanatory, comments will only appear where needed
colAnn <- HeatmapAnnotation(
  df = ann,
  which = 'col', # 'col' (samples) or 'row' (gene) annotation?
  na_col = 'white', # default colour for any NA values in the annotation data-frame, 'ann'
  col = colours,
  annotation_height = 0.6,
  annotation_width = unit(1, 'cm'),
  gap = unit(1, 'mm'),
  annotation_legend_param = list(
    treatmentStatus = list(
      nrow = 4, # number of rows across which the legend will be arranged
      title =  'treatmentStatus',
      title_position = 'topcenter',
      legend_direction = 'vertical',
      title_gp = gpar(fontsize = 12, fontface = 'bold'),
      labels_gp = gpar(fontsize = 12, fontface = 'bold')),
    Sex = list(
      nrow = 5,
      title = 'Sex',
      title_position = 'topcenter',
      legend_direction = 'vertical',
      title_gp = gpar(fontsize = 12, fontface = 'bold'),
      labels_gp = gpar(fontsize = 12, fontface = 'bold')))
)



##create annotation: box-and-whisker plots
boxplotCol <- HeatmapAnnotation(
  boxplot = anno_boxplot(
    mat,
    border = FALSE,
    gp = gpar(fill = "#CCCCCC"),
    pch = '.',
    size = unit(2, "mm"),
    axis = TRUE,
    axis_param = list(
      gp = gpar(fontsize = 12),
      side = 'left')),
  annotation_width = unit(c(2.0),"cm"),
  which = "col")

boxplotRow <- HeatmapAnnotation(
  boxplot = row_anno_boxplot(
    mat,
    border = FALSE,
    gp = gpar(fill = '#CCCCCC'),
    pch = '.',
    size = unit(2, 'mm'),
    axis = TRUE,
    axis_param = list(
      gp = gpar(fontsize = 12),
      side = 'top')),
  annotation_width = unit(c(2.0), 'cm'),
  which = 'row')

##create annotation: gene labels
#retain every 4th successive label.
genelabels <- rowAnnotation(
  Genes = anno_mark(
    at = seq(1, nrow(mat), 4),
    labels = rownames(mat)[seq(1, nrow(mat), 4)],
    labels_gp = gpar(fontsize = 10, fontface = "bold"),
    padding = 0.75),
  width = unit(2.0, "cm") +
    max_text_width(
      rownames(mat)[seq(1, nrow(mat), 10)],
      gp = gpar(fontsize = 10,  fontface = "bold")))

#####################

##perform partitioning around medoids (PAM) to identify clusters in the data
pamClusters <- cluster::pam(x, k = 2) # pre-select k = 2 centers
pamClusters$clustering <- paste0('Cluster ', pamClusters$clustering)

# fix order of the clusters to have 1 to 4, top to bottom
pamClusters$clustering <- factor(pamClusters$clustering,
                                 levels = c('Cluster 1', 'Cluster 2'))



##create the actual heatmap object
hmap <- Heatmap(mat,
                # split the genes / rows according to the PAM clusters
                #split = pamClusters$clustering,
                cluster_row_slices = TRUE,
                name = 'Gene\nZ-\nscore',
                col = colorRamp2(myBreaks, myCol),
                # parameters for the colour-bar that represents gradient of expression
                heatmap_legend_param = list(
                  color_bar = 'continuous',
                  legend_direction = 'vertical',
                  legend_width = unit(8, 'cm'),
                  legend_height = unit(5.0, 'cm'),
                  title_position = 'topcenter',
                  title_gp=gpar(fontsize = 12, fontface = 'bold'),
                  labels_gp=gpar(fontsize = 12, fontface = 'bold')),

                # row (gene) parameters
                cluster_rows = TRUE,
                show_row_dend = TRUE,
                #row_title = 'Statistically significant genes',
                row_title_side = 'left',
                row_title_gp = gpar(fontsize = 12,  fontface = 'bold'),
                row_title_rot = 90,
                show_row_names = FALSE,
                row_names_gp = gpar(fontsize = 10, fontface = 'bold'),
                row_names_side = 'left',
                row_dend_width = unit(25,'mm'),

                # column (sample) parameters
                cluster_columns = FALSE,
                show_column_dend = FALSE,
                column_title = '',
                column_title_side = 'bottom',
                column_title_gp = gpar(fontsize = 12, fontface = 'bold'),
                column_title_rot = 0,
                show_column_names = FALSE,
                column_names_gp = gpar(fontsize = 10, fontface = 'bold'),
                column_names_max_height = unit(10, 'cm'),
                column_dend_height = unit(25,'mm'),

                # cluster methods for rows and columns
                clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
                clustering_method_columns = 'ward.D2',
                clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
                clustering_method_rows = 'ward.D2',

                # specify top and bottom annotations
                top_annotation = colAnn)
#bottom_annotation = boxplotCol)

# draw the heatmap
draw(hmap + genelabels,
     heatmap_legend_side = 'left',
     annotation_legend_side = 'right',
     row_sub_title_side = "left")

# extra: change colour scheme, breaks, and do extra clustering on columns
myCol <- colorRampPalette(c('royalblue', 'white', 'red3'))(100)
##myBreaks <- seq(-1.5, 1.5, length.out = 100)

hmap1 <- Heatmap(mat,

                 name = 'Gene Z-score',

                 col = colorRamp2(myBreaks,myCol),

                 heatmap_legend_param = list(
                   color_bar = 'continuous',
                   legend_direction = 'horizontal',
                   legend_width = unit(8, 'cm'),
                   legend_height = unit(5.0, 'cm'),
                   title_position = 'topcenter',
                   title_gp=gpar(fontsize = 30, fontface = 'bold'),
                   labels_gp=gpar(fontsize = 24, fontface = 'bold')),

                 cluster_rows = TRUE,
                 show_row_dend = TRUE,
                 row_title = 'Statistically significant genes',
                 row_title_side = 'left',
                 row_title_gp = gpar(fontsize = 30,  fontface = 'bold'),
                 row_title_rot = 90,
                 show_row_names = FALSE,
                 row_names_gp = gpar(fontsize = 11, fontface = 'bold'),
                 row_names_side = 'left',
                 row_dend_width = unit(25,'mm'),

                 cluster_columns = TRUE,
                 show_column_dend = FALSE,
                 column_title = 'Samples',
                 column_title_side = 'bottom',
                 column_title_gp = gpar(fontsize = 30, fontface = 'bold'),
                 column_title_rot = 0,
                 show_column_names = FALSE,
                 column_names_gp = gpar(fontsize = 8, fontface = 'bold'),
                 column_names_max_height = unit(10, 'cm'),
                 column_dend_height = unit(25,'mm'),

                 clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
                 clustering_method_columns = 'ward.D2',
                 clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
                 clustering_method_rows = 'ward.D2')



pushViewport(viewport(layout = grid.layout(nr = 1, nc = 2)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
draw(hmap1,
     heatmap_legend_side = 'top',
     row_sub_title_side = 'left',
     newpage = FALSE)
popViewport()

pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
popViewport()

rna-seq • 2.4k views

ADD COMMENT • link 4.1 years ago by Rob ▴ 170

0

Entering edit mode

Hi @rhasanvandj, Is Ok for you to share the heatmap you have created?

ADD REPLY • link 4.1 years ago by Hamid Ghaedi 3.3k

0

Entering edit mode

yes, sure Here is the heatmap

https://ibb.co/znfWrCn

ADD REPLY • link 4.1 years ago by Rob ▴ 170

0

Entering edit mode

Yes, there is no pattern in your heatmap please reassure about your upstream analysis steps especially DE analysis. Also when your groups in comparison are significantly imbalanced in terms of sample size, this also would cause to have no distinctive pattern in the heatmap.

ADD REPLY • link 4.1 years ago by Hamid Ghaedi 3.3k

0

Entering edit mode

Thanks Hamid I repeated DE a few times and the results are fine. What other analysis I need to do before plotting the heatmap? (you mentioned as downstream analysis steps)

ADD REPLY • link 4.1 years ago by Rob ▴ 170

0

Entering edit mode

oops, I meant upstream steps like DE and data transformation. Edited the comments. Did you repeat DE with same pipeline or tried with different pipelines? What about your sample size in each group? It would be also helpful if you label the heatmap rows and columns. Also add dendrogram to the heatmap column.

ADD REPLY • link 4.1 years ago by Hamid Ghaedi 3.3k

0

Entering edit mode

I tried with different pipelines: RSEM data, count data, and this one is with HT-seq raw count data. I used vst() from DEeq2 to normalize and log2 transform HT-Seq raw count data, then I changed them to z-score (scale by row). I plotted heatmaps with raw data, normalized log2transformed data, z-scores. I did not get pattern. then, I winscaled the z-scores (change the values smaller than -3 to -3 and changed the values larger than 3 to 3) to avoid effects of very highly over/under expressed data overcoming the pattern. The image I shared is from winscaled data and the best one. I have different analysis: one is with 44 samples (22 control: 22 treatment), 132 samples(66 vs 66), 88 samples (44 vs 44). I dont get good pattern for any of the analyses.

I did not add dendrogram to the column because I do not want to cluster my column. I want them based on the order I sorted in my file.

ADD REPLY • link 4.1 years ago by Rob ▴ 170

1

Entering edit mode

I stopped to comment here because your workflow is not clear to me. Indeed I failed to understand what you did.

I plotted heatmaps with raw data

At least for this part of the workflow, you can find several recommendations against doing so.

Maybe other can provide some hints for you.

ADD REPLY • link 4.1 years ago by Hamid Ghaedi 3.3k

0

Entering edit mode

yes you are right. It was just a trial. One thing that I want to do is clustering inside each of two groups. How can I do clustering for each group separately?

ADD REPLY • link 4.1 years ago by Rob ▴ 170

1

Entering edit mode

Came across this: https://www.nature.com/articles/nmeth.1902. You and others may find this helpful.

Hundreds of rows and columns can be displayed on a screen. Heat maps rely fundamentally on color encoding and on meaningful reordering of the rows and columns. When either of these components is compromised, the utility of the visualization suffers.