Entering edit mode
4.1 years ago
Rob
▴
170
Hi I have HTseq data and want to plot heatmap for significant expressed genes. I followed Kevin Blighe tutorial :https://github.com/kevinblighe/E-MTAB-6141 I still do not have any pattern. These genes are significantly differentially expressed between two groups (tretment and control) and should have pattern in heatmap. Any body can help me with this? Also, default color range for this code is -4 to 4. How can I change it to for example -3 to 3? this is my code:
require(RColorBrewer)
require(ComplexHeatmap)
require(circlize)
require(digest)
require(cluster)
library(DESeq2)
##obtain the data
data <- read.table("mydata.txt", sep = '\t',
header = TRUE, stringsAsFactors = FALSE,check.names=FALSE, row.names = 1)
mat<- as.matrix (data)
####################normalize and log2transform
mat1 <- vst(mat) ##normalization and logtransformation
mat2<- t(scale(t(mat1))) ##zscore of normalized log2 transformed data
metadata <- read.table("samples.txt", sep = '\t', row.names = 1,
header = TRUE,check.names=FALSE,stringsAsFactors = FALSE)
sig_genes <- read.table("SigGenes_FDR0.05_FC2.txt", sep = '\t',
header = FALSE, stringsAsFactors = FALSE,check.names=FALSE)[,1]
##Check the md5 checksums to ensure data integrity / security. The checksums should be:
digest::digest(mat3, algo = 'md5')
digest::digest(metadata, algo = 'md5')
digest::digest(sig_genes, algo = 'md5')
# --check that both objects are aligned by name
all(rownames(metadata) == colnames(mat2))
# Subset the expression matrix for the statistically significant genes
mat <- mat2[sig_genes, ]
#set colour scheme and choose breaks
myCol <- colorRampPalette(c('dodgerblue', 'black', 'yellow'))(100)
myBreaks <- seq(-3, 4, length.out = 100)
##create annotation: treatment status and sex
#First, we will just generate some colour mappings for the metadata sex
# Sex
cd3 <- metadata$Sex
cd3 <- cd3[!is.na(cd3)] # remove missing values - we don't want to include these in the mapping
pick.col <- brewer.pal(9, 'Greens')
col.cd3 <- colorRampPalette(pick.col)(length(unique(cd3)))
unique(col.cd3)
ann <- data.frame(
treatmentStatus = metadata$treatmentStatus,
Sex = metadata$Sex,
stringsAsFactors = FALSE)
# create the colour mapping
# create the colour mapping
colours <- list(treatmentstatus = c('0' = 'blue', '1' = 'red'),
Sex = c('M' = "#F7FCF5", 'F' = '#C7E9C0'))
# now create the ComplexHeatmap annotation object
# as most of these parameters are self-explanatory, comments will only appear where needed
colAnn <- HeatmapAnnotation(
df = ann,
which = 'col', # 'col' (samples) or 'row' (gene) annotation?
na_col = 'white', # default colour for any NA values in the annotation data-frame, 'ann'
col = colours,
annotation_height = 0.6,
annotation_width = unit(1, 'cm'),
gap = unit(1, 'mm'),
annotation_legend_param = list(
treatmentStatus = list(
nrow = 4, # number of rows across which the legend will be arranged
title = 'treatmentStatus',
title_position = 'topcenter',
legend_direction = 'vertical',
title_gp = gpar(fontsize = 12, fontface = 'bold'),
labels_gp = gpar(fontsize = 12, fontface = 'bold')),
Sex = list(
nrow = 5,
title = 'Sex',
title_position = 'topcenter',
legend_direction = 'vertical',
title_gp = gpar(fontsize = 12, fontface = 'bold'),
labels_gp = gpar(fontsize = 12, fontface = 'bold')))
)
##create annotation: box-and-whisker plots
boxplotCol <- HeatmapAnnotation(
boxplot = anno_boxplot(
mat,
border = FALSE,
gp = gpar(fill = "#CCCCCC"),
pch = '.',
size = unit(2, "mm"),
axis = TRUE,
axis_param = list(
gp = gpar(fontsize = 12),
side = 'left')),
annotation_width = unit(c(2.0),"cm"),
which = "col")
boxplotRow <- HeatmapAnnotation(
boxplot = row_anno_boxplot(
mat,
border = FALSE,
gp = gpar(fill = '#CCCCCC'),
pch = '.',
size = unit(2, 'mm'),
axis = TRUE,
axis_param = list(
gp = gpar(fontsize = 12),
side = 'top')),
annotation_width = unit(c(2.0), 'cm'),
which = 'row')
##create annotation: gene labels
#retain every 4th successive label.
genelabels <- rowAnnotation(
Genes = anno_mark(
at = seq(1, nrow(mat), 4),
labels = rownames(mat)[seq(1, nrow(mat), 4)],
labels_gp = gpar(fontsize = 10, fontface = "bold"),
padding = 0.75),
width = unit(2.0, "cm") +
max_text_width(
rownames(mat)[seq(1, nrow(mat), 10)],
gp = gpar(fontsize = 10, fontface = "bold")))
#####################
##perform partitioning around medoids (PAM) to identify clusters in the data
pamClusters <- cluster::pam(x, k = 2) # pre-select k = 2 centers
pamClusters$clustering <- paste0('Cluster ', pamClusters$clustering)
# fix order of the clusters to have 1 to 4, top to bottom
pamClusters$clustering <- factor(pamClusters$clustering,
levels = c('Cluster 1', 'Cluster 2'))
##create the actual heatmap object
hmap <- Heatmap(mat,
# split the genes / rows according to the PAM clusters
#split = pamClusters$clustering,
cluster_row_slices = TRUE,
name = 'Gene\nZ-\nscore',
col = colorRamp2(myBreaks, myCol),
# parameters for the colour-bar that represents gradient of expression
heatmap_legend_param = list(
color_bar = 'continuous',
legend_direction = 'vertical',
legend_width = unit(8, 'cm'),
legend_height = unit(5.0, 'cm'),
title_position = 'topcenter',
title_gp=gpar(fontsize = 12, fontface = 'bold'),
labels_gp=gpar(fontsize = 12, fontface = 'bold')),
# row (gene) parameters
cluster_rows = TRUE,
show_row_dend = TRUE,
#row_title = 'Statistically significant genes',
row_title_side = 'left',
row_title_gp = gpar(fontsize = 12, fontface = 'bold'),
row_title_rot = 90,
show_row_names = FALSE,
row_names_gp = gpar(fontsize = 10, fontface = 'bold'),
row_names_side = 'left',
row_dend_width = unit(25,'mm'),
# column (sample) parameters
cluster_columns = FALSE,
show_column_dend = FALSE,
column_title = '',
column_title_side = 'bottom',
column_title_gp = gpar(fontsize = 12, fontface = 'bold'),
column_title_rot = 0,
show_column_names = FALSE,
column_names_gp = gpar(fontsize = 10, fontface = 'bold'),
column_names_max_height = unit(10, 'cm'),
column_dend_height = unit(25,'mm'),
# cluster methods for rows and columns
clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2',
# specify top and bottom annotations
top_annotation = colAnn)
#bottom_annotation = boxplotCol)
# draw the heatmap
draw(hmap + genelabels,
heatmap_legend_side = 'left',
annotation_legend_side = 'right',
row_sub_title_side = "left")
# extra: change colour scheme, breaks, and do extra clustering on columns
myCol <- colorRampPalette(c('royalblue', 'white', 'red3'))(100)
##myBreaks <- seq(-1.5, 1.5, length.out = 100)
hmap1 <- Heatmap(mat,
name = 'Gene Z-score',
col = colorRamp2(myBreaks,myCol),
heatmap_legend_param = list(
color_bar = 'continuous',
legend_direction = 'horizontal',
legend_width = unit(8, 'cm'),
legend_height = unit(5.0, 'cm'),
title_position = 'topcenter',
title_gp=gpar(fontsize = 30, fontface = 'bold'),
labels_gp=gpar(fontsize = 24, fontface = 'bold')),
cluster_rows = TRUE,
show_row_dend = TRUE,
row_title = 'Statistically significant genes',
row_title_side = 'left',
row_title_gp = gpar(fontsize = 30, fontface = 'bold'),
row_title_rot = 90,
show_row_names = FALSE,
row_names_gp = gpar(fontsize = 11, fontface = 'bold'),
row_names_side = 'left',
row_dend_width = unit(25,'mm'),
cluster_columns = TRUE,
show_column_dend = FALSE,
column_title = 'Samples',
column_title_side = 'bottom',
column_title_gp = gpar(fontsize = 30, fontface = 'bold'),
column_title_rot = 0,
show_column_names = FALSE,
column_names_gp = gpar(fontsize = 8, fontface = 'bold'),
column_names_max_height = unit(10, 'cm'),
column_dend_height = unit(25,'mm'),
clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2')
pushViewport(viewport(layout = grid.layout(nr = 1, nc = 2)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
draw(hmap1,
heatmap_legend_side = 'top',
row_sub_title_side = 'left',
newpage = FALSE)
popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
popViewport()
Hi @rhasanvandj, Is Ok for you to share the heatmap you have created?
yes, sure Here is the heatmap
Yes, there is no pattern in your heatmap please reassure about your upstream analysis steps especially DE analysis. Also when your groups in comparison are significantly imbalanced in terms of sample size, this also would cause to have no distinctive pattern in the heatmap.
Thanks Hamid I repeated DE a few times and the results are fine. What other analysis I need to do before plotting the heatmap? (you mentioned as downstream analysis steps)
oops, I meant upstream steps like DE and data transformation. Edited the comments. Did you repeat DE with same pipeline or tried with different pipelines? What about your sample size in each group? It would be also helpful if you label the heatmap rows and columns. Also add dendrogram to the heatmap column.
I tried with different pipelines: RSEM data, count data, and this one is with HT-seq raw count data. I used
vst()
from DEeq2 to normalize and log2 transform HT-Seq raw count data, then I changed them to z-score (scale by row). I plotted heatmaps with raw data, normalized log2transformed data, z-scores. I did not get pattern. then, I winscaled the z-scores (change the values smaller than -3 to -3 and changed the values larger than 3 to 3) to avoid effects of very highly over/under expressed data overcoming the pattern. The image I shared is from winscaled data and the best one. I have different analysis: one is with 44 samples (22 control: 22 treatment), 132 samples(66 vs 66), 88 samples (44 vs 44). I dont get good pattern for any of the analyses.I did not add dendrogram to the column because I do not want to cluster my column. I want them based on the order I sorted in my file.
I stopped to comment here because your workflow is not clear to me. Indeed I failed to understand what you did.
At least for this part of the workflow, you can find several recommendations against doing so.
Maybe other can provide some hints for you.
yes you are right. It was just a trial. One thing that I want to do is clustering inside each of two groups. How can I do clustering for each group separately?
Came across this: https://www.nature.com/articles/nmeth.1902. You and others may find this helpful.
Thanks Hamid for sharing the paper