Using EnrichedHeatmap for visualization of NGS experiments

Tutorial:Using EnrichedHeatmap for visualization of NGS experiments

22

Entering edit mode

6.0 years ago

ATpoint 89k

This tutorial contains some example code to produce visually-appealing enrichment heatmaps using the R/Bioconductor package EnrichedHeatmap. (BioC, Publication). The package depends on its "parent" ComplexHeatmap from the same developer. It is therefore advisable to read the documentation (which is extensive) from both packages to get started with the underlying logic and concepts. The tutorial is in no way comprehensive towards the full functionality of the package, nor will it ever be. It is intended to provide some basic code to get started with, hoping to encourage users to dig deeper themselves. It may indeed require a bit of effort to get your head around the package(s) as they offer a large amount of options to customize your heatmaps, at least it did for me. Please go into the documentation and use google in case you need information. Most of the time the same question has already been asked before somewhere, be it on communities or via a Github issue.

Below is the heatmap we want to produce, starting from a BED file that contains peak regions (which are called targets in EnrichedHeatmap language) and a bigwig file that contains the read counts (called signals in EH language).

The required data for the example code can be downloaded from GSE111902. Go to the bottom of the page, click custom next to GSE111902_RAW.tar and then select the first two files which is GSM3045250_N_ATAC_Kaech_1.bw (a bigwig file with read counts) and GSM3045250_N_ATAC_Kaech_1_peaks.broadPeak.gz (a set of genomic regions / peaks from that experiment, here ATAC-seq).

Enriched Heatmap

R code for the heatmap:

	#/ A minimal example on how to use EnrichedHeatmap together with rtracklayer.
	#/ Required data are at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111902
	#/ Go for the files :
	#/ "GSM3045250_N_ATAC_Kaech_1_peaks.broadPeak.gz" and (genomic regions in BED-like format)
	#/ "GSM3045250_N_ATAC_Kaech_1.bw" (the bigwig files with read counts)

	require(EnrichedHeatmap)
	require(rtracklayer)
	require(circlize)
	require(data.table)

	#/ Load a BED file into R with data.table::fread(),
	#/ then convert to GRanges format with GenomicRanges::GRanges():
	targets <- makeGRangesFromDataFrame(
	df = fread("GSM3045250_N_ATAC_Kaech_1_peaks.broadPeak.gz", header = FALSE, data.table = FALSE),
	seqnames.field = "V1", start.field = "V2", end.field = "V3")

	#/ For this tutorial we take the first 10000 regions rather than the full file:
	targets <- head(targets, 10000)

	#/ We take the center of each region/peak and want to extend by 5kb each direction:
	ExtendSize <- 5000
	targets.extended <- resize(targets, fix = "center", width = ExtendSize*2)

	#/ We load the relevant parts of the bigwig file into R.
	#/ This is more efficient than reading the entire file.
	BigWig <- rtracklayer::import("GSM3045250_N_ATAC_Kaech_1.bw",
	format = "BigWig",
	selection = BigWigSelection(targets.extended))

	#/ Create the normalizedMatrix that EnrichedHeatmap accepts as input.
	#/ We use the targets center (width=1) because from what I understand normalizeMatrix
	#/ does not allow to turn off its "extend" option. Therefore we trick it by simply
	#/ providing the peak centers and then let the function extend it by our predefined window size.
	normMatrix <- normalizeToMatrix(signal = BigWig,
	target = resize(targets, fix = "center", width = 1),
	background = 0,
	keep = c(0, 0.99), #/ minimal value to the 99th percentile
	target_ratio = 0,
	mean_mode = "w0", #/ see ?EnrichedHeatmap on other options
	value_column = "score", #/ = the name of the 4th column of the bigwig
	extend = ExtendSize)

	#/ Make a color gradient that covers the range of normMatrix from 0 to the 99th percentile.
	#/ The percentile avoids outliers to skew the heatmap:
	col_fun = circlize::colorRamp2(quantile(normMatrix, c(0, .99)), c("darkblue", "darkgoldenrod1"))

	#/ heatmap function:
	EH <- EnrichedHeatmap( mat = normMatrix,
	pos_line = FALSE, #/ no dashed lines around the start
	border = FALSE, #/ no box around heatmap
	col = col_fun, #/ color gradients from above
	column_title = "Nice Heatmap", #/ column title
	column_title_gp = gpar(fontsize = 15, fontfamily = "sans"),
	use_raster = TRUE, raster_quality = 10, raster_device = "png",
	#/ turn off background colors
	rect_gp = gpar(col = "transparent"),
	#/ legend options:
	heatmap_legend_param = list(
	legend_direction = "horizontal",
	title = "normalized counts"),
	#/ options for the profile plot on top of the heatmap:
	top_annotation = HeatmapAnnotation(
	enriched = anno_enriched(
	gp = gpar(col = "black", lty = 1, lwd=2),
	col="black")
	)
	) #/ end of EnrichedHeatmap function

	#/ Save as pdf to disk:
	pdf("EnrichedHeatmap.pdf")

	draw(EH, #/ plot the heatmap from above
	heatmap_legend_side = "bottom", #/ we want the legend below the heatmap
	annotation_legend_side = "bottom", #/ legend on the bottom side
	padding = unit(c(4, 4, 4, 4), "mm") #/ some padding to avoid labels beyond plot borders
	)

	dev.off()

view raw EnrichedHeatmap_example.R hosted with ❤ by GitHub

heatmap chip-seq atac-seq EnrichedHeatmap • 9.1k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 6.0 years ago by ATpoint 89k

0

Entering edit mode

Hi, Thanks. Does V1, V2, and V3 indicates column-1, 2 and 3 respectively in the "GSM3045250_N_ATAC_Kaech_1_peaks.broadPeak.gz" file in the following code? Please let me know. targets <- makeGRangesFromDataFrame( df = fread("GSM3045250_N_ATAC_Kaech_1_peaks.broadPeak.gz", header = FALSE, data.table = FALSE), seqnames.field = "V1", start.field = "V2", end.field = "V3")

ADD REPLY • link 4.4 years ago by goodwinjinesh • 0

1

Entering edit mode

These are the default colnames that data.table::fread gives objects without headers (=colnames). On disk this is a plain text file without colnames. Does that answer your question?

ADD REPLY • link 4.4 years ago by ATpoint 89k

0

Entering edit mode

Thank you. Yes. I have one more question. Can I create a file only with first three columns (col-1: Chr1; col-2: Start; col-3: end) and explore a bigwig files using that grange?. The context is, I have a set of 50-100 gene coordinates and I want to explore only those targets in a bigwig file. Just to get a heatmap of those 50-100 genes alone. Please let me know. Thanks in advance.

ADD REPLY • link 4.3 years ago by goodwinjinesh • 0

1

Entering edit mode

I guess so. You can subset the targets file to whatever you want. If you have these gene coordinates as GRanges then probably subsetByOverlaps on the target file will do the job.

ADD REPLY • link 4.3 years ago by ATpoint 89k

Login before adding your answer.