I'm not sure it'd make sense to tidy a matrix of that size for use with ggplot2. Below is how to read a simple file produced by computeMatrix into R (the example file I'm using can be found in the deepTools source code under deeptools/test/test_data/):
m = read.delim("computeMatrixOperations.mat.gz", skip=2, header=F)  # skip the header line(s)
m = as.matrix(m[,-c(1:6)])  # drop the region coordinate/metadata columns, keeping only the signal values
The resulting matrix is exactly what's plotted by plotHeatmap, though you can also use pheatmap() if you really want:
pheatmap(m, cluster_rows=F, cluster_cols=F) # This is a useless heatmap, I just use it for testing
Note that there are likely NAs in the data, which can cause a lot of things to have issues during clustering.
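If you do want to cluster, one simple work-around (just a sketch, assuming the m object from above) is to keep only the rows without any NAs:

m_complete = m[complete.cases(m), ]  # drop regions containing NAs
pheatmap(m_complete, cluster_rows=T, cluster_cols=F)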
One big caveat to all of this is that you don't see delineations by region groups and samples. That information is stored in the header:
> h = scan("computeMatrixOperations.mat.gz", n=1, sep="\n", what=character())
Read 1 item
> h
[1] "@{\"verbose\":true,\"scale\":1,\"skip zeros\":false,\"nan after end\":false,\"sort using\":\"mean\",\"unscaled 5 prime\":0,\"body\":1000,\"sample_labels\":[\"SRR648667.forward\",\"SRR648668.forward\",\"SRR648669.forward\",\"SRR648670.forward\",\"SRR648667.reverse\",\"SRR648668.reverse\",\"SRR648669.reverse\",\"SRR648670.reverse\"],\"downstream\":0,\"unscaled 3 prime\":0,\"group_labels\":[\"genes\"],\"bin size\":10,\"upstream\":0,\"group_boundaries\":[0,196],\"sample_boundaries\":[0,100,200,300,400,500,600,700,800],\"missing data as zero\":false,\"ref point\":null,\"min threshold\":null,\"sort regions\":\"no\",\"proc number\":20,\"bin avg type\":\"mean\",\"max threshold\":null}"
That's a JSON string, which I don't think base R has a function to parse. The important part is that sample_boundaries denotes the beginning and end of the columns belonging to each sample (the labels are in sample_labels), and group_boundaries/group_labels does the same for the rows of each group of regions. So you can subset the matrix in the normal way in R accordingly.
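If you'd rather extract those boundaries programmatically, here is a minimal sketch using the jsonlite package (not base R, so it needs to be installed separately); it assumes the h and m objects from above:

library(jsonlite)

hdr = fromJSON(sub("^@", "", h))  # strip the leading "@" and parse the JSON header

# Columns belonging to the first sample (the boundaries are 0-based, so shift by 1)
sb = hdr$sample_boundaries
first_sample = m[, (sb[1] + 1):sb[2]]
hdr$sample_labels[1]  # "SRR648667.forward"

# Rows belonging to the first group of regions
gb = hdr$group_boundaries
first_group = m[(gb[1] + 1):gb[2], ]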
For computing the equivalent of plotProfile's output, you can simply apply(m, 2, function(x) mean(x, na.rm=TRUE)) (be sure to handle NA values!).
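To go from that vector to one profile per sample, a rough sketch (assuming m and the hdr object from the jsonlite example above) could be:

profile = apply(m, 2, function(x) mean(x, na.rm=TRUE))  # per-bin mean across regions

# Split the columns back into one profile per sample and plot them
sb = hdr$sample_boundaries
per_sample = sapply(seq_along(hdr$sample_labels),
                    function(i) profile[(sb[i] + 1):sb[i + 1]])
matplot(per_sample, type="l", lty=1, col=seq_along(hdr$sample_labels),
        xlab="bin", ylab="mean signal")
legend("topright", legend=hdr$sample_labels, lty=1,
       col=seq_along(hdr$sample_labels), cex=0.7)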
I usually prefer creating plots in R, but it'd be a good bit of work just to reproduce a moderately complicated equivalent to plotHeatmap in R, so I would encourage you to take Friederike's advice and tweak the Python code in deepTools instead.
I think deepTools offers pretty nice options for generating very good heatmaps/profile plots. What do you think is missing?
Check this: Visualization of ChIP-seq data using Heatmaps (Updated: 06/10/16)
I agree that the plots are nice and an easy way to visualize the data, but I don't like the aesthetic for publication-quality figures. I think there are far more attractive plots being published using other packages, and I want to go that route (e.g., matplotlib, pheatmap, ggplot2, etc.).
Regarding heatmaps, I can assure you that it will require a lot of fiddling around to achieve similarly good-looking results in R (which is something you've realized already, hence the question, I assume). EDIT: I have never had enough patience so far to go further than what Devon has supplied, and I much prefer R for graphics.
The aspects I tend to want to change about the deepTools heatmaps mostly concern the labels. Those can more easily be changed by going the Python route (i.e., using the original source code and changing the aspects you feel need changing).
You are correct ;)
Anyone have a better solution?
@Kevin has some nice links here: How can I generate Heat Map with dendograms, and PCA analysis in "R Programming" for this type of data?
Have you looked through the links venu provided?
FYI, matplotlib is exactly what deepTools uses to create its plots.
Can you specify which aspects you don't find attractive enough?
Thanks so much for your help everyone! I'll definitely start by tweaking the Python code in deepTools and dig into the R code when I have time.