From what I can find in papers, heatmaps of RNA-seq data are created in several ways: using log-fold changes, z-scores, etc.
The edgeR vignette states:
Inputing RNA-seq counts to clustering or heatmap routines designed for microarray data is not straight-forward, and the best way to do this is still a matter of research. To draw a heatmap of individual RNA-seq samples, we suggest using moderated log-counts-per-million. This can be calculated by cpm with positive values for prior.count, for example :
> logcpm <- cpm(y, log=TRUE)
Out of curiosity, how would this differ from calculating z-scores from the fitted.values (produced by the glmQLFit step) of the RNA-seq analysis pipeline? Would heatmaps built from those z-scores turn out all that different?
Much appreciated, thank you.
As a follow-up, would it be wrong to average the log2 CPM values of replicates (after I have confirmed that the differences between samples are indeed greater than those between replicates)?
If you want to plot group values rather than individual sample values, then use cpmByGroup. There is never a need to average logCPM values yourself.
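As a rough sketch of what that might look like, assuming a DGEList whose samples carry a group factor: the counts, gene names, sample names and group labels below are simulated purely for illustration, and the row-wise z-scoring at the end is just one common display choice for heatmaps, not something edgeR requires.

```r
library(edgeR)

# Simulated counts: 500 hypothetical genes x 6 samples (illustration only)
set.seed(1)
counts <- matrix(rnbinom(3000, mu = 50, size = 5), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))
group <- factor(rep(c("A", "B", "C"), each = 2))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design = model.matrix(~ 0 + group))

# Per-sample moderated logCPM, as suggested in the vignette for sample-level heatmaps
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# One logCPM value per group, computed by edgeR instead of averaging replicates by hand
logcpm_group <- cpmByGroup(y, log = TRUE, prior.count = 2)

# Optional: scale each gene (row) to z-scores before passing to a heatmap function
z_group <- t(scale(t(logcpm_group)))
```

Here logcpm has one column per sample and logcpm_group has one column per level of the group factor, so either matrix can be handed to a heatmap routine depending on whether you want sample-level or group-level display.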