Yes, you can begin with FPKM values, but you will have to transform them and also filter the dataset down to your 800 differentially expressed genes. On that note, 800 is a lot of genes. Try making your cut-off for statistically significant differential expression more stringent. Try things like:
- FDR Q<0.05
- FDR Q<0.01
- FDR Q<0.001
- FDR Q<0.0001
et cetera.
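As a rough sketch of how tightening the threshold shrinks the gene list: the example below assumes your differential expression results sit in a data frame with one row per gene and a column of FDR-adjusted values. The names `results`, `gene`, and `FDR` are hypothetical, and the values are made up purely for illustration.

```r
# Hypothetical results table; replace with your own differential expression output
results <- data.frame(
  gene = paste0("gene", 1:6),
  FDR  = c(0.04, 0.008, 0.0005, 0.00002, 0.2, 0.03),
  stringsAsFactors = FALSE
)

# Count how many genes pass each candidate threshold
cutoffs <- c(0.05, 0.01, 0.001, 0.0001)
n_pass <- sapply(cutoffs, function(q) sum(results$FDR < q, na.rm = TRUE))
names(n_pass) <- cutoffs
print(n_pass)

# Keep the genes passing the chosen threshold as a single character vector
DiffExpressedGenes <- results$gene[which(results$FDR < 0.01)]
print(DiffExpressedGenes)
```

Pick the threshold that leaves a gene list small enough to read on a heatmap, then carry `DiffExpressedGenes` forward into the filtering step.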
Use this code (below).
- Your FPKM values will be stored in MyFPKMValues
- DiffExpressedGenes will comprise a single vector of genes that are differentially expressed
- The zFPKM package will be used to convert your FPKM values to the Z-scale prior to clustering.
---------------------------
Set colour and heatmap scaling breaks
require("RColorBrewer")
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-3, 3, length.out=101)
Scale the FPKM values to the Z scale
library(zFPKM)
heat <- zFPKM(MyFPKMValues)
Filter your dataset to include your differentially expressed genes:
# heatmap.2 requires a numeric matrix, so coerce after subsetting
heat <- as.matrix(heat[rownames(heat) %in% DiffExpressedGenes, ])
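Before plotting, it can be worth checking that the gene identifiers in your list actually match the row names, since a mismatch (for example, gene symbols versus Ensembl IDs) silently drops genes. A minimal sketch with toy stand-ins for the two objects; `missingGenes` and `heatSubset` are hypothetical names:

```r
# Toy stand-ins for the objects used above (made-up values)
set.seed(1)
heat <- matrix(rnorm(12), nrow = 4,
               dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                               c("s1", "s2", "s3")))
DiffExpressedGenes <- c("geneB", "geneD", "geneX")

# Genes in the list that have no matching row name in the expression matrix
missingGenes <- setdiff(DiffExpressedGenes, rownames(heat))
print(missingGenes)

# Subset (drop = FALSE keeps a matrix even if only one gene matches)
heatSubset <- heat[rownames(heat) %in% DiffExpressedGenes, , drop = FALSE]
print(dim(heatSubset))
```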
Generate heatmaps with Euclidean distance (first) and '1 - Pearson correlation' distance (second) (both use Ward's linkage)
require("gplots")
#Euclidean distance
heatmap.2(heat,
col=myCol,
breaks=myBreaks,
main="Title",
key=TRUE, keysize=1.0,
scale="none",
density.info="none",
reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
trace="none",
cexRow=0.2,
cexCol=0.8,
distfun=function(x) dist(x, method="euclidean"),
hclustfun=function(x) hclust(x, method="ward.D2"))
#1-cor distance
heatmap.2(heat,
col=myCol,
breaks=myBreaks,
main="Title",
key=TRUE, keysize=1.0,
scale="none",
density.info="none",
reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
trace="none",
cexRow=0.2,
cexCol=0.8,
distfun=function(x) as.dist(1-cor(t(x))),
hclustfun=function(x) hclust(x, method="ward.D2"))
To each heatmap command, you can add ColSideColors, which is a vector of colours for a condition of interest, such as case/control status. The order of this colour vector has to match the order of the samples (columns) in the 'heat' object that you pass to heatmap.2.
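A minimal sketch of building such a colour vector, assuming you have a named vector of conditions per sample. The object names `condition` and `sideCols`, the colours, and the toy data are all hypothetical:

```r
library(gplots)

# Toy Z-scaled matrix: 20 genes x 6 samples (made-up data)
set.seed(1)
heat <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))

# Hypothetical sample annotation, named by sample
condition <- c(sample1 = "case", sample2 = "control", sample3 = "case",
               sample4 = "control", sample5 = "case", sample6 = "control")

# Reorder the annotation to match the columns of 'heat', then map to colours
condition <- condition[colnames(heat)]
sideCols <- unname(ifelse(condition == "case", "red", "forestgreen"))

heatmap.2(heat,
  col = colorRampPalette(c("dodgerblue", "black", "yellow"))(100),
  breaks = seq(-3, 3, length.out = 101),
  scale = "none", trace = "none", density.info = "none",
  ColSideColors = sideCols)
```

Indexing the annotation by `colnames(heat)` is what guarantees the colours line up with the samples, whatever order the annotation was originally in.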
Note that converting to the Z scale is not the only option: these guys median-centered their FPKM data and then log (base 2) transformed them prior to heatmap generation.
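One way to sketch a log2 plus median-centring transformation is shown below. Note the assumptions: here the log2 transformation is applied first, with a pseudocount of 1 so that zero FPKM values stay finite, and each gene's median is then subtracted. That ordering and the pseudocount are my choices for the illustration, not necessarily what the cited authors did, and `heatAlt` is a hypothetical name.

```r
# Toy FPKM matrix: 3 genes x 4 samples (made-up values)
MyFPKMValues <- matrix(c(0, 1, 3, 7,
                         15, 15, 31, 63,
                         1, 1, 1, 1),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("geneA", "geneB", "geneC"),
                                       paste0("s", 1:4)))

# log2-transform with a pseudocount of 1 (an assumption) so zeros stay finite
logFPKM <- log2(MyFPKMValues + 1)

# Subtract each gene's median across samples (row-wise median-centring)
heatAlt <- sweep(logFPKM, 1, apply(logFPKM, 1, median), FUN = "-")
print(heatAlt)
```

The resulting matrix can be passed to heatmap.2 in place of the Z-scaled values, although the colour breaks may need adjusting to suit the new range.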
An update (6th October 2018):
You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:
Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units