I have RNA-seq data (FPKMs) from Cufflinks and would like to cluster it by gene and produce a heatmap.
This is my first try at using R, and I have spent a LOT of time poring over the manual/help pages and internet tutorials on how to do this.
I can now produce heatmaps using "heatmap" easily enough. My problem is that I can produce them from many different versions/transformations of my data, and I cannot figure out what each one is actually doing or which heatmap corresponds to the analysis I am interested in.
What I am trying to get is a) gene names clustered by expression profile, to mine for enriched gene groups/pathways; and b) a heatmap of FPKM values, with the same gene clustering.
This is the R code. Data input/preparation:
m <- read.table("DMSTSC1000_notmeanctrd.txt", header=TRUE, sep="\t")
row.names(m) <- m$test_id
m <- m[,2:7]
m_matrix <- data.matrix(m)
Making Heatmap version 1:
heatmap(m_matrix, Colv=NA, scale="column")
Making Heatmap version 2. This came about because a paper described using a Pearson correlation metric for clustering, but this heatmap looks terrible: the clustering appears to bear little relationship to the plotted data:
cor_t <- cor(t(m_matrix))
distancet <- as.dist(cor_t)
hclust_complete <- hclust(distancet, method = "complete")
dendcomplete <- as.dendrogram(hclust_complete)
heatmap(m_matrix, Rowv=dendcomplete, Colv=NA, scale="column")
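One possible reason version 2 looks wrong: as.dist(cor_t) treats the correlation itself as a distance, so highly correlated genes (r near 1) are placed far apart and anti-correlated genes close together. A common convention is to cluster on 1 - r instead. The sketch below shows that variant on a made-up toy matrix standing in for m_matrix (the toy values, gene names, and sample names are assumptions, not the real data). It also uses scale="row", which is an assumption about what is wanted when comparing gene profiles. Genes with zero variance would give NA correlations and would need to be filtered out first.

```r
# Toy stand-in for m_matrix: 20 genes (rows) x 6 samples (columns); values are made up.
set.seed(1)
m_matrix <- matrix(runif(20 * 6, min = 0, max = 10), nrow = 20,
                   dimnames = list(paste0("gene", 1:20), paste0("S", 1:6)))

# Pearson correlation between gene profiles (rows), hence the transpose.
cor_t <- cor(t(m_matrix))

# Turn similarity into dissimilarity: r = 1 gives 0, r = -1 gives 2.
distancet <- as.dist(1 - cor_t)

hclust_complete <- hclust(distancet, method = "complete")
dendcomplete <- as.dendrogram(hclust_complete)

# scale = "row" displays each gene relative to its own mean and spread,
# which is closer to what a correlation-based clustering groups on.
heatmap(m_matrix, Rowv = dendcomplete, Colv = NA, scale = "row")
```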
Making Heatmap version 3:
distancem <- dist(m_matrix)
hclust_completem <- hclust(distancem, method = "complete")
dendcompletem <- as.dendrogram(hclust_completem)
heatmap(m_matrix, Rowv=dendcompletem, Colv=NA, scale="column")
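For goal a), the gene groups can be pulled out of the same hclust object that drives the heatmap, so the gene lists and the plot share one clustering. A minimal sketch, again on a made-up toy matrix in place of m_matrix; the number of clusters k = 4 is arbitrary and would need to be chosen for the real data:

```r
# Toy stand-in for m_matrix: 20 genes (rows) x 6 samples (columns); values are made up.
set.seed(1)
m_matrix <- matrix(runif(20 * 6, min = 0, max = 10), nrow = 20,
                   dimnames = list(paste0("gene", 1:20), paste0("S", 1:6)))

# Same clustering as version 3: Euclidean distance, complete linkage.
hclust_completem <- hclust(dist(m_matrix), method = "complete")

# Cut the tree into k groups (k = 4 is an arbitrary choice for illustration).
k <- 4
gene_cluster <- cutree(hclust_completem, k = k)

# List of gene names per cluster, e.g. for enrichment analysis.
genes_by_cluster <- split(names(gene_cluster), gene_cluster)

# Gene names in the order of the dendrogram leaves.
genes_in_tree_order <- hclust_completem$labels[hclust_completem$order]

# Heatmap drawn with the very same tree.
heatmap(m_matrix, Rowv = as.dendrogram(hclust_completem), Colv = NA, scale = "row")
```

Each element of genes_by_cluster could then be written to a file with writeLines() and passed to an enrichment tool.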
Or, if you have code for a fourth way that you're confident about, I'd love to hear it! I tried to use pam but haven't been able to produce a heatmap from it yet.
Sorry about not uploading images; I haven't figured out how to web-host them yet.
Details: FPKM data has been log2 transformed and high outliers were capped at a maximum value (10), to increase the range of colors used for the majority of the data.
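For reference, the preprocessing described above can be sketched roughly as follows. The pseudocount of 1 (to avoid log2(0)) and the toy FPKM values are assumptions for illustration, not necessarily what was applied to the real data; only the cap of 10 comes from the description.

```r
# Hypothetical raw FPKM values for two genes across three samples.
fpkm <- matrix(c(0, 1, 3,
                 15, 255, 5000),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))

# Assumed pseudocount so that zero FPKM maps to 0 instead of -Inf.
pseudocount <- 1
log_fpkm <- log2(fpkm + pseudocount)

# Cap high outliers at 10 so they do not dominate the color range.
cap <- 10
log_capped <- pmin(log_fpkm, cap)
```

Note that capping changes the distances between genes as well as the colors, so clustering on capped versus uncapped values can give different trees.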
Thank you in advance for your help, it is very much appreciated!!
Maybe you should change the title since, from what I understood, your problem is more about choosing a clustering method than about generating and analyzing heatmaps, which you seem to know how to do.
True, will do, thanks!