Question

Heatmap And Cluster Analysis Of Surface Marker Expression?

1

12.0 years ago

jzvereb ▴ 20

Hi to all!

I have 29 surface markers measured by FACS in 26 samples (scores between 0 and 100). I would like to draw a heatmap, as is done for gene expression data, together with a dendrogram (cluster analysis). In Statistica 7 the clusters were clean and the results were fine, but the image quality is poor and there is no heatmap. So I would like to do it in R, but I am having problems with the scripts. Can anybody help me, please?

Thanks for your comments. Best regards, Zed

UPDATE: Here are my data (Markers are the surface markers, D1 to BD5 the donors):

Markers,D1,D2,D3,D4,D5,D6,D7,D8,D9,AD1,AD2,AD3,AD4,AD5,AD6,AD7,AD8,AD9,AD10,AD11,AD12,BD1,BD2,BD3,BD4,BD5
M1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,8
M2,99,100,99,99,100,100,100,99,94,80,97,97,86,96,98,99,93,93,96,93,91,99,98,88,99,100
M3,97,97,97,98,99,92,100,95,96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M4,15,1,0,8,0,3,0,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M5,44,15,14,55,10,44,15,59,60,72,4,4,4,59,35,11,72,72,17,2,21,7,27,2,36,19
M6,79,92,92,69,98,92,69,63,61,62,95,94,94,90,93,88,84,94,76,95,83,79,93,90,97,89
M7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M8,99,100,98,100,100,100,100,99,0,90,100,100,99,95,99,99,99,99,98,95,97,100,99,93,100,91
M9,80,93,94,96,87,87,88,88,90,61,0,94,78,84,86,96,94,93,84,88,97,98,96,88,97,92
M10,98,97,96,86,98,86,86,67,55,90,57,57,91,73,0,52,84,84,72,76,89,75,0,80,36,51
M11,0,0,0,0,7,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M12,39,43,21,29,6,29,11,68,57,1,79,79,79,0,0,0,50,50,0,0,5,50,37,12,0,0
M13,68,62,70,49,0,49,49,52,50,77,46,46,46,0,0,0,13,13,0,0,48,3,5,55,33,0
M14,0,0,0,0,6,0,0,69,67,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M15,98,100,99,99,100,100,100,96,92,75,87,99,92,86,95,100,92,94,98,91,94,98,98,99,100,99
M16,0,0,0,0,0,0,0,7,6,95,95,91,91,81,89,90,95,87,81,93,82,73,80,93,91,99
M17,97,83,96,84,99,99,99,75,1,79,97,0,0,0,0,93,70,0,0,0,0,0,0,85,87,0
M18,97,98,97,99,100,99,100,97,95,74,77,87,94,92,70,79,93,77,78,94,75,86,95,76,92,91
M19,30,0,69,57,0,69,57,23,32,63,71,71,0,52,20,99,37,0,0,0,0,0,0,0,100,0
M20,77,98,98,99,98,96,96,87,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M21,99,96,97,94,99,97,97,93,76,33,0,98,0,0,61,99,87,0,0,91,75,91,0,19,99,99
M22,98,99,90,98,100,100,100,97,95,69,95,95,92,88,96,84,84,84,52,50,72,0,0,43,0,0
M23,99,99,99,100,100,99,100,98,91,53,93,96,97,96,90,0,96,39,81,93,92,97,91,88,0,98
M24,98,99,99,99,100,100,100,96,98,27,100,100,99,98,99,97,98,98,78,96,95,98,97,86,99,100
M25,72,0,80,73,0,0,0,22,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M26,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M27,63,76,88,88,0,0,0,67,55,1,93,93,94,91,88,85,85,64,86,80,98,93,99,76,86,100
M28,2,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M29,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

UPDATE: Here is the script; I got it from somebody:

library(amap)
library(gplots)

draw.heatmap <- function(data, plot.name = "") {
    png(paste("heatmap_", plot.name, ".png", sep = ""), width = 800, height = 1000)
    pal <- colorpanel(100, "blue", "yellow", "red")
    labCol <- c(expression("CDP BM"), expression("MO Bl"), expression("MF Sp"), expression("pDC Sp"),
        expression("cDC liver"), expression("D1 Gr1+"), expression("D1 Gr1-"), expression("D2 Gr1+"),
        expression("D2 Gr1-"), expression("D4 Gr1+"), expression("D4 Gr1-"), expression("D8 Gr1+"),
        expression("D8 Gr1-"))
    rownames(data) <- data[,1]
    heatmap.2(apply(data[,-1], 1:2, as.numeric), Rowv = T, Colv = T, cexRow = 1.0, cexCol = 1.7,
        margins = c(8,8), dendrogram = "both", labCol = labCol, key = T,
        distfun = function(matrix) as.dist(Dist(matrix, method = "spearman")),
        hclustfun = function(matrix) hclust(matrix, method = "complete"),
        col = pal, density.info = "none", trace = "none",
        reorderfun = function(d,w) { d },
        scale = "row"
    )
    dev.off()
}

project.dir <- "/molbio/projects/DC_varga_tamas"
data <- read.csv("/molbio/projects/DC_varga_tamas/data/heatmap_cDC_corrected.txt", sep = "\t")

setwd(file.path(project.dir, "results"))
draw.heatmap(data[, c(-6,-8)], plot.name = "cDC_v6")

I am confused, because the script does not fit my data.

heatmap clustering • 5.4k views

ADD COMMENT • link updated 12.0 years ago by Istvan Albert 102k • written 12.0 years ago by jzvereb ▴ 20

0

Have you tried the command heatmap ?

ADD REPLY • link 12.0 years ago by Ben ★ 2.0k

0

Yes, of course. My problem is something to do with the data. I started using R just a few days ago. In gene expression data you don't have 100 as a score (I converted my data to fractions, so 1 means 100 and 0.1 means 10), but the outcome was not a heatmap.

ADD REPLY • link 12.0 years ago by jzvereb ▴ 20

0

If your problem is with the data, it's a good idea to post some of it. Showing us the code you've tried as well would be even better.

ADD REPLY • link 12.0 years ago by Ben ★ 2.0k

0

It shouldn't matter whether your data is from 1 to 100 or 1 to 10.
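
One way to check this, as a minimal sketch in base R only: the small matrix `m` below holds the first six donors of four markers from the question, and plain Euclidean distance with complete linkage is an assumption chosen for illustration. Dividing every value by 100 changes the heights of the dendrogram but should not change which rows are merged or how the leaves are ordered.

```r
# Hypothetical mini-matrix: donors D1..D6 of four markers copied from the question
m <- rbind(
  M5  = c(44, 15, 14, 55, 10, 44),
  M12 = c(39, 43, 21, 29,  6, 29),
  M13 = c(68, 62, 70, 49,  0, 49),
  M19 = c(30,  0, 69, 57,  0, 69)
)
colnames(m) <- paste0("D", 1:6)

# Cluster the rows on the 0-100 scale and again on the 0-1 scale
hc_100 <- hclust(dist(m), method = "complete")
hc_1   <- hclust(dist(m / 100), method = "complete")

# Compare the merge sequence and the leaf order of the two trees
same_merges <- identical(hc_100$merge, hc_1$merge)
same_order  <- identical(hc_100$order, hc_1$order)
print(c(same_merges = same_merges, same_order = same_order))
```

The heatmap colours are assigned relative to the range of the values, so a uniform rescaling of the whole matrix should not change the picture either.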

ADD REPLY • link 12.0 years ago by Obi Griffith 20k

0

Your subsequent comments/responses should both have been edits to your original question, not answers to it. I cleaned it up, but please keep this in mind for future posts. :-)

ADD REPLY • link 12.0 years ago by Obi Griffith 20k

0

See the updated script in my answer. It imports your data from a CSV file and then produces the heatmap shown below. I strongly suggest you take an intro R course. The sample script you already had should have been enough to get you started, but you may need a little more understanding of R before you can take an existing script, modify it for new situations, and read and understand the documentation for the functions you are using.

ADD REPLY • link 12.0 years ago by Obi Griffith 20k

0

Thanks a lot. I started using R just 1.5 weeks ago, so I will try to take a short course or go through a webinar. Until now, SPSS or simple statistics were enough for me. (I would rather do 100 western blots than play with the data. :))

I don't like to use something I don't understand, so thank you very much for your help!

Best regards z

ADD REPLY • link 12.0 years ago by jzvereb ▴ 20

1

You're welcome. I also started in the wet lab with SPSS and had almost zero formal computer science or statistics courses. It's a steep learning curve, but don't be discouraged. It's worth a lot to be able to understand and analyze your own data. :-) If my answer helped solve your problem, please upvote and "accept" the answer with the check mark. Two-way participation is the fuel that keeps biostar moving. So, please keep coming back as you learn. Who knows, soon you might be answering someone else's analysis questions!

ADD REPLY • link 12.0 years ago by Obi Griffith 20k

1

Hi!

I changed the script a little bit and it's working! So I am happy, because with your help and my own attempts I now know a few things. :) Now I have green-to-red colours too. :) It reminds me of the feeling I had as a kid when I got my first computer, a Commodore 64, and started to write my own stuff in BASIC...

So thanks a lot again!!

ADD REPLY • link 12.0 years ago by jzvereb ▴ 20

0

Please don't delete questions you have started once they have received a lot of answers; they could be useful to other people later.

ADD REPLY • link 12.0 years ago by Istvan Albert 102k

score 6 · Answer 1 · 2013-02-08

I suggest you try heatmap.2. Here is some sample code below using a fake dataset. If you can get your data into a matrix that looks like "data_matrix" below then the heatmap code should produce something sensible. You may want to customize it further and perhaps try different distance methods or clustering functions. If you want multiple color side bars you can see How Do I Draw A Heatmap In R With Both A Color Key And Multiple Color Side Bars?.
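
As a rough sketch of the matrix layout meant here (base R only; `csv_text` is a hypothetical inline subset of a few rows and columns copied from the question, used in place of a file): markers go in rows, donors in columns, and the marker names become row names rather than a data column.

```r
# A few rows and columns copied from the data in the question (subset for brevity)
csv_text <- "Markers,D1,D2,D3,AD1,AD2,BD1
M3,97,97,97,0,0,0
M16,0,0,0,95,95,73
M20,77,98,98,0,0,0"

# row.names = 1 turns the first column (marker names) into row names,
# so every remaining column is purely numeric
data_matrix <- as.matrix(read.csv(text = csv_text, row.names = 1))

# Inspect the result: a numeric matrix with markers as rows and donors as columns
str(data_matrix)
```

With a real file, `text = csv_text` would be replaced by the path to the CSV file.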

UPDATE: I have modified the script below to use the data you provided. I also changed it to use Spearman as the distance metric and complete linkage for clustering, in case those details in your sample script were important to you. I usually don't rescale (by rows or columns), at least at first; it is better to see how the "naked" data looks. I'm not sure what the reorder function in your example is doing, so I left it out. If you don't like the "traditional" heat color scheme that I have used, it is easy to change to another (like the blue/yellow/red in your sample script).

library("gplots")
library("amap")

#Set working directory - This should be the only line you need to change to get this script to work.
setwd("/your/output/dir")

#Read in data from csv file - in this case I have hosted in a public dropbox folder for convenience
data_matrix=read.csv(file=url("http://dl.dropbox.com/u/16769159/marker_data.csv"), row.names=1)
marker_names=rownames(data_matrix)
sample_ids=colnames(data_matrix)

#Define custom dist and hclust functions for use with heatmaps
#mydist=function(c) {dist(c,method="euclidean")}
mydist=function(c) {as.dist(Dist(c, method="spearman"))}
#myclust=function(c) {hclust(c,method="average")}
myclust=function(c) {hclust(c,method="complete")}

#Create heatmap using heatmap.2 and write it to a pdf file
pdf(file="heatmap_example.pdf")
main_title="heatmap of clustered markers"
par(cex.main=1)
heatmap.2(as.matrix(data_matrix), hclustfun=myclust, distfun=mydist, na.rm = TRUE, scale="none", dendrogram="both", margins=c(6,6), 
Rowv=TRUE, Colv=TRUE, symbreaks=FALSE, key=TRUE, symkey=FALSE, 
density.info="none", trace="none", main=main_title, labCol=sample_ids, labRow=marker_names, cexRow=1, col=rev(heat.colors(75)))
dev.off()

Heatmap image
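
On the rescaling point: as far as I understand heatmap.2, scale="row" centers each row and divides it by its standard deviation. The sketch below mimics that in base R. The matrix `m` holds the first six donors of M4, M5 and M7 from the question, and using `t(scale(t(m)))` as a stand-in for the internal scaling is an assumption. It shows one reason to look at the unscaled data first: a constant row such as M7 has zero standard deviation and turns into NaN.

```r
# Hypothetical mini-matrix: donors D1..D6 of three markers copied from the question
m <- rbind(
  M4 = c(15,  1,  0,  8,  0,  3),
  M5 = c(44, 15, 14, 55, 10, 44),
  M7 = c( 0,  0,  0,  0,  0,  0)   # constant row (all zeros)
)
colnames(m) <- paste0("D", 1:6)

# scale() works on columns, so transpose, scale, and transpose back
# to centre each row and divide it by its standard deviation
row_scaled <- t(scale(t(m)))

# M4 and M5 become z-scores; M7 is 0/0 and therefore NaN
print(round(row_scaled, 2))
```

Rows like M7 and M29 carry no information for clustering, so dropping them before plotting may be a reasonable choice if row scaling is wanted.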