Heatmap Column Side Color question in R
9.0 years ago
rockquark ▴ 10

Hi All,

I'm a beginner user in R, so this is probably an easy question for experienced R users to help me with. I appreciate any help that anyone can give.

I have made a heatmap from microarray data containing 1106 patient samples. For each sample I have information on sex (male or female), age (old or young), and stage of cancer (stage I, IA, IB, II, IIA, IIB, III, IIIA, IIIB, or IV). I have no problem making the heatmap, but I'm having trouble figuring out how to add column side color bars that distinguish these populations in the patient data. Ultimately I would like stacked color bars above the columns showing the following: 1. the first bar color coded for female or male, 2. the second bar color coded for young or old, and 3. the last bar color coded for the stage of cancer.

In my data set I have changed the column heading to the following format: (sex)_(age)_(stage). So a specific example would be the following: F_Y_IB (indicating Female_Young_stage IB).

Would someone be able to help me out in figuring out the best way to do this in R?

Thank you in advance for any help that anyone can give me.

Afshin

9.0 years ago
5utr ▴ 370

I don't know which heatmap function you are using in R, but here is an example with the 'pheatmap' function:

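library(pheatmap)
# dummy data
dummymat = matrix(rnorm(100), 10, 10)
colnames(dummymat) = paste("Patient", 1:10, sep = "")
rownames(dummymat) = paste("Gene", 1:10, sep = "")
# create a data frame with the patient categories
# (use levels=, not labels=: factor() sorts its levels alphabetically, so
# labels = c("Male", "Female") would swap the two sexes)
categories <- data.frame(Sex = factor(sample(c("Male", "Female"), size = 10, replace = TRUE), levels = c("Male", "Female")),
                         Stage = factor(sample(c('I','II','III'), size = 10, replace = TRUE), levels = c('I','II','III')))
# the row names of the annotation must match the column names of the matrix
rownames(categories) <- colnames(dummymat)

# annotation_col is the current name of the deprecated 'annotation' argument
pheatmap(dummymat, annotation_col = categories)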
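
If you want to control which color goes with which category, pheatmap also accepts an `annotation_colors` list. Below is a minimal sketch, assuming a small hand-made annotation table; the sample names and the color choices are made up for illustration, and the plot is only drawn if pheatmap is installed.

```r
# Hypothetical annotation table for four samples
categories <- data.frame(
  Sex   = factor(c("Male", "Female", "Female", "Male"), levels = c("Male", "Female")),
  Stage = factor(c("I", "II", "III", "I"), levels = c("I", "II", "III")),
  row.names = paste0("Patient", 1:4)
)

# One named color vector per annotation column; names must match the factor levels.
# The colors themselves are arbitrary choices.
annotation_colors <- list(
  Sex   = c(Male = "steelblue", Female = "tomato"),
  Stage = c(I = "grey85", II = "grey55", III = "grey25")
)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(1)
  mat <- matrix(rnorm(40), nrow = 10,
                dimnames = list(paste0("Gene", 1:10), rownames(categories)))
  pheatmap::pheatmap(mat, annotation_col = categories,
                     annotation_colors = annotation_colors)
}
```

Each list element is named after a column of the annotation data frame, so adding an Age column would just mean adding an `Age = c(...)` entry.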

Thank you for your comment! I'm using the heatmap.2 function. Would you be able to help me incorporate this into heatmap.2? I really appreciate your help. I'm pretty new to R, so anything helps!


Actually, I went ahead and used the pheatmap function, and it worked really well!!

I really appreciate your help Gian! Thanks again for your response!!


Glad it helped, Afshin. If this solves your question, please consider accepting the answer.


I actually have one more question for you, Gian. Now that I have some really nice labels and column colors, I'm trying to figure out why the colors in the heatmap from pheatmap differ from those in the heatmap I plotted with the heatmap.2 function. I attached two images for comparison: the first was made with the heatmap.2 function and the second with the pheatmap function.

Here is the R code for the first image:

heatmap.2(x, dendrogram ='both', Colv=TRUE, col=bluered(800), key=FALSE, keysize=1.0, 
          symkey=FALSE, density.info='none', trace='none', colsep=0:0, sepcolor='white', 
          sepwidth=0.05, scale="none",cexRow=1,cexCol=0.1, labCol = colnames(expr_mat), 
          hclustfun=function(c){hclust(c, method='mcquitty')}, lmat=rbind( c(0, 3), c(2,1), c(0,4) ), 
          lhei=c(1, 4, 0.25),margins=c(10,10))

Here is the R code for the pheatmap method:

pheatmap(x, clustering_method = "mcquitty", color = bluered(800), 
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean",
         scale = 'none', annotation = categories, show_colnames = F, cluster_rows = T,
         cluster_cols = T)

Would you be able to let me know how I can get the same color scheme in the pheatmap image as in the heatmap.2 image?

Thanks!
Afshin


Please always provide a reproducible example so other people can run it and help. Try using the dummymat I posted instead of x and colnames(expr_mat).


OK, sorry about that.

Here is the version with the dummymat data for the two methods:

library(pheatmap)
library(gplots)  # provides bluered()
# dummy data
dummymat = matrix(rnorm(100), 10, 10)
colnames(dummymat) = paste("Patient", 1:10, sep = "")
rownames(dummymat) = paste("Gene", 1:10, sep = "")
# create a data frame with the patient categories
# (levels= rather than labels=, so that "Male" and "Female" are not swapped)
categories <- data.frame(Sex = factor(sample(c("Male", "Female"), size = 10, replace = TRUE), levels = c("Male", "Female")),
                         Stage = factor(sample(c('I','II','III'), size = 10, replace = TRUE), levels = c('I','II','III')))
rownames(categories) <- colnames(dummymat)

pheatmap(dummymat, color = bluered(800), clustering_method = "mcquitty",
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean",
         scale = 'none', annotation = categories)

Which produces the following image: https://www.dropbox.com/s/vovo389unfxl3ne/pheatmap%20example%20data.png?dl=0

Here it is with the heatmap.2 function:

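library(gplots)

# dummy data
# (note: calling rnorm() again draws new random values; to compare both
# functions on identical data, reuse the dummymat created above or call
# set.seed() before each rnorm())
dummymat = matrix(rnorm(100), 10, 10)
colnames(dummymat) = paste("Patient", 1:10, sep = "")
rownames(dummymat) = paste("Gene", 1:10, sep = "")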
heatmap.2(dummymat, dendrogram ='both', Colv=TRUE, col=bluered(800), key=FALSE, keysize=1.0, 
          symkey=FALSE, density.info='none', trace='none', colsep=0:0, sepcolor='white', 
          sepwidth=0.05, scale="none",cexRow=1,cexCol=1, labCol = colnames(dummymat), 
          hclustfun=function(c){hclust(c, method='mcquitty')}, lmat=rbind( c(0, 3), c(2,1), c(0,4) ), 
          lhei=c(1, 4, 0.25),margins=c(10,10))

which produces the following image: https://www.dropbox.com/s/j3q7gakljr0qcgq/heatmap2%20example%20data.png?dl=0

These two different methods produce different colors for the heatmap.

Let me know what you think might solve this problem. I would like to use the pheatmap function, but with the same color scheme as the heatmap.2 function.

Thanks!

Afshin


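The colors are different because pheatmap doesn't create symmetric breaks by default: it spreads the colors between the minimum and maximum of the data, so the middle of the palette does not necessarily fall at zero. To get symmetric breaks, you can specify them yourself (last line):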

# breaks must be one element longer than the color vector (801 breaks for 800 colors)
pheatmap(dummymat, color = bluered(800), clustering_method = "mcquitty",
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean",
         scale = 'none', annotation = categories,
         breaks = seq(-max(abs(dummymat)), max(abs(dummymat)), length.out = 801))
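
To make the symmetric-breaks idea reusable, here is a small sketch. The helper name `symmetric_breaks` is made up for this example, and `colorRampPalette()` from base R stands in for `bluered()` so the snippet does not need gplots; the plot is only drawn if pheatmap is installed.

```r
# Hypothetical helper: returns n_colors + 1 breaks centered on zero,
# so the middle of the palette corresponds to a value of 0
symmetric_breaks <- function(mat, n_colors) {
  limit <- max(abs(mat), na.rm = TRUE)
  seq(-limit, limit, length.out = n_colors + 1)
}

set.seed(42)
dummymat <- matrix(rnorm(100), 10, 10)

n_colors <- 800
# blue-white-red palette from grDevices, used here instead of gplots::bluered()
pal <- colorRampPalette(c("blue", "white", "red"))(n_colors)
brk <- symmetric_breaks(dummymat, n_colors)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(dummymat, color = pal, breaks = brk)
}
```

Because the breaks run from -max(|x|) to +max(|x|), values of equal magnitude and opposite sign get equally saturated blue and red.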

Thanks that worked great! I really appreciate your help! Thanks again!


I actually have another question dealing with the issue above.

The example with the dummy data is great for showing how to get a color bar above the heatmap.

But how do I specifically match the colors to the headers of my original data?

For example, my data headers are formatted like "F_O_I".

In the example above, the categories are assigned at random, so nothing ties headers containing "F" to "Female", or "O" to "Old", and so on. How do I make sure that each color is actually associated with the information in the headers of my data set?

Any help will be much appreciated.

Thanks


Hi Afshin,

The idea here is to have two different data frames: one with your heatmap data (e.g., expression data) and the other with the annotation for the patients. Instead of changing the header of your 'dummymat', you should add a column to 'categories' for each characteristic that you want to plot.

So create the data frame 'categories' starting with your patient names and add the characteristics. This is an example of how you could add the Sex category:

# Add an F_ or M_ prefix to the patient names to simulate your data
colnames(dummymat) = paste0(sample(c('F_','M_'), size = ncol(dummymat), replace = TRUE), colnames(dummymat))
# create an empty data.frame with the patient names as row names
categories = data.frame(row.names = colnames(dummymat))
# use grepl to assign Sex based on the prefix F_ or M_
categories$Sex = ifelse(grepl('^F_', rownames(categories)), 'Female', 'Male')
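
If all headers follow the (sex)_(age)_(stage) pattern, another option is to split them once and fill every annotation column from the pieces. This is only a sketch: the example headers are invented, and it assumes each header has exactly three underscore-separated fields.

```r
# Hypothetical headers in the (sex)_(age)_(stage) format; note the repeated one
headers <- c("F_Y_IB", "M_O_II", "F_O_IIIA", "M_Y_IV", "F_Y_IB")

# Split each header on "_" and stack the pieces into a 3-column character matrix
parts <- do.call(rbind, strsplit(headers, "_", fixed = TRUE))

categories <- data.frame(
  Sex   = factor(parts[, 1], levels = c("F", "M"), labels = c("Female", "Male")),
  Age   = factor(parts[, 2], levels = c("Y", "O"), labels = c("Young", "Old")),
  Stage = factor(parts[, 3]),
  # data frame row names must be unique, so repeated headers get a numeric suffix
  row.names = make.unique(headers)
)
```

With many patients sharing the same header, you would also set the column names of the expression matrix to `rownames(categories)` so that the two objects still match.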

Hi Gian,

Thanks!! That works great for distinguishing between two groups. If I now want to distinguish between 3 or more groups, how would I do that? For example, "I" in the example above is one of 4 groups representing stage, so my header labels look like "F_O_I", "F_O_II", "F_O_III", or "F_O_IV".

So I can't use the ifelse function. What do you recommend instead for this?? I really appreciate your help and sorry for the simple questions. I'm actually learning a lot from your help.

Thanks again!

Afshin


Hi Gian,

Never mind... I just figured it out with the following code:

categories$CellType=ifelse(grepl('L540',rownames(categories)),'L540',
                       ifelse(grepl('SUDHL6',rownames(categories)),'SUDHL6',
                       ifelse(grepl('OCLY3',rownames(categories)),'OCILy3','Hut78')))

You might have a cleaner suggestion than this...

Thanks again for all your help!

Afshin
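
One possibly cleaner alternative to nested ifelse calls is a named vector used as a lookup table. For the stage headers mentioned earlier this also avoids a pitfall: grepl('I', ...) matches every stage, whereas extracting the last field and looking it up is exact. A sketch, with invented headers and illustrative label text:

```r
# Hypothetical headers; the stage is the part after the last underscore
headers <- c("F_O_I", "F_O_II", "M_Y_III", "M_O_IV")

# Greedy ".*_" removes everything up to and including the last underscore
stage_code <- sub(".*_", "", headers)

# Named vector acting as a lookup table: code -> label (labels are illustrative)
stage_labels <- c(I = "Stage I", II = "Stage II", III = "Stage III", IV = "Stage IV")

categories <- data.frame(row.names = headers)
categories$Stage <- factor(unname(stage_labels[stage_code]),
                           levels = unname(stage_labels))
```

Any code missing from the lookup table comes out as NA, which makes unexpected headers easy to spot with `anyNA(categories$Stage)`.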
