I need to make a statistical comparison using breast cancer data. I have made a heat map at the following link on the Bioportal: http://www.cbioportal.org/index.do?cancer_study_id=brca_tcga&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&data_priority=0&case_set_id=brca_tcga_mrna&gene_list=BRCA1%250ABRCA2%250ATP53%250ACCL2%250ACCR3%250ACD44%250AENG%250AIL6%250AIL33%250ACD33%250ACSF1%250AHIF1A%250ACLEC7A&geneset_list=%20&tab_index=tab_visualize&Action=Submit&genetic_profile_ids_PROFILE_MRNA_EXPRESSION=brca_tcga_mrna_median_Zscores&show_samples=false&heatmap_track_groups=brca_tcga_mrna_median_Zscores%2CBRCA1%2CBRCA2%2CTP53%2CCCL2%2CCCR3%2CCD44%2CENG%2CIL6%2CIL33%2CCD33%2CCSF1%2CHIF1A%2CCLEC7A
To do this I initially went to the http://www.cbioportal.org page and selected the cancer samples that I am interested in: Breast Invasive Carcinoma (TCGA, Provisional)
I then went to the text box to enter the names of the genes that I am interested in: BRCA1 BRCA2 TP53 CCL2 CCR3 CD44 ENG IL6 IL33 CD33 CSF1 HIF1A CLEC7A
Then I submitted the query and was able to plot a clustered heat map (by clicking 'Heatmap>>Add genes to heatmap>>cluster heatmap') in the "oncoprint" tab.
While this type of annotation is useful, I would also like to be able to download the data that is generating this heat map and make a similar heat map on my local computer (for experimental reasons).
To attempt this I went to the original search page and clicked the "View summary" button.
From this I found a "Download Data" button at the top of the page.
This returns a 'tar.gz' file with lots of interesting datasets. e.g.:
data_mRNA_median_Zscores.txt
data_expression_median.txt
data_RNA_Seq_v2_mRNA_median_Zscores.txt
data_RNA_Seq_v2_expression_median.txt
I want to find the expression data that was used to generate the heat map shown in the first provided link. From the downloaded files, I initially tried data_RNA_Seq_v2_expression_median.txt
Below is my attempt to reproduce a heatmap similar to the one above:
data=read.table('data_RNA_Seq_v2_expression_median.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")
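# keep only the rows for the genes of interest (first column holds the gene symbol)
data_OI=data.frame()
for(i in genes_OI){
data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
# count missing values in the subset
sum(is.na(data_OI))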
library(gplots)
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)
# drop genes with missing values from the subset that is actually plotted
data_OI=na.omit(data_OI)
heatmap.2(as.matrix(data_OI[,-c(1,2)]),labCol = NA,
labRow = data_OI[,1],cexRow = 1.4,keysize = 1.4)
dev.off()
The resulting heatmap is as follows:
But this is not at all like the heatmap in the link at the top of the page.... is there some normalisation step that I am missing?
I also tried using the file where the Z-scores were computed: data_RNA_Seq_v2_mRNA_median_Zscores.txt
This file, however (just from looking at the file content), does not seem to require any distance matrix:
data=read.table('data_RNA_Seq_v2_mRNA_median_Zscores.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")
# keep only the rows for the genes of interest (first column holds the gene symbol)
data_OI=data.frame()
for(i in genes_OI){
data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)
# drop genes with missing values from the subset that is actually clustered
data_OI=na.omit(data_OI)
hclustfunc <- function(x, method = "complete", dmeth = "euclidean") {
hclust(dist(x, method = dmeth), method = method)
}
rc<-hclustfunc(data_OI[,-c(1,2)])
cd=t(data_OI[,-c(1,2)])
cc<-hclustfunc(cd)
heatmap(as.matrix(data_OI[,-c(1,2)]), Rowv=as.dendrogram(rc),
Colv=as.dendrogram(cc),labRow = data_OI[,1],labCol = NA)
dev.off()
Unfortunately, this does not produce anything similar to the heat map seen in the link above.
Hence I am wondering where I am going wrong. Am I using the correct file, or is there a normalisation step that I am missing?
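One normalisation that may be worth ruling out is a per-gene z-score of the log-transformed expression values. The following is only a minimal sketch on made-up numbers: the toy matrix `expr`, the pseudocount of 1 and the clamp limit of 3 are all hypothetical, and it is an assumption (not confirmed here) that this resembles how the portal's displayed z-scores are derived.

```r
# Toy expression matrix: 3 genes x 6 samples (made-up values, not TCGA data)
set.seed(1)
expr <- matrix(rexp(18, rate = 0.01), nrow = 3,
               dimnames = list(c("BRCA1", "TP53", "CD44"), paste0("S", 1:6)))

# log2-transform with a pseudocount to tame the long right tail
log_expr <- log2(expr + 1)

# z-score each gene (row) across samples; scale() works on columns, so transpose
z <- t(scale(t(log_expr)))

# Clamp extreme values so a few outliers do not dominate the colour scale
z_clamped <- pmin(pmax(z, -3), 3)

# Sanity check: each gene should now have mean ~0 and standard deviation 1
round(rowMeans(z), 10)
apply(z, 1, sd)
```

If the raw values are plotted without a step like this, a handful of highly expressed genes will usually dominate both the colour scale and the Euclidean distances used for clustering.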
The cBioPortal group only refers to the part at the bottom as the heatmap. The main figure is an 'oncoprint', with some form of a heatmap at the bottom.
Take a look here: OncoPrint
Hi, sorry, I forgot to add the step about clicking 'Heatmap>>Add genes to heatmap>>cluster heatmap'. The question has been modified accordingly.
I have posted an answer for you - see below
See: How to add images to a Biostars post to understand how to embed images in a post. I've made the necessary changes for now.
thanks this has been amended
What is the basis of the heatmap clustering? What determines the order of patients after selecting "cluster heatmap" in the oncoprint? Thank you.
The code contains the answer to your question. Check out ?hclust and ?heatmap.2
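As a rough illustration of what those help pages describe: in the R code from the question, the sample order comes from a hierarchical clustering of the pairwise distances between samples. The toy matrix `m` below is hypothetical, and whether cBioPortal's "cluster heatmap" option uses the same distance and linkage is an assumption that this sketch does not confirm.

```r
# Toy z-score matrix: 2 genes x 6 samples with two obvious sample groups
m <- matrix(c(-2.0, -1.8, -2.1,  2.0,  1.9,  2.2,
               1.5,  1.7,  1.6, -1.5, -1.4, -1.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("S", 1:6)))

# Interleave the samples so the input order carries no grouping
m <- m[, c("S1", "S4", "S2", "S5", "S3", "S6")]

# Cluster samples (columns): dist() compares rows, so transpose first
hc <- hclust(dist(t(m), method = "euclidean"), method = "complete")

# Left-to-right sample order implied by the dendrogram
colnames(m)[hc$order]

# Cutting the tree into two groups separates S1-S3 from S4-S6
groups <- cutree(hc, k = 2)
split(names(groups), groups)
```

Note that `heatmap` and `heatmap.2` may additionally flip dendrogram branches (by default using row or column means), which changes the left-to-right order but not which samples are grouped together.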
Thanks. So is it essentially hierarchical clustering to find the maximum number of similarities and cluster them together? Because in my case, I have clustered only 4 genes and perhaps it's quite expected to not observe any clustering.
If that's what the manual says, that's what it is. Do you have any questions that the manual does not cover?