Modify `cexRow` and change the dimensions of the heatmap

Question

How to cluster the upregulated and downregulated genes in heatmap?

7.2 years ago

bioinforesearchquestions ▴ 370

Initial heatmap:

Expected heatmap

RNA-SEQ heatmap • 6.7k views

updated 7.2 years ago by Kevin Blighe 88k • written 7.2 years ago by bioinforesearchquestions ▴ 370

score 5 · Answer 1 · 2017-09-25

7.2 years ago

Kevin Blighe 88k

Try different combinations of the distance, linkage, and re-order functions. Assuming that you're using heatmap.2 (from the gplots package), you can pass the following as parameters:

#Re-order rows/columns by mean, use 1-Pearson's correlation distance, and complete linkage
heatmap.2(...,
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  distfun=function(x) as.dist(1-cor(t(x))),
  hclustfun=function(x) hclust(x, method="complete"))

#Re-order rows/columns by mean, use Euclidean distance, and Ward's linkage
heatmap.2(...,
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"))

Various other combinations exist, such as Manhattan or Canberra distance, coupled with single or average linkage.
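As a rough illustration of what the distance and linkage functions do, here is a minimal sketch in base R on a made-up matrix (the gene names and values are hypothetical, not from the question). It builds the same 1-Pearson distance and complete-linkage tree that the first heatmap.2 call would use for the rows, then cuts the tree into two groups.

```r
# Toy expression matrix: 6 hypothetical genes x 4 samples (2 control, 2 treated)
expr <- rbind(
  up1   = c(1.0, 1.2, 5.0, 5.5),
  up2   = c(2.0, 2.1, 6.0, 6.4),
  up3   = c(0.5, 0.7, 3.9, 4.2),
  down1 = c(5.0, 5.3, 1.0, 0.8),
  down2 = c(6.1, 6.0, 2.0, 2.2),
  down3 = c(4.0, 4.4, 0.9, 1.1)
)
colnames(expr) <- c("ctrl1", "ctrl2", "trt1", "trt2")

# Same definitions as passed to heatmap.2 via distfun and hclustfun
distfun   <- function(x) as.dist(1 - cor(t(x)))
hclustfun <- function(x) hclust(x, method = "complete")

# Row (gene) tree, cut into two clusters
rowTree  <- hclustfun(distfun(expr))
clusters <- cutree(rowTree, k = 2)
print(clusters)
```

With this toy data the three "up" genes fall into one cluster and the three "down" genes into the other, because genes with opposite patterns across samples have a correlation near -1 and therefore a distance near 2.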

Also experiment with setting your own breaks for heatmap shading, and scaling the data yourself to Z-scores (or other values)

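# Scale each gene (row) to Z-scores; scale() works on columns, hence the double transpose
heat <- t(scale(t(MyDataMatrix)))
# 101 break points define 100 colour bins between -3 and +3
myBreaks <- seq(-3, 3, length.out=101)
# Pass the pre-scaled matrix and switch off heatmap.2's own scaling
heatmap.2(heat, ..., breaks=myBreaks, scale="none")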
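To see what the row scaling does, here is a small sketch in base R. The matrix is random, made-up data standing in for MyDataMatrix, and the blue-white-red palette is only an assumption for illustration.

```r
# Made-up data standing in for MyDataMatrix: 5 genes x 4 samples
set.seed(1)
MyDataMatrix <- matrix(rnorm(20, mean = 8, sd = 2), nrow = 5,
                       dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Row-wise Z-scores: each gene ends up with mean 0 and standard deviation 1
heat <- t(scale(t(MyDataMatrix)))
print(round(rowMeans(heat), 10))
print(apply(heat, 1, sd))

# n break points delimit n - 1 colour bins, so the palette needs one colour fewer
myBreaks <- seq(-3, 3, length.out = 101)
myCols   <- colorRampPalette(c("blue", "white", "red"))(length(myBreaks) - 1)
print(length(myCols))
```

Because the breaks are symmetric around zero, the middle of the palette lines up with each gene's own mean.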

If none of that works, as a last resort you can order the rows yourself in whatever way you want and then 'fix' them in place by switching off row re-ordering. The cost is that you lose the row dendrogram. Take a look at the parameters Rowv and dendrogram to see how you can do this: https://www.rdocumentation.org/packages/gplots/versions/3.0.1/topics/heatmap.2
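A minimal sketch of this last-resort approach, assuming you want the upregulated genes grouped before the downregulated ones. The matrix, the sample grouping, and the output file name are all hypothetical; the plotting step only runs if gplots is installed.

```r
# Toy matrix with up- and downregulated genes interleaved (hypothetical values)
expr <- rbind(
  down1 = c(5.0, 5.3, 1.0, 0.8),
  up1   = c(1.0, 1.2, 5.0, 5.5),
  down2 = c(6.1, 6.0, 2.0, 2.2),
  up2   = c(2.0, 2.1, 6.0, 6.4),
  down3 = c(4.0, 4.4, 0.9, 1.1),
  up3   = c(0.5, 0.7, 3.9, 4.2)
)
colnames(expr) <- c("ctrl1", "ctrl2", "trt1", "trt2")

# Order rows by log2 fold change (treated over control), largest first
log2FC  <- log2(rowMeans(expr[, c("trt1", "trt2")]) /
                rowMeans(expr[, c("ctrl1", "ctrl2")]))
ordered <- expr[order(log2FC, decreasing = TRUE), ]
print(rownames(ordered))

# Keep that order: no row re-ordering, and draw only the column dendrogram
if (requireNamespace("gplots", quietly = TRUE)) {
  pdf(file.path(tempdir(), "FixedRowOrder.pdf"))
  gplots::heatmap.2(ordered, Rowv = FALSE, dendrogram = "column",
                    scale = "row", trace = "none")
  dev.off()
}
```

Sorting by fold change is just one possible ordering; any row order you impose on the matrix is kept as long as Rowv is switched off.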

6.2 years ago by Kevin Blighe 88k

Thanks, Kevin. Sure, I will try them.

7.2 years ago by bioinforesearchquestions ▴ 370

Great - let me know how it goes!

7.2 years ago by Kevin Blighe 88k

Hi Kevin,

After incorporating the "1-Pearson's correlation distance",

How do people generally label significant genes in a heatmap when there are more than 100? I have 620 significant genes (q-value <= 0.05).

7.2 years ago by bioinforesearchquestions ▴ 370

Looks great!

Yes, labeling is a major issue, but there are different ways of tackling it:

Modify `cexRow` and change the dimensions of the heatmap

cexRow controls the size of the row labels, as you probably know. Changing the dimensions of the output can also help: if you elongate the heatmap, each row label gets more vertical space. For example, try the following:

pdf("MyHeatmap.pdf", width=5, height=11)
     par(mar=c(2,2,2,2), cex=1.0)
     heatmap.2(..., cexRow=0.6)
dev.off()
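For a gene list as long as the 620 mentioned above, one rough way to pick the page height is to budget a fixed amount of space per row. The sketch below does only that arithmetic; the 0.1 inch per row and the 2 inch allowance are assumptions to be tuned by eye, not values from gplots.

```r
nGenes     <- 620   # number of significant genes mentioned in the thread
inchPerRow <- 0.1   # assumed vertical space per row label
marginInch <- 2     # assumed allowance for the column dendrogram, key and margins

# Total page height to request from pdf()
heightInch <- nGenes * inchPerRow + marginInch
print(heightInch)

# The result would then replace the fixed height, for example:
# pdf("MyHeatmap.pdf", width=5, height=heightInch)
#      heatmap.2(..., cexRow=0.6)
# dev.off()
```

A page this tall is awkward to print but works well for on-screen browsing and zooming.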

Only include certain genes in the labels

Here you can use a vector as the rownames and only include certain key genes in it. For example, the vector could be:

myKeyGenes <- c("", "", "TP53", "", "", "", "BRCA1", ..., "geneX")

In heatmap.2, you then specify this with labRow=myKeyGenes. The order of the vector has to match the order of your data-matrix that is used for clustering. You can then use a normal-sized value for cexRow, as most of the labels are blank spaces.
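Rather than typing the blanks by hand, the label vector can be derived from the row names. A minimal sketch, assuming a hypothetical set of row names and two key genes:

```r
# Row names of the data matrix, in the same order as its rows (hypothetical)
geneNames <- c("GAPDH", "ACTB", "TP53", "MYC", "EGFR", "KRAS", "BRCA1")

# Genes that should keep their label
keyGenes <- c("TP53", "BRCA1")

# Keep the name for key genes, blank out everything else
myKeyGenes <- ifelse(geneNames %in% keyGenes, geneNames, "")
print(myKeyGenes)

# heatmap.2(..., labRow=myKeyGenes)
```

In practice geneNames would be rownames() of the matrix passed to heatmap.2, which keeps the labels aligned with the rows.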

Use a color-vector and switch off labelling

Here, you provide a color vector instead of labels and set it with RowSideColors in heatmap.2. For example, you could shade genes of a certain pathway in one color, or transcripts that are non-coding RNAs in another.
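The color vector can be built the same way as a label vector: one entry per row, in row order. A short sketch, where the gene names, the pathway membership, and the two colors are all assumptions for illustration:

```r
# Row names of the data matrix, in the same order as its rows (hypothetical)
geneNames <- c("GAPDH", "ACTB", "TP53", "MYC", "EGFR", "KRAS", "BRCA1")

# Hypothetical pathway membership
pathwayGenes <- c("TP53", "BRCA1")

# One color per row: highlight pathway genes, grey for the rest
myRowColors <- ifelse(geneNames %in% pathwayGenes, "firebrick", "grey80")
print(myRowColors)

# heatmap.2(..., RowSideColors=myRowColors)
```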

...of course, you can also use a combination of all of these.

7.2 years ago by Kevin Blighe 88k

Hi Kevin,

I have an Excel file generated from the Cuffdiff output for genes, with the following columns:

test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant

According to the Excel file, sample_1 is Mutant and sample_2 is Wildtype, and log2(fold_change) is calculated as log2(sample_2/sample_1).

I thought it should be log2(final/initial), shouldn't it?

What is the difference between log2(Mutant/Wildtype) and log2(Wildtype/Mutant)?

7.2 years ago by bioinforesearchquestions ▴ 370

Hi friend, the difference is just in the interpretation.

If, for GeneX, Sample1's expression is 20 and Sample2's expression is 5, then:

log2(Sample1/Sample2) = 2

We can make the following statement: Sample1 has higher expression than Sample2 for GeneX

log2(Sample2/Sample1) = -2

We can make the following statement: Sample2 has lower expression than Sample1 for GeneX

Both statements imply the same thing. You can see, however, that the choice of numerator and denominator is important.
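The arithmetic can be checked directly in R. This small sketch uses the GeneX values from the example (20 and 5) and shows that swapping numerator and denominator only flips the sign:

```r
sample1 <- 20   # expression of GeneX in Sample1
sample2 <- 5    # expression of GeneX in Sample2

fc12 <- log2(sample1 / sample2)   # Sample1 relative to Sample2
fc21 <- log2(sample2 / sample1)   # Sample2 relative to Sample1

print(c(fc12, fc21))   # 2 and -2
```

So a table reporting log2(sample_2/sample_1) can be converted to log2(sample_1/sample_2) by negating the column.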

7.2 years ago by Kevin Blighe 88k

Hi Kevin,

I have a similar problem but I am not able to reorder my data as I have missing values in some columns, could you please take a look at my thread?

Thanks !

6.6 years ago by eggrandio ▴ 40

Done.

6.2 years ago by Kevin Blighe 88k
