Question

Visualizing Expression Data in Heatmap.3

7.5 years ago

CMosychuk ▴ 20

Question: When is it appropriate to use a z-score transform? And are there advantages to log2 fold change over fold change?

I'm a novice analyzing gene expression data produced by a Nanostring nCounter instrument. Nanostring returns mRNA counts; for example, a value of 35.20 for ABCB1 means 35.20 normalized counts of that gene's mRNA in the sample. The data below aren't mine, but they show what the normalized counts look like as I'd be viewing them. For reference, these are control (healthy) samples.

          GSM2149003 GSM2149004 GSM2149005 GSM2149006
ABCB1        35.20      83.39      58.79      64.19
ABL1        217.04     174.77     181.38     253.35
ADA         196.51     183.64     114.71     155.78
AHR         555.80     495.02     728.40     754.07
AICDA        22.00      11.53       9.32      11.13
AIRE         35.20      21.29      17.92      21.40
APP         382.75     468.41     436.61     363.77
ARG1         36.66      46.13      25.81      18.83
ARG2         13.20      23.95      10.04       5.99
ARHGDIB    2846.45    2805.14    2941.54    3412.58

structure(c(35.2, 217.04, 196.51, 555.8, 22, 35.2, 382.75, 36.66, 
13.2, 2846.45, 83.39, 174.77, 183.64, 495.02, 11.53, 21.29, 468.41, 
46.13, 23.95, 2805.14, 58.79, 181.38, 114.71, 728.4, 9.32, 17.92, 
436.61, 25.81, 10.04, 2941.54, 64.19, 253.35, 155.78, 754.07, 
11.13, 21.4, 363.77, 18.83, 5.99, 3412.58), .Dim = c(10L, 4L), .Dimnames = list(
    c("ABCB1", "ABL1", "ADA", "AHR", "AICDA", "AIRE", "APP", 
    "ARG1", "ARG2", "ARHGDIB"), c("GSM2149003", "GSM2149004", 
    "GSM2149005", "GSM2149006")))

Source: Jangi S, Gandhi R, Cox LM, Li N et al. Alterations of the human gut microbiome in multiple sclerosis. Nat Commun 2016 Jun 28;7:12015. PMID: 27352007

I'm using the following code to plot the data:

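# Distance between genes (rows): 1 - Pearson correlation
distCor <- function(x) as.dist(1-cor(t(x)))

# Average-linkage hierarchical clustering
hclustAvg <- function(x) hclust(x, method="average")

# mat.num: numeric expression matrix (genes x samples); cols: color vector (neither is defined in this post)
heatmap.3(mat.num, trace="none", scale="row", zlim=c(-5,5), reorder=TRUE,
          distfun=distCor, hclustfun=hclustAvg, col=rev(cols), symbreak=TRUE, margins=c(8,4))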
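
For reference, here is a rough sketch of what scale="row" amounts to, assuming heatmap.3 follows the same convention as base R's heatmap() (subtract each row's mean, then divide by its standard deviation). Only four genes from the table are used, and `z` is a name made up for this example. It also checks that the 1 - correlation distance is the same on raw and row-scaled data, since Pearson correlation ignores per-row shifts and rescaling.

```r
# Four genes from the example table (normalized counts, genes x samples)
mat.num <- matrix(
  c(  35.20,   83.39,   58.79,   64.19,
     217.04,  174.77,  181.38,  253.35,
     555.80,  495.02,  728.40,  754.07,
    2846.45, 2805.14, 2941.54, 3412.58),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ABCB1", "ABL1", "AHR", "ARHGDIB"),
                  c("GSM2149003", "GSM2149004", "GSM2149005", "GSM2149006")))

# Row-wise z-score: scale() works on columns, so transpose, scale, transpose back
z <- t(scale(t(mat.num)))

# Each gene now has mean 0 and standard deviation 1 across samples
round(rowMeans(z), 10)
apply(z, 1, sd)

# Same distance function as in the question
distCor <- function(x) as.dist(1 - cor(t(x)))

# Correlation distance between genes is unchanged by the row scaling
all.equal(as.vector(distCor(mat.num)), as.vector(distCor(z)))
```

So with this distance function, the row z-score changes the colors in the heatmap but should not change the row dendrogram.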

As I see it, there are several ways I could plot the data as a heatmap (perhaps I'm missing some):

1) converted to z-score by row

2) converted to fold change over some median "normal" expression level

3) converted to fold change over some median "normal" expression level, and converted to z-score by row
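
The three options can be sketched in a few lines of R. This is only an illustration under one assumption: the "normal" level is a single reference value per gene, taken here as the row median of the control samples shown above. The names `counts`, `row_z`, `ref`, `fc`, `log2fc`, `z_counts` and `z_fc` are made up for the example.

```r
# Three genes from the example table (normalized counts, genes x samples)
counts <- matrix(
  c( 35.20,  83.39,  58.79,  64.19,
     22.00,  11.53,   9.32,  11.13,
    382.75, 468.41, 436.61, 363.77),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ABCB1", "AICDA", "APP"),
                  c("GSM2149003", "GSM2149004", "GSM2149005", "GSM2149006")))

# Helper: z-score each row (gene) across samples
row_z <- function(x) t(scale(t(x)))

# Assumption: one "normal" reference value per gene, here the row median
ref <- apply(counts, 1, median)

z_counts <- row_z(counts)   # option 1: z-score by row
fc       <- counts / ref    # option 2: fold change over the reference
log2fc   <- log2(fc)        # option 2 on the log2 scale
z_fc     <- row_z(fc)       # option 3: fold change, then z-score by row

# Dividing a row by a constant does not change its z-scores,
# so option 3 reproduces option 1 when the reference is per gene
all.equal(z_counts, z_fc, check.attributes = FALSE)
```

Under that assumption, option 3 gives the same matrix as option 1, so the real choice is between a row z-score and a (log2) fold change.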

When is it appropriate to use a z-score transform? And are there advantages to log2 fold change over fold change?

R Nanostring • 2.7k views

updated 7.5 years ago by christophersugai • 0 • written 7.5 years ago by CMosychuk ▴ 20

score 0 · Answer 1 · 2017-05-13

I've never used Nanostring, but can you even plot plain fold changes (not log fold changes) in a heatmap? When I've tried this on other datasets, either the whole heatmap comes out one color, or you need odd cutoffs and may still get a strange-looking heatmap. The reason is that most values sit near the 'average' while the spread is huge, so everything ends up the same color except a few of the highest and lowest numbers. Hence log2.
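
A small made-up example of why the log helps (the values in `fc` are hypothetical, not taken from the data above): plain fold changes squeeze every decrease into the interval between 0 and 1, while increases are unbounded, so a color scale centered on "no change" is lopsided. On the log2 scale the same changes are symmetric around 0.

```r
# Hypothetical fold changes: the same magnitudes of change, down and up
fc <- c(down8 = 1/8, down2 = 1/2, same = 1, up2 = 2, up8 = 8)

# Raw scale: distance from "no change" (fold change of 1) is lopsided
fc - 1      # -0.875 -0.5 0 1 7

# log2 scale: up- and down-regulation are symmetric around 0
log2(fc)    # -3 -1 0 1 3
```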

As for z score, https://stats.stackexchange.com/questions/36076/is-a-heat-map-of-gene-expression-more-informative-if-z-scores-are-used-instead-o