Question: When is it appropriate to use a z-score transform? And are there advantages to log2 fold change over fold change?
I'm a novice analyzing gene expression data produced by a NanoString nCounter instrument. The instrument reports mRNA counts per gene; for example, a value of 35.20 for ABCB1 means 35.20 normalized counts of ABCB1 mRNA in that sample. The table below isn't my data, but it shows normalized counts in the same form I'd be viewing them. For reference, these are control (healthy) samples.
        GSM2149003 GSM2149004 GSM2149005 GSM2149006
ABCB1        35.20      83.39      58.79      64.19
ABL1        217.04     174.77     181.38     253.35
ADA         196.51     183.64     114.71     155.78
AHR         555.80     495.02     728.40     754.07
AICDA        22.00      11.53       9.32      11.13
AIRE         35.20      21.29      17.92      21.40
APP         382.75     468.41     436.61     363.77
ARG1         36.66      46.13      25.81      18.83
ARG2         13.20      23.95      10.04       5.99
ARHGDIB    2846.45    2805.14    2941.54    3412.58
structure(c(35.2, 217.04, 196.51, 555.8, 22, 35.2, 382.75, 36.66,
13.2, 2846.45, 83.39, 174.77, 183.64, 495.02, 11.53, 21.29, 468.41,
46.13, 23.95, 2805.14, 58.79, 181.38, 114.71, 728.4, 9.32, 17.92,
436.61, 25.81, 10.04, 2941.54, 64.19, 253.35, 155.78, 754.07,
11.13, 21.4, 363.77, 18.83, 5.99, 3412.58), .Dim = c(10L, 4L), .Dimnames = list(
c("ABCB1", "ABL1", "ADA", "AHR", "AICDA", "AIRE", "APP",
"ARG1", "ARG2", "ARHGDIB"), c("GSM2149003", "GSM2149004",
"GSM2149005", "GSM2149006")))
Source: Jangi S, Gandhi R, Cox LM, Li N et al. Alterations of the human gut microbiome in multiple sclerosis. Nat Commun 2016 Jun 28;7:12015. PMID: 27352007
I'm using the following code to plot the data:
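# Distance between genes (rows): 1 - Pearson correlation of their expression profiles
distCor <- function(x) as.dist(1 - cor(t(x)))
# Average-linkage hierarchical clustering
hclustAvg <- function(x) hclust(x, method = "average")
# mat.num is the numeric expression matrix; cols (a colour palette) is not shown here
heatmap.3(mat.num, trace = "none", scale = "row", zlim = c(-5, 5), reorder = TRUE,
          distfun = distCor, hclustfun = hclustAvg, col = rev(cols), symbreak = TRUE, margins = c(8, 4))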
As I see it, there are several ways I could transform the data before plotting the heatmap (I may be missing some):
1) Convert to z-scores by row (per gene)
2) Convert to fold change over some median "normal" expression level
3) Convert to fold change over some median "normal" expression level, then convert to z-scores by row
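To make the three options concrete, here is a minimal sketch of how I understand each transform in base R, using the first three genes from the table above. The names `mat`, `z_by_row`, `ref`, `fc`, `log2fc` and `fc_z` are my own, chosen for illustration, and taking the per-gene median of these control samples as the "normal" level is an assumption on my part rather than an established recipe.

```r
# Toy matrix: first three genes from the table above (rows = genes, columns = samples)
mat <- matrix(c( 35.20,  83.39,  58.79,  64.19,
                217.04, 174.77, 181.38, 253.35,
                196.51, 183.64, 114.71, 155.78),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ABCB1", "ABL1", "ADA"),
                              c("GSM2149003", "GSM2149004",
                                "GSM2149005", "GSM2149006")))

# 1) z-score by row: centre each gene to mean 0 and scale it to SD 1.
#    scale() works on columns, so transpose before and after.
z_by_row <- t(scale(t(mat)))

# 2) Fold change over a per-gene reference level.
#    Assumption: the reference is the median across the control samples.
ref <- apply(mat, 1, median)
fc  <- sweep(mat, 1, ref, "/")

# log2 fold change: a doubling becomes +1 and a halving becomes -1
log2fc <- log2(fc)

# 3) Fold change, then z-score by row
fc_z <- t(scale(t(fc)))

# Compare option 3 with option 1 (ignoring the attributes scale() attaches)
all.equal(fc_z, z_by_row, check.attributes = FALSE)
```

On this toy matrix, option 3 applied to plain fold change gives the same values as option 1, because dividing a row by a positive constant does not change its z-scores. The two would only differ if the z-score were taken on `log2fc` instead.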
When is it appropriate to use a z-score transform? And are there advantages to log2 fold change over fold change?