Question

Z-score difference between Z group means

0


2.2 years ago

schulpen_91 ▴ 30

Hello all, a question.

In a tutorial on differential expression analysis the following happened: Normalized data was converted to Z-scores and divided into two biological groups. A t-test was performed on the two groups (which is fine). Now the point of confusion. The means of the two groups were calculated separately, followed by "Z-difference=mean1 - mean2".

It was said this Z-difference signified biological difference and made the overall statistics more robust.

I understand that a Z-difference in the range -1 < Z-diff < 1 could indicate that the two groups are closely related (e.g., low variation between groups). However, I cannot find any online references about its specific use.

What is (in your opinion) the value of this Z-difference and how would it increase robustness?

This is the tutorial video in question. The Z-thingy is performed around the beginning. https://www.youtube.com/watch?v=JwiFoUWQUIg&list=PLDN1R5gNkbQw4JM3DOn9TzKAcpQz-Z9ym&index=4

Statistics differential-expression • 1.3k views

ADD COMMENT • link updated 2.2 years ago by LChart 4.9k • written 2.2 years ago by schulpen_91 ▴ 30

0


Because I don't have access to the raw data files, DESeq2 etc. did not seem too effective. I know, I know, working in Excel is frowned upon here and it's not my preference either, but hey :]

ADD REPLY • link updated 2.2 years ago by ATpoint 88k • written 2.2 years ago by schulpen_91 ▴ 30

score 1 · Answer 1 · 2023-03-07

The approach you describe does nothing except change a scale factor. Let

u = mean(rawdata)
s = sd(rawdata) 
Z = (rawdata - u)/s

Then, clearly

Z1 := mean(Z[group1]) = mean((rawdata[group1] - u)/s) = (mean(rawdata[group1]) - u)/s = R1/s - u/s

where R1 := mean(rawdata[group1]), and likewise Z2 and R2 for group2,

and

Z1 - Z2 = (R1/s - u/s) - (R2/s - u/s) = (R1 - R2)/s

Therefore, all you've done is divide the between-group difference by the standard deviation of the whole sample. You haven't made anything "more robust", because group test statistics, including the t-test, are invariant under shifting and scaling the underlying data.*
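As a quick numerical check of the algebra above, here is a minimal Python sketch. The expression values are made up purely for illustration, and `t_statistic` is a hypothetical helper written for this example (a plain pooled-variance Student's t), not something taken from the tutorial.

```python
import math
import statistics


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )


# Made-up values for a single gene, for illustration only.
group1 = [5.1, 4.8, 5.6, 5.3]
group2 = [6.2, 6.9, 6.4, 7.1]
rawdata = group1 + group2

# Z-score against the mean and SD of the whole sample, as in the derivation.
u = statistics.mean(rawdata)
s = statistics.stdev(rawdata)
z1 = [(x - u) / s for x in group1]
z2 = [(x - u) / s for x in group2]

# "Z-difference" versus the raw mean difference divided by s.
z_diff = statistics.mean(z1) - statistics.mean(z2)
scaled_raw_diff = (statistics.mean(group1) - statistics.mean(group2)) / s

# t statistic before and after Z-scoring.
t_raw = t_statistic(group1, group2)
t_z = t_statistic(z1, z2)

print("Z1 - Z2       :", z_diff)
print("(R1 - R2) / s :", scaled_raw_diff)
print("t on raw data :", t_raw)
print("t on Z-scores :", t_z)
```

Up to floating-point rounding, the first two printed values agree with each other, and so do the last two.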

However, changes scaled in this way are expressed in units of the sample standard deviation, which can be useful for comparing differences between features that are originally on disparate scales (such as height, weight, gene expression, pH levels, etc.) and are otherwise incommensurate.
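To illustrate that one legitimate use, here is a small Python sketch with made-up height and pH values. The helper name `standardized_difference` is hypothetical; it just computes (R1 - R2)/s as derived above.

```python
import statistics


def standardized_difference(group1, group2):
    """Difference in group means divided by the SD of all values combined."""
    s = statistics.stdev(group1 + group2)
    return (statistics.mean(group1) - statistics.mean(group2)) / s


# Made-up measurements on very different scales, for illustration only.
height_g1 = [172.0, 168.0, 175.0, 170.0]  # cm
height_g2 = [181.0, 178.0, 184.0, 180.0]  # cm
ph_g1 = [7.38, 7.41, 7.40, 7.39]
ph_g2 = [7.30, 7.33, 7.31, 7.34]

features = {"height": (height_g1, height_g2), "pH": (ph_g1, ph_g2)}
for name, (g1, g2) in features.items():
    raw_diff = statistics.mean(g1) - statistics.mean(g2)
    std_diff = standardized_difference(g1, g2)
    print(f"{name}: raw difference = {raw_diff:.3f}, standardized = {std_diff:.3f}")
```

The raw differences are in centimetres and pH units and cannot be compared directly, whereas the standardized values share one scale and do not change if, say, height is recorded in metres instead.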

* With regularized statistics as a caveat: those need not be invariant under this kind of rescaling.