Question

How to merge z-score values of replicate samples in LINCS data

3

10.4 years ago

liuyang ▴ 30

I'm working with the LINCS Level 4 data, which contains gene expression profiles of human cells exposed to a variety of perturbing agents. For a single perturbation there are many replicate samples, and the Level 4 data provides a modified z-score for every replicate. If I want to merge the z-scores of the replicate samples, is it reasonable to simply average them? My concern is whether the distribution changes with this method.

LINCS • 5.8k views

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by liuyang ▴ 30

0

Did you find the answer?

I have the same problem!

ADD REPLY • link 9.4 years ago by hajramezanali • 0

0

I have a similar problem, but I am trying to reproduce the conversion of Level 3 data to Level 4. Any idea how that can be done? I have tried the standard z-score and the robust z-score (using the median), but I can't seem to reproduce the Level 4 gene expression values.
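For reference, a generic robust z-score (median and MAD in place of mean and standard deviation) can be sketched as below. This is only an illustration: the 1.4826 constant is the usual factor that makes the MAD comparable to a standard deviation for normal data, the `min_mad` floor is an assumption added here to avoid division by zero, and nothing in this sketch is confirmed to match the actual LINCS Level 3 to Level 4 conversion, which may differ in which samples form the reference population.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def robust_zscores(values, min_mad=1e-6):
    """Robust z-scores: (x - median) / (1.4826 * MAD).

    min_mad is a hypothetical floor that keeps the scale away from zero.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = MAD_SCALE * max(mad, min_mad)
    return [(v - med) / scale for v in values]


# Toy expression values for one gene across five samples (made-up numbers).
values = [5.0, 6.0, 7.0, 8.0, 20.0]
z = robust_zscores(values)
print([round(v, 3) for v in z])
```

The outlying sample gets a large z-score without inflating the scale used for the other samples, which is the reason to prefer the median and MAD over the mean and standard deviation.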

ADD REPLY • link 3.7 years ago by Safi • 0

Ram · Answer 1 · 2014-11-08

2

10.4 years ago

Jean-Karim Heriche 27k

I don't know what LINCS data are, but it seems you're looking for Stouffer's method.

ADD COMMENT • link 10.4 years ago by Jean-Karim Heriche 27k

0

The way I interpret the question (and I also don't know LINCS data) is that gene expression levels are normalized to z-scores and the OP wants to summarize these gene expressions in each experiment by averaging the z-scores. Stouffer's method instead would combine p-values, which could be extracted from z-scores, to get an overall level of significance of the underlying tests.

E.g. if two genes have z = 2 and 4, the summarized expression would be 3. Stouffer's method would instead give z ≈ 4.24, i.e. a cumulative probability of 0.999989 (upper-tail p ≈ 1.1e-5).
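The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `stouffer_z` is a name chosen here for illustration and implements the unweighted form (sum of z-scores divided by the square root of their number).

```python
import math
from statistics import NormalDist, mean


def stouffer_z(zs):
    """Unweighted Stouffer combination: sum of z-scores divided by sqrt(k)."""
    return sum(zs) / math.sqrt(len(zs))


zs = [2.0, 4.0]
avg = mean(zs)                    # summary of the expression level
combined = stouffer_z(zs)         # combined evidence across independent tests
cdf = NormalDist().cdf(combined)  # cumulative probability of the combined z

print(f"mean z     = {avg:.2f}")
print(f"Stouffer z = {combined:.2f}")
print(f"cdf        = {cdf:.6f}")
```

The mean stays between the two inputs (3.00), while the Stouffer z (about 4.24) exceeds both, because it measures accumulated evidence rather than a typical expression level.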

If my interpretation is correct, I don't see anything wrong with averaging the z-scores, or taking the median if one is worried about outliers.
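To illustrate the outlier point, here is a small sketch with made-up replicate z-scores for a single gene, where one replicate disagrees strongly with the rest. The numbers are hypothetical and only meant to show how the two summaries react.

```python
from statistics import mean, median

# Hypothetical z-scores of one gene across four replicates; the last is an outlier.
replicate_z = [1.8, 2.1, 2.0, 9.5]

avg = mean(replicate_z)    # pulled upwards by the outlying replicate
med = median(replicate_z)  # stays close to the three replicates that agree

print(f"mean = {avg:.2f}, median = {med:.2f}")
```

With many well-behaved replicates the two summaries will be close; the median only pays off when a few replicates are unreliable.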

ADD REPLY • link 10.4 years ago by dariober 15k

0

Fisher's method combines the p-values directly, whereas Stouffer's method (or Z-transform) gets at a combined p-value using the sum of the z-scores divided by the square root of their number (the z-scores can be weighted). The two methods are compared here and here.

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by Jean-Karim Heriche 27k

0

Yes, but the point I wanted to make is that Stouffer's (or Fisher's) method combines the significance from different independent tests, expressed as p-values or z-scores, in order to obtain an overall probability for the null hypothesis. My understanding (which may be wrong) of the question is that liuyang wants to combine expression values, not probabilities or significance. So if two genes are expressed with z=2 and z=4, the combined expression is somewhere between 2 and 4 (the arithmetic mean would say z=3). Combining these expressions with Stouffer's method would give an overall expression >4.

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by dariober 15k

0

You're right. I guess I jumped to conclusions because I had recently been looking into combining p-values.

ADD REPLY • link 10.4 years ago by Jean-Karim Heriche 27k

0

Right, I'm not concerned with the p-value; I just need the combined z-score for subsequent calculations. In the end I used Stouffer's method. The weight is the average value of the Spearman correlation coefficient.
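A sketch of what such a weighting could look like, under assumptions that are not stated in the thread: each replicate's weight is taken to be its mean Spearman correlation with the other replicates, non-positive weights are raised to a small `floor` (a hypothetical choice), and the weighted Stouffer form sum(w*z) / sqrt(sum(w^2)) is applied gene by gene. All function names and the toy data are made up for illustration.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def replicate_weights(replicates, floor=0.01):
    """Mean Spearman correlation of each replicate with the others, floored and normalized."""
    n = len(replicates)
    raw = []
    for i in range(n):
        others = [spearman(replicates[i], replicates[j]) for j in range(n) if j != i]
        raw.append(max(sum(others) / len(others), floor))
    total = sum(raw)
    return [w / total for w in raw]


def weighted_stouffer(replicates, weights):
    """Per-gene weighted Stouffer z: sum(w * z) / sqrt(sum(w^2))."""
    norm = math.sqrt(sum(w * w for w in weights))
    n_genes = len(replicates[0])
    return [sum(w * rep[g] for w, rep in zip(weights, replicates)) / norm
            for g in range(n_genes)]


# Toy data: three replicates (rows) of z-scores for five genes (columns).
replicates = [
    [2.0, -1.0, 0.5, 3.0, -2.5],
    [1.8, -1.2, 0.7, 2.6, -2.0],
    [-0.5, 2.0, 1.0, 0.2, -0.1],  # a replicate that disagrees with the others
]
weights = replicate_weights(replicates)
combined = weighted_stouffer(replicates, weights)
print([round(w, 3) for w in weights])
print([round(z, 2) for z in combined])
```

The discordant third replicate ends up with almost no weight. Note that the Stouffer denominator assumes independent, unit-variance z-scores; if the goal is a typical expression level rather than combined evidence, dividing by sum(w) instead gives a weighted mean.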

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by liuyang ▴ 30

0

Thanks for your suggestion, but the question is what weight I should give to the different samples. I think the arithmetic mean is OK, but how about an arithmetic mean weighted by Spearman's rank correlation coefficient?

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by liuyang ▴ 30

0

Quoting this paper:

How are these weights to be chosen? Ideally each study is weighted proportional to the inverse of its error variance, that is, by the reciprocal of its squared standard error. For studies that use t-tests, for example, this is done by weighting each study by its d.f.: w_i = ν_i. More generally, the weights should be the inverse of the squared standard error of the effect size estimate for each study.

So I guess using the correlation coefficient is OK.
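The weighting rule in the quote can be sketched as follows. The standard errors are hypothetical numbers picked for illustration, and `weighted_stouffer` is a name chosen here; the formula is the weighted Z form, sum(w_i * z_i) / sqrt(sum(w_i^2)).

```python
import math


def weighted_stouffer(zs, weights):
    """Weighted Stouffer z: sum(w_i * z_i) / sqrt(sum(w_i^2))."""
    numerator = sum(w * z for w, z in zip(weights, zs))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


zs = [2.0, 4.0]
standard_errors = [0.5, 1.0]  # hypothetical standard errors of the two estimates

# Inverse-variance weights: reciprocal of the squared standard error.
weights = [1.0 / se ** 2 for se in standard_errors]

z_weighted = weighted_stouffer(zs, weights)
z_equal = weighted_stouffer(zs, [1.0, 1.0])

print(f"inverse-variance weights = {weights}")
print(f"weighted z = {z_weighted:.2f}, equal-weight z = {z_equal:.2f}")
```

Here the more precise first estimate (z = 2) dominates, so the weighted result is lower than the equal-weight one. A correlation-based weight plays a similar role only to the extent that it tracks the reliability of each replicate.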

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by Jean-Karim Heriche 27k