Hello,
This is probably a very simple question.
I am currently working on an expression matrix to replicate a "cytolytic score" in R.[1] The cytolytic score is defined as the "geometric mean of GZMA and PRF1". My goal is then to investigate the Pearson correlation of this score with other genes of interest in the matrix.
I understand the concept of a geometric mean, but wanted to see if this makes mathematical sense (and is a common transformation) in expression analysis.
For example, if we have 3 samples (S1, S2, S3) for the two genes:
      S1   S2   S3
PRF1  1.1  2    0.5
GZMA  2    1    1
The transformation would be:
          S1           S2         S3
CytScore  sqrt(1.1*2)  sqrt(2*1)  sqrt(0.5*1)
Thus, after this transformation, you still have a set of 3 values (one per sample) that can be used in correlation calculations.
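To make the example concrete, here is a minimal R sketch of the calculation on the toy values above. The object names (`expr`, `cyt_score`, `gene_x`) and the values for `gene_x` are made up purely for illustration; they are not from the paper or from any real dataset.

```r
# Toy expression matrix from the example: genes in rows, samples in columns.
expr <- matrix(
  c(1.1, 2, 0.5,
    2,   1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("PRF1", "GZMA"), c("S1", "S2", "S3"))
)

# Geometric mean of the two genes, computed per sample.
cyt_score <- sqrt(expr["PRF1", ] * expr["GZMA", ])

# Equivalent form: exponentiated mean of the logs.
# This only works for strictly positive values.
cyt_score_log <- exp(colMeans(log(expr[c("PRF1", "GZMA"), ])))

# Hypothetical gene of interest (made-up values, illustration only).
gene_x <- c(S1 = 3.0, S2 = 2.5, S3 = 1.0)

# Pearson correlation between the score and the gene across samples.
r <- cor(cyt_score, gene_x, method = "pearson")
print(cyt_score)
print(r)
```

Note that if the values are already log-transformed, the plain arithmetic mean of the two rows corresponds to the log of the geometric mean on the original scale, so it may be worth checking which scale the matrix is on before taking the square root of the product.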
[1] Molecular and genetic properties of tumors associated with local immune cytolytic activity. doi: 10.1016/j.cell.2014.12.033
Thank you Akhil, this makes sense. In the paper, they use Transcripts per Million (TPM) values from RNA-seq to calculate the score.
The data I'm looking at is a normalized Affy microarray experiment, but I can get the relative signal intensity as well. I am skeptical, however, of using this measurement for a microarray due to the statistical noise. I do like your z-scoring approach which I'm guessing is:
Thank you for your help.