How to calculate moderated z-score (MODZ)
1
4
Entering edit mode
6.8 years ago
Gema Sanz ▴ 90

Hi,

I'm trying to reproduce the calculation of moderated z-score (MODZ) described in PMID: 29195078 for some microarray data. Calculation is described as follows:

Replicate-consensus signatures (MODZ) L1000 experiments are typically done in 3 or more biological replicates. We derive a consensus replicate signature by applying the moderated z-score (MODZ) procedure as follows. First, a pairwise Spearman correlation matrix is computed between the replicate signatures in the space of landmark genes with trivial self-correlations being ignored (set to 0). Then, weights for each replicate are computed as the sum of its correlations to the other replicates, normalized such that all weights sum to 1. Finally, the consensus signature is given by the linear combination of the replicate signatures with the coefficients set to the weights. This procedure serves to mitigate the effects of uncorrelated or outlier replicates, and can be thought of as a ‘de-noised’ representation of the given experiment’s transcriptional consequences.

I already calculated robust z-score for each gene (n=900) at every sample (n=6, 3 controls, 3 treatments) and I used the transposed matrix (genes as columns, samples as rows) to calculate pairwise spearman correlation using:

cor<- cor(tm, use="pairwise.complete.obs","spearman")

but I'm sure I'm missing something because I get error

Error in cor(tm, use = "pairwise.complete.obs", "spearman") : 
  'y' must be numeric

I don't know how to define x or y (I'm quite newby with R), but as I understand, I have to correlate control 1 to control 2, control 2 to control 3 and control 1 to control 3 individually and same for treatments and then compute the weights.

Any ideas about the code or how to proceed will be very much welcome!!!

Gema

z-score modz • 6.6k views
ADD COMMENT
2
Entering edit mode

You could try:

cor <- cor(data.matrix(tm), use="pairwise.complete.obs", "spearman")

Your tm object should only contain numerical values. Most likely it is currently a data-frame object, which is not specifically numerical.

ADD REPLY
1
Entering edit mode

Wait, cross-post: C: correlation between genes

ADD REPLY
1
Entering edit mode

If you want pairwise sample correlations, then your samples should be columns, with genes as rows

ADD REPLY
0
Entering edit mode

yes it was me, but I wanted to post the whole thing to make things clear about what I need to do

ADD REPLY
0
Entering edit mode

I just get the same error with that code. The only way I found to skip that error and make the code work is to set y as tm, but I don't know the meaning of that

y<- tm cor<- cor(tm, y, use="pairwise.complete.obs","spearman")

ADD REPLY
0
Entering edit mode

Can you display a small version of tm (like first 10 genes)?

ADD REPLY
2
Entering edit mode

There is no issue setting y as tm. It just means that it will correlate tm to tm.

If you want gene-to-gene correlations, then:

cor(tm, tm, use="pairwise.complete.obs", "spearman")

If you want sample-to-sample correlations, then:

cor(t(tm), t(tm), use="pairwise.complete.obs", "spearman")
ADD REPLY
0
Entering edit mode

Sorry, I reached the limit of coments per 6h... this is an example matrix of my data, I have 6 different samples and 928 genes, it is already trasposed

https://imgur.com/a/iZtsm

what I need to get according to the description of the calculation, is a correlation value per gene in each sample, and the output from cor() is only one correlation value per gene pair, I'm not sure how to get that.

ADD REPLY
0
Entering edit mode

Actually, this is a bit weird because the weighted average of zscores isn´t a zscore.

ADD REPLY
1
Entering edit mode

Why did you post this as an answer?

ADD REPLY
1
Entering edit mode

I have moved it to a comment. If you provide an answer, please answer the question that was originally asked.

ADD REPLY
0
Entering edit mode

yes sorry it was just a thought I did not answer the question.

ADD REPLY
0
Entering edit mode

No problem - no need to worry at this point.

ADD REPLY
3
Entering edit mode
6.8 years ago

As just mentioned on your other post, you need method="spearman", since otherwise R is seeing the unnamed argument as the thing you should be correlating tm with. Also, you want t(tm), as Kevin mentioned.

ADD COMMENT
1
Entering edit mode

Good spot on "spearman" Devon.

cor(tm, use="pairwise.complete.obs", method="spearman")

or

cor(t(tm), use="pairwise.complete.obs", method="spearman")
ADD REPLY
0
Entering edit mode

Thanks! that solved the "y" problem. As I wrote above what I need to get according to the description of the calculation, is a correlation value per gene in each sample, and the output from cor() is only one correlation value per gene pair, I'm not sure how to get that, I guess I should do a correlation using x for one sample and y for another and do it with all possible comparisons... ?

pairwise Spearman correlation matrix is computed between the replicate signatures in the space of landmark genes with trivial self-correlations being ignored (set to 0). Then, weights for each replicate are computed as the sum of its correlations to the other replicates

From the text I understand that I need one correlation value per gene and per sample and then combine those from replicates.

I have no idea how to do that, they have a script in MATLAB to do it, but I don't know how to extrapolate into R. This is the MATLAB code:

function modz_ds = level4_to_level5(zsrep_file, col_meta_file, landmark_file, group_var)
% LEVEL4_TO_LEVEL5 Compute Moderated Z-scores (ModZ) from replicate signatures

% zsrep_file = '/Users/narayan/workspace/cmapM/data/TEST_A375_24H_ZSPCINF_n67x22268.gctx';
% col_meta_file = '/Users/narayan/workspace/cmapM/data/TEST_A375_24H_ZSPCINF.map';
% landmark_file = '/Users/narayan/workspace/cmapM/data_processing/resources/L1000_EPSILON.R2.chip';

% z-scores for all replicate signatures
zsrep = parse_gctx(zsrep_file);
% sample annotations
col_meta = parse_tbl(col_meta_file, 'outfmt', 'record');
% landmark annotations
chip = parse_tbl(landmark_file, 'outfmt', 'record');

%% Generate moderated z-score (ModZ) signatures

% Exclude large outlier zscores
zsrep.mat = clip(zsrep.mat, -10, 10);

% Landmark features
pr_id_lm = {chip(strcmp('LM', {chip.pr_type})).pr_id}';
[~, lm_ridx] = intersect(zsrep.rid, pr_id_lm);

% column ids
cid = {col_meta.cid}';

% Group samples
[rep_gp, rep_idx] = getcls({col_meta.(group_var)}');

num_gp = length(rep_gp);
[num_row, num_col] = size(zsrep.mat);
modz_mat = nan(num_row, num_gp);
for ii=1:num_gp
    this_gp = rep_idx == ii;
    this_zs = zsrep.mat(:, this_gp);    
    fprintf(1, '%d/%d %s Computing ModZS %d replicates\n', ii, num_gp, rep_gp{ii}, nnz(this_gp));
    % determine weights based on replicate correlations in landmark space
    [modz_mat(:, ii), wt, cc] = modzs(this_zs, lm_ridx);          
    fprintf(1, 'Replicate correlations\n');
    disp(cc);
    fprintf(1, 'Replicate weights: ');
    disp(wt');
end
% Annotate ModZ matrix
modz_ds = mkgctstruct(modz_mat, 'rid', zsrep.rid, 'cid', rep_gp);
[~, uidx] = unique(rep_idx, 'stable');
modz_meta = keepfield(col_meta(uidx), {'rna_well', 'pert_id',...
                                       'pert_iname', 'pert_type',...
                                       'cell_id','pert_idose',...
                                       'pert_itime'});
modz_ds = annotate_ds(modz_ds, modz_meta);
%modz_ds = annotate_ds(modz_ds, row_meta, 'dim', 'row', 'keyfield', 'pr_id');
end
ADD REPLY
1
Entering edit mode

I like how obtuse their description is. The actual matlab code is here, which in R is:

modzs <- function(m, clipLowWt=TRUE, lowThreshWT=0.1, clipLowCC=TRUE, lowThreshCC=0, metric="avg") {
    cc = cor(m, method="spearman") # pair-wise correlations
    if(clipLowCC) { # Trim low values
        cc[cc<lowThreshCC] = lowThreshCC
    }

    wt = 0.5 * colSums(cc) # Per-sample values
    if(clipLowWt) { # Trim low values
        wt[wt<lowThreshWT] = lowThreshWT
    }

    # Normalize the weights
    if(metric=="avg") {
        sumWT = sum(abs(wt))
    } else {
        sumWT = sqrt(sum(wt^2))
    }
    normWT = wt / sumWT  # Normalized weights

    # Return the scaled input values
    return(m * normWT)
}

I've set the defaults of modzs() to match the ones used in matlab. I've not tried this on real data, so I don't have a good sense how reasonable it is.

Note that the input matrix, m, has genes as rows and samples as columns.

ADD REPLY
1
Entering edit mode

I think it is working! The function doesn't give errors and I got something that I think is the expected output

https://ibb.co/eLtkSc

ADD REPLY
0
Entering edit mode

wow, thanks A LOT!!! this is a start point, I will play around with it! :)

ADD REPLY
0
Entering edit mode

Do you code in MatLab Devon? - I don't!

ADD REPLY
1
Entering edit mode

Not since college, which was quite a while back :P

ADD REPLY

Login before adding your answer.

Traffic: 1905 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6