Question

Transform log2 fold changes into z-scores

1

6.0 years ago

gg ▴ 10

Hello everyone!

I have a dataset consisting of CRISPR gRNA read counts from two different samples (very similar to the output of an RNA-seq experiment).

I have log2-transformed the read counts and computed the fold change between the treated and non-treated samples. The distribution of the data is normal, but the mean is not 0. I am plotting these data in a volcano plot, and the plot doesn't look right: I am plotting depleted vs. enriched gRNAs, but they do not correspond to negative vs. positive values. So I thought of transforming the values into z-scores. I wonder if this is correct. I have seen that this is commonly done for microarray data, but I am not sure it applies to my data.

Many thanks for your help!

Giovanna

See the plot below:

RNA-Seq • 9.2k views

ADD COMMENT • link updated 6.0 years ago by The ▴ 180 • written 6.0 years ago by gg ▴ 10

0

Check scale() in R.

ADD REPLY • link 6.0 years ago by Nicolas Rosewick 11k

0

Indeed, that's what I did. Doesn't the scale() function transform the data into z-scores? I still wonder if this is statistically correct...
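As a minimal sketch of what z-scoring does to the values: with its default arguments, R's scale() subtracts the mean and divides by the sample standard deviation. The Python below mirrors that arithmetic on made-up log2 fold changes (the values and the names `z_scores`, `log2_fc` and `centred` are illustrative assumptions, not taken from the thread). It also shows plain median-centring for comparison, which moves the midpoint to 0 but keeps log2 units.

```python
import statistics

def z_scores(values):
    """Centre on the mean and divide by the sample SD (n - 1 denominator),
    which is what R's scale() does with its default arguments."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical log2 fold changes whose centre is shifted away from 0
log2_fc = [-1.8, -1.2, -1.0, -0.9, -0.4, 0.1]

# z-scores: mean 0, SD 1, but no longer in log2 units
z = z_scores(log2_fc)

# Centring only: values stay in log2 units, midpoint moves to 0
median = statistics.median(log2_fc)
centred = [v - median for v in log2_fc]
```

Both transforms are linear and increasing, so they leave the ranking of gRNAs unchanged. After z-scoring, however, any threshold stated as a fold change has to be converted to the new scale.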

ADD REPLY • link 6.0 years ago by gg ▴ 10

0

Could you add the volcano plot?

ADD REPLY • link 6.0 years ago by Nicolas Rosewick 11k

0

ADD REPLY • link updated 6.0 years ago by Ram 45k • written 6.0 years ago by gg ▴ 10

0

It is unclear what you want, gg. Why do you want to make z-scores of log2 fold changes? You also have p-values; how did you calculate them? What is wrong with the volcano plot using log2 FC instead of z-scores? What do you want to do with the z-scores?

ADD REPLY • link 6.0 years ago by Benn 8.4k

0

Hello, thanks again to both of you for helping me. When I use FC values the plot looks like this (see below), which I find much more difficult to interpret, especially when I need to draw vertical lines to indicate FC-based thresholds. My aim is to compare two different analysis approaches: setting thresholds according to the negative control distribution (the vertical lines above) and using p-values calculated by rank product analysis (the 0.05 horizontal line). I hope it is clearer now!
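The control-based thresholds mentioned above can be sketched as follows. The post does not say how the thresholds are derived from the negative controls, so the mean ± 2 SD rule used here is an assumption, as are the function names and the example values. The point of the sketch is that thresholds taken from the control distribution sit around the controls' own centre, so they remain usable even when the log2 FC values are not centred on 0.

```python
import statistics

def control_thresholds(control_log2_fc, n_sd=2.0):
    """Lower/upper cut-offs as mean +/- n_sd * SD of the negative controls.
    The +/- 2 SD rule is an assumption, not the method from the post."""
    mean = statistics.mean(control_log2_fc)
    sd = statistics.stdev(control_log2_fc)
    return mean - n_sd * sd, mean + n_sd * sd

def classify(log2_fc, lower, upper):
    """Label a gRNA relative to the control-derived cut-offs."""
    if log2_fc < lower:
        return "depleted"
    if log2_fc > upper:
        return "enriched"
    return "unchanged"

# Hypothetical negative-control log2 FCs, centred near -1 rather than 0
controls = [-1.3, -1.1, -1.0, -0.9, -0.7]
lower, upper = control_thresholds(controls)

labels = [classify(fc, lower, upper) for fc in (-2.0, -1.0, -0.2)]
```

With these made-up controls, a gRNA at log2 FC = -0.2 is labelled "enriched" even though its value is negative, because it lies above the range covered by the controls.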

ADD REPLY • link 6.0 years ago by gg ▴ 10

1

It is not clear how you calculated the p-values. Why use the rank product method, and where is the FDR correction? Nor is it clear how you calculated the log2 FC; the values look odd in your volcano plot.

ADD REPLY • link 6.0 years ago by Benn 8.4k

0

First, the read counts were log2-transformed. The fold change decrease between treated and untreated control samples was calculated for each gRNA as:

$$\text{gRNA } C_{\text{score}} = \log_2(\text{gRNA abundance}_{\text{treated sample}}) - \log_2(\text{gRNA abundance}_{\text{non-treated sample}})$$

As each gene was targeted by 6 different gRNAs, a gene-level score was calculated as the mean over its gRNAs:

$$\text{gene } C_{\text{score}} = \frac{\sum \text{gRNA } C_{\text{score}}}{n_{\text{gRNA}}}$$

Finally, rank product analysis was performed, using the FC of each gRNA targeting the same gene as a replicate.
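The two equations above can be sketched in a few lines. The gene names, counts and function names are hypothetical; the sketch assumes every count is greater than 0, since the post does not say how zero counts were handled.

```python
import math
from collections import defaultdict

def grna_c_score(treated, untreated):
    """log2(treated abundance) - log2(non-treated abundance) for one gRNA.
    Counts must be > 0; zero handling is not described in the post."""
    return math.log2(treated) - math.log2(untreated)

def gene_c_scores(rows):
    """rows: (gene, treated_count, untreated_count), one row per gRNA.
    Returns the mean gRNA C score for each gene."""
    per_gene = defaultdict(list)
    for gene, treated, untreated in rows:
        per_gene[gene].append(grna_c_score(treated, untreated))
    return {gene: sum(s) / len(s) for gene, s in per_gene.items()}

# Hypothetical counts, two gRNAs per gene to keep the example short
rows = [
    ("GENE_A", 50, 200),   # log2 ratio -2
    ("GENE_A", 100, 200),  # log2 ratio -1
    ("GENE_B", 400, 200),  # log2 ratio +1
    ("GENE_B", 200, 200),  # log2 ratio  0
]
scores = gene_c_scores(rows)
```

Note that the formula uses raw abundances. If the two samples differ in total read count and no depth normalisation is applied (the post does not mention any), every C score shifts by the same constant, which would move the whole distribution away from 0.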

ADD REPLY • link 6.0 years ago by gg ▴ 10

0

Okay, thanks for the explanation. I am not sure why you would use this protocol instead of, for example, edgeR. Try edgeR and see if you still get this odd shift of log2 FC towards -1.

ADD REPLY • link 6.0 years ago by Benn 8.4k

0

Why do you consider your result difficult to interpret? It looks pretty good: it looks like you will not find significant differences between your two conditions, but the analysis looks sound. My only concern is that I would recommend plotting -log10 of the adjusted p-value instead of the raw p-value, to see which changes are truly significant.
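A minimal sketch of that suggestion, assuming the Benjamini-Hochberg procedure as the adjustment (the comment does not name a specific method) and using made-up p-values:

```python
import math

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values
p = [0.001, 0.01, 0.03, 0.04, 0.5]
adj = benjamini_hochberg(p)

# y-axis values for the volcano plot
neg_log10_adj = [-math.log10(a) for a in adj]
```

For these inputs the adjusted values are 0.005, 0.025, 0.05, 0.05 and 0.5, so fewer points clear a 0.05 line than with the raw p-values.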

ADD REPLY • link 6.0 years ago by Buffo ★ 2.4k

score 0 · Answer 1 · 2019-08-22

I believe there is not sufficient 'scatter' in the plot. In your case the p-value gets better (lower) almost monotonically as the absolute fold change increases. That might have to do with how the p-value is calculated in rank product analysis (is it still calculated by random permutation, or has an exact method been introduced?), with the small number of samples, or with the use of technical replicates as samples (pseudoreplication).

I would suggest checking some papers that used rank product to see what the volcano plots look like in those examples.
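To make the permutation question concrete, here is a minimal sketch of a permutation-based rank product p-value. It is an illustration under stated assumptions, not the implementation used in the thread: ranks run from most negative to most positive fold change, values are shuffled independently within each replicate, and the function names and data are made up.

```python
import math
import random

def ranks(column):
    """Rank 1 = most negative value; ties are broken by input order."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    result = [0] * len(column)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def rank_products(matrix):
    """matrix[g][k] = log2 FC of gene g in replicate k.
    Returns the geometric mean of each gene's ranks across replicates."""
    n_rep = len(matrix[0])
    col_ranks = [ranks([row[k] for row in matrix]) for k in range(n_rep)]
    return [
        math.exp(sum(math.log(col_ranks[k][g]) for k in range(n_rep)) / n_rep)
        for g in range(len(matrix))
    ]

def permutation_p_values(matrix, n_perm=1000, seed=0):
    """Fraction of permuted rank products <= the observed one, per gene."""
    rng = random.Random(seed)
    observed = rank_products(matrix)
    n_genes, n_rep = len(matrix), len(matrix[0])
    hits = [0] * n_genes
    for _ in range(n_perm):
        # Shuffle each replicate column independently
        cols = []
        for k in range(n_rep):
            col = [row[k] for row in matrix]
            rng.shuffle(col)
            cols.append(col)
        shuffled = [[cols[k][g] for k in range(n_rep)] for g in range(n_genes)]
        permuted = rank_products(shuffled)
        for g in range(n_genes):
            # Small tolerance guards against floating-point rounding
            hits[g] += sum(1 for rp in permuted if rp <= observed[g] * (1 + 1e-9))
    return [h / (n_perm * n_genes) for h in hits]

# Hypothetical data: 4 genes x 3 replicates, gene 0 consistently depleted
matrix = [
    [-3.0, -2.8, -3.1],
    [-1.0, -1.2, -0.9],
    [-0.5, -0.4, -0.6],
    [0.2, 0.1, 0.3],
]
rp = rank_products(matrix)
p_values = permutation_p_values(matrix)
```

In this sketch the p-value depends only on the ranks of the fold changes, so genes whose gRNAs agree on a large fold change necessarily receive small p-values. That is consistent with the near-monotonic relationship described above, though it does not rule out the other explanations.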