Hello everyone!
I have a dataset consisting of CRISPR gRNA read counts from two different samples (very similar to the output of an RNA-seq experiment).
I have transformed the read counts to log2 values and computed the fold change between the treated and non-treated samples. The data are normally distributed, but the mean is not 0. I am plotting these data in a volcano plot, and the plot doesn't look right: I am plotting depleted vs. enriched gRNAs, but they do not correspond to negative vs. positive values. So I thought of transforming the values into z-scores. I wonder if this is correct. I have seen that this is common for microarray data, but I am not completely sure it applies to my data.
Many thanks for your help!
Giovanna
See the plot below:
Check scale() in R.
Indeed, that's what I did. Doesn't the function scale() transform the data into z-scores? I still wonder if this is statistically correct...
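For anyone wondering what the z-score transform does to the fold changes, here is a minimal sketch in plain Python. It assumes the default behaviour of R's scale() on a single column (subtract the mean, divide by the sample standard deviation with an n - 1 denominator). The log2 FC values are made up for illustration, and `z_scores` is a hypothetical helper name.

```python
import statistics

def z_scores(values):
    """Center on the mean and divide by the sample SD (n - 1 denominator).

    This mirrors what R's scale() does by default for one column.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical log2 fold changes whose mean is shifted away from 0.
log2_fc = [-1.8, -1.2, -1.0, -0.9, -0.4]
z = z_scores(log2_fc)

# After the transform the mean is ~0 and the SD is ~1, but the ordering
# of the gRNAs is unchanged because the transform is linear.
print([round(v, 3) for v in z])
```

Note that the transform moves the zero point to the sample mean, so a negative z-score here means "below the average fold change", which is not necessarily the same as "depleted relative to the untreated sample".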
Could you add the volcano plot?
It is unclear what you want, gg. Why do you want to make z-scores of log2 fold changes? You also have p-values; how did you calculate them? What is wrong with the volcano plot using log2 FC instead of z-scores? What do you want to do with the z-scores?
Hello, thanks to both of you again for helping me. When I use FC values the plot looks like this (see below), which I find much more difficult to interpret, especially when I need to plot vertical lines to indicate FC-based thresholds. My aim is to compare two different analysis approaches: specifically, setting thresholds according to the negative control distribution (the vertical lines above) and using p-values calculated by rank product analysis (the 0.05 horizontal line). I hope it is clearer now!
It is not clear how you calculated the p-values. Why use the rank method, and where is the FDR correction? It is also unclear how you calculated the log2 FC; the values look odd in your volcano plot.
First, the read counts were transformed to log2 values. The fold change between treated and untreated control samples was calculated for each gRNA as described in the equation below:

$$\text{gRNA } C_{\text{score}} = \log_2(\text{gRNA abundance}_{\text{treated sample}}) - \log_2(\text{gRNA abundance}_{\text{non-treated sample}})$$

As each gene was targeted by 6 different gRNAs, the mean gRNA C score of each gene was calculated using the equation below:

$$\text{gene } C_{\text{score}} = \frac{\sum \text{gRNA } C_{\text{score}}}{n_{\text{gRNA}}}$$

Finally, rank product analysis was performed, using the FC of each gRNA targeting the same gene as a replicate.
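The two scoring steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the counts and gene names are made up, `grna_c_score` and `gene_c_scores` are hypothetical helper names, and the optional `pseudocount` argument is an addition that is not part of the described protocol (a positive value would avoid log2(0) for gRNAs with zero reads). The rank product step is not shown.

```python
import math
from collections import defaultdict

def grna_c_score(treated, untreated, pseudocount=0.0):
    """log2(treated abundance) - log2(untreated abundance) for one gRNA.

    pseudocount is an assumption added for illustration; with the default
    of 0 the function follows the equation in the post exactly.
    """
    return math.log2(treated + pseudocount) - math.log2(untreated + pseudocount)

def gene_c_scores(rows):
    """Average the gRNA scores per gene.

    rows: iterable of (gene, treated_count, untreated_count), one per gRNA.
    """
    per_gene = defaultdict(list)
    for gene, treated, untreated in rows:
        per_gene[gene].append(grna_c_score(treated, untreated))
    return {gene: sum(scores) / len(scores) for gene, scores in per_gene.items()}

# Hypothetical counts, two gRNAs per gene to keep the example short.
rows = [
    ("GENE_A", 50, 200),   # 4-fold depleted
    ("GENE_A", 100, 200),  # 2-fold depleted
    ("GENE_B", 400, 200),  # 2-fold enriched
    ("GENE_B", 200, 200),  # unchanged
]
scores = gene_c_scores(rows)
print(scores)
```

With scores defined this way, depleted gRNAs get negative values and enriched gRNAs get positive values. The sketch uses raw counts, so any difference in sequencing depth between the two samples would shift all scores by the same amount.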
Okay, thanks for the explanation. I am not sure why you would use this protocol instead of edgeR, for example. Try edgeR and see if you still have this weird shift of the log2 FC towards -1.
Why do you consider your result difficult to interpret? It looks pretty good: it looks like you will not find significant differences between your two conditions, but the analysis itself looks sound. My only recommendation is to plot the -log10 of the adjusted p-value instead of the raw p-value, to see which changes are really significant.
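As a rough illustration of the adjusted p-value suggestion, here is a minimal sketch that applies the Benjamini-Hochberg correction and then takes -log10 for the y-axis of the volcano plot. The thread does not say which correction method is meant, so Benjamini-Hochberg is an assumption here. The p-values are made up, and `bh_adjust` is a hypothetical helper name.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so the adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per gene.
p = [0.01, 0.04, 0.03, 0.20]
p_adj = bh_adjust(p)

# y-axis values for the volcano plot (requires all adjusted p-values > 0).
neg_log10_p_adj = [-math.log10(q) for q in p_adj]
print([round(q, 4) for q in p_adj])
```

In this made-up example, three raw p-values are below 0.05 but only one adjusted value is, which is why the choice of y-axis changes how many points sit above the 0.05 line.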