Hi Biostars community,
I am currently doing survival and other analyses in RNAseq gene expression data using the ACC TCGA database.
How can i change the code for obtaining the z-score applying this formula?
z = [(value gene X in tumor Y)-(mean gene X in tumor)]/(standard deviation X in tumor)
since ACC data has no normal values, so no n_index. I solved voom transformation but I am stucked with this.
scal <- function(x,y){
mean_n <- rowMeans(y) # mean of normal
sd_n <- apply(y,1,sd) # SD of normal
# z score as (value - mean normal)/SD normal
res <- matrix(nrow=nrow(x), ncol=ncol(x))
colnames(res) <- colnames(x)
rownames(res) <- rownames(x)
for(i in 1:dim(x)[1]){
for(j in 1:dim(x)[2]){
res[i,j] <- (x[i,j]-mean_n[i])/sd_n[i]
}
}
return(res)
}
z_rna <- scal(rna_vm[,t_index],rna_vm[,n_index])
Can I just t(scale(t(rna_vm)))?
I guess Its not the solution...
Sorry for the naive question. Im kinda new in R and bioinformatics. Thank you so much, Alex
Thank you very much Kevin, your reply has been very helpful to me. I have been doing some research and I still don't know why one approach would be more favorable rather than the other one. Scaling by row shows me very low expression values, so I tried by global and I get a median z-score of my gene of interest closer to the median of my gene in other databases (with normal samples), so I will keep global as a better approach..
Alex
I think that we usually 'scale per gene' (i.e., the
t(scale(t(x)))
example) for visualisation purposes, e.g., heatmaps, or where we are doing univariate analysis, like survival analysis with gene expression.The global Z-score makes a lot more sense in other scenarios, obviously.
I agree. The problem is that scaling per gene extremely stratify the samples (after create an event vector) over the median expression gene of interest value, so almost the entire samples are "high expression". I'm not completely sure if global z-score is a bad practice for survival analysis with gene expression.