Hi.
I want to perform a survival analysis using RNA-Seq data from several cancer subtypes. To preprocess the data I have been following the tutorial "Tutorial: Survival analysis of TCGA patients integrating gene expression (RNASeq) data". I have two questions about it.
1- In the tutorial, the data are scaled using the formula z = [(value of gene X in tumor Y) - (mean of gene X in normal)] / (standard deviation of gene X in normal). That setting has only 2 conditions, cancer vs normal samples. In my case, I have samples for 4 breast cancer subtypes plus the normal samples. How should I proceed here? Can I scale the matrix with the formula above, ignoring the information about cancer subtypes? My goal is to test whether there are significant differences in the expression of some genes in each of the subtypes analysed.
2- Just to be sure: must the voom transformation be applied to the matrix with all the genes (except those not expressed), or can it be applied to a subset of genes of interest (e.g. differentially expressed genes)?
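To make question 1 concrete, here is a minimal sketch of how I understand the scaling, written in plain Python for a single gene. The function name, the subtype labels and the toy values are made up for illustration, and I am assuming the sample standard deviation (n - 1 denominator), which the formula does not specify. It only shows that the formula uses the normal samples as the reference, so the subtype labels never enter the calculation; whether that is statistically appropriate is exactly what I am asking.

```python
from statistics import mean, stdev


def zscore_vs_normal(tumor_values, normal_values):
    """Scale one gene's tumor values against the normal samples.

    z = (tumor value - mean in normal) / (standard deviation in normal)
    stdev() is the sample standard deviation (n - 1), an assumption here.
    """
    mu = mean(normal_values)
    sd = stdev(normal_values)
    return [(value - mu) / sd for value in tumor_values]


# Toy expression values for one gene (hypothetical numbers).
normal = [2.0, 4.0, 6.0]
tumor_by_subtype = {
    "subtype_A": [8.0, 6.0],
    "subtype_B": [0.0, 4.0],
}

# Every subtype is scaled against the same normal reference.
z_by_subtype = {
    name: zscore_vs_normal(values, normal)
    for name, values in tumor_by_subtype.items()
}
print(z_by_subtype)
```

The subtype labels are only used afterwards, to group the resulting z-scores.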
Thanks in advance.