I'm analysing drug response in cell lines, specifically using the Genomics of Drug Sensitivity in Cancer (GDSC) data. On the download page (https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html) the expression data is described as "RMA normalised basal expression profiles for all the cell-lines."
Most of my work so far has been on RNA-Seq data, where expression is usually given as counts or count-derived values (e.g. Transcripts per Million (TPM)). As I understand it, microarrays don't give counts like this per se, just 'expression levels' derived from probe intensity.
My questions are:
1) What units, if any, does RMA output?
2) Are there standard RMA pre-processing steps to use for machine learning (e.g. centering + scaling)?
3) Does taking the log2(RMA_Value+1) make sense?
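To make questions 2 and 3 concrete, here is a rough sketch of the two transforms I have in mind. The matrix is made up for illustration (it is not GDSC data), the layout of cell lines in rows and genes in columns is my assumption, and I'm not claiming either step is the right thing to do to RMA output.

```python
import numpy as np

# Made-up values standing in for RMA output (NOT real GDSC data).
# Assumed layout: rows = cell lines, columns = genes.
rma = np.array([
    [3.0, 7.5, 10.0],
    [4.0, 8.5, 12.0],
    [5.0, 9.5, 11.0],
])

# Question 2: centering + scaling, i.e. a per-gene z-score across cell lines.
gene_mean = rma.mean(axis=0)
gene_std = rma.std(axis=0)  # population standard deviation (ddof=0)
zscored = (rma - gene_mean) / gene_std

# Question 3: the extra log transform I'm unsure about.
logged = np.log2(rma + 1)
```

Here `zscored` has mean 0 and standard deviation 1 for every gene, and `logged` is the element-wise log2(value + 1) of the same matrix.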
Cheers!