4.2 years ago
asalimih
Hello,
I am using TCGA RNA-Seq data for a neural net method. I have split the data into a training set and a test set. How do I normalize the count data of both sets with limma without data leakage?
My idea is to normalize the training set with the limma + voom method first, and then use the parameters voom calculates on the training set (`E` and `weights`) to normalize the test set. However, I couldn't figure out how to do this.
Any help would be greatly appreciated.
I'd normalize them separately. Simply make two matrices, one test and one train, and run the normalization on each of them.
Thank you. After searching further, it seems that using `E` and `weights` is not correct for this purpose; those values are meant for `lmFit` and limma itself. Gordon Smyth recommended this:
Yes, just run the default normalization as in the manual, so calcNormFactors and then cpm() using the log-transformation. Alternatively something like vst() from DESeq2 might be of interest here.
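For anyone landing here later, a minimal sketch of what the suggestions in this thread could look like in practice: each set is normalized on its own with edgeR's `calcNormFactors` followed by `cpm(..., log = TRUE)`, so no information from the test samples enters the training matrix. The count matrix, the 7/3 split, and the helper name `normalize_logcpm` are hypothetical and only there for illustration; swap in your own TCGA counts (genes in rows, samples in columns).

```r
library(edgeR)

# Hypothetical toy data: 200 genes x 10 samples of simulated counts.
set.seed(1)
counts <- matrix(
  rnbinom(200 * 10, mu = 50, size = 2),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:10))
)

# Hypothetical split: first 7 samples for training, last 3 for testing.
train_idx <- 1:7
test_idx  <- 8:10

# Illustrative helper (not part of edgeR): TMM normalization factors
# followed by log2-CPM, computed only from the samples passed in.
normalize_logcpm <- function(mat) {
  dge <- DGEList(counts = mat)
  dge <- calcNormFactors(dge)  # TMM is the default method
  cpm(dge, log = TRUE)         # log2 counts per million
}

# Each set is normalized separately, so neither one sees the other.
train_logcpm <- normalize_logcpm(counts[, train_idx])
test_logcpm  <- normalize_logcpm(counts[, test_idx])
```

The resulting matrices keep genes in rows and samples in columns, so they will probably need a `t()` before being passed to most neural net libraries, which expect samples in rows.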