limma voom normalization of train and test set

5.1 years ago

asalimih ▴ 60

Hello,
I am using TCGA RNA-Seq data for a neural network method. I have split the data into a train set and a test set. How do I normalize the count data of the train and test sets with limma without data leakage?
My idea is to normalize the train set with the limma + voom method first, and then use the quantities voom computes on the train set (E and weights) to normalize the test set. However, I could not work out how to do this.

Any help would be greatly appreciated.

limma machine learning RNA-Seq data leakage • 1.8k views

ADD COMMENT • link 5.1 years ago by asalimih ▴ 60

I'd normalize them separately. Simply make two matrices, one test and one train, and run the normalization on each of them.

ADD REPLY • link 5.1 years ago by ATpoint 89k

Thank you. After searching more, it seems that using E and weights is not correct for this purpose; those quantities are meant for lmFit and limma itself. Gordon Smyth recommended this:

I always recommend cpm(counts, log=TRUE, prior.count=3) for the purpose of other down-stream analyses, because the voom quantities cannot be summarized in single combined quantity. link
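As a minimal sketch of that recommendation, assuming the edgeR package is installed: the count matrix below is made-up toy data (5 genes by 4 samples), and only the cpm() call itself comes from the quote.

```r
library(edgeR)

# Toy count matrix, 5 genes x 4 samples (hypothetical values, for illustration only)
set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# log2 counts-per-million; prior.count = 3 adds a small offset before the
# log so that low counts do not produce extreme values
logcpm <- cpm(counts, log = TRUE, prior.count = 3)

dim(logcpm)  # same genes x samples layout as the input
```

The result is an ordinary numeric matrix, so it can be passed to downstream tools directly.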

ADD REPLY • link 5.1 years ago by asalimih ▴ 60

Yes, just run the default normalization as described in the manual: calcNormFactors() followed by cpm() with log=TRUE (both from edgeR). Alternatively, something like vst() from DESeq2 might be of interest here.
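A rough sketch of what this could look like when the train and test matrices are normalized separately, as suggested earlier in the thread. The simulated counts, the 7/3 split, and the helper name normalize_logcpm are all assumptions made for illustration; only DGEList(), calcNormFactors() and cpm() are edgeR functions.

```r
library(edgeR)

# Simulated counts, 200 genes x 10 samples (hypothetical data)
set.seed(42)
counts <- matrix(rnbinom(200 * 10, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:10)))

# Hypothetical split: first 7 samples for training, last 3 for testing
train_counts <- counts[, 1:7]
test_counts  <- counts[, 8:10]

# Hypothetical helper: TMM normalization factors, then log2-CPM
normalize_logcpm <- function(m) {
  dge <- DGEList(counts = m)
  dge <- calcNormFactors(dge)            # TMM is the default method
  cpm(dge, log = TRUE, prior.count = 3)  # uses the normalized library sizes
}

# Each set is processed on its own, so no test sample enters the
# computation of the training-set normalization factors
train_logcpm <- normalize_logcpm(train_counts)
test_logcpm  <- normalize_logcpm(test_counts)
```

Both outputs are genes-by-samples matrices; most machine learning tools expect samples in rows, so a transpose with t() is probably needed before training.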

ADD REPLY • link 5.1 years ago by ATpoint 89k
